Abstract
A microscopic particle obeys the principles of quantum mechanics -- so where is the sharp boundary between the macroscopic and microscopic worlds? It was this "interpretation problem" that prompted Schr\"odinger to propose his famous thought experiment (a cat that is simultaneously both dead and alive) and sparked a great debate about the quantum measurement problem, one that still has no satisfactory answer. This points to an inadequacy of rigorous mathematical models in describing the laws of nature. We propose a computational model, based on Darwin's natural selection, for describing and understanding the laws of nature. Whether it is a macroscopic particle, a microscopic electron, or a financial security, each can be considered an entity, and the change of this entity over time can be described by a data series composed of states and values. An observer can learn from this data series to construct theories (usually consisting of functions and differential equations). We do not model with the usual functions or differential equations, but with a state Decision Tree (which determines the state of an entity) and a value Function Tree (which determines the distance between two points of an entity). Together, a state Decision Tree and a value Function Tree can reconstruct an entity's trajectory and predict its future trajectory. Our proposed algorithmic model discovers laws of nature by learning only observed historical data (sequential measurements of observables), based on maximizing the observer's expected value. There is no differential equation in our model; instead, the emphasis is on machine learning: the observer builds up experience by being rewarded or punished for each decision, eventually rediscovering Newton's laws, the Born rule (quantum mechanics), and the efficient market hypothesis (financial markets).
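As a toy illustration of the state Decision Tree idea (a hypothetical sketch, not the paper's actual construction), one can fit an off-the-shelf decision tree that predicts an entity's next state from a short window of its past states:

```python
# Hypothetical sketch: learn a "state decision tree" from a state series by
# predicting the next state from the previous `lag` states.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
states = [0]
for _ in range(500):                       # synthetic two-state trajectory
    states.append(states[-1] if rng.random() < 0.9 else 1 - states[-1])
states = np.array(states)

lag = 3                                    # observer's memory window
X = np.stack([states[i:i + lag] for i in range(len(states) - lag)])
y = states[lag:]                           # next state to predict

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)
print("training accuracy:", tree.score(X, y))
```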
Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition
Abstract
Recently, wearable emotion recognition based on peripheral physiological signals has drawn massive attention due to its less invasive nature and its applicability in real-life scenarios. However, how to effectively fuse multimodal data remains a challenging problem. Moreover, traditional fully-supervised approaches suffer from overfitting given limited labeled data. To address these issues, we propose a novel self-supervised learning (SSL) framework for wearable emotion recognition, where efficient multimodal fusion is realized with temporal convolution-based modality-specific encoders and a transformer-based shared encoder, capturing both intra-modal and inter-modal correlations. Extensive unlabeled data is automatically assigned labels by five signal transforms, and the proposed SSL model is pre-trained with signal transformation recognition as a pretext task, allowing the extraction of generalized multimodal representations for emotion-related downstream tasks. For evaluation, the proposed SSL model was first pre-trained on a large-scale self-collected physiological dataset, and the resulting encoder was subsequently frozen or fine-tuned on three public supervised emotion recognition datasets. Ultimately, our SSL-based method achieved state-of-the-art results on various emotion classification tasks. Meanwhile, the proposed model proved more accurate and robust than fully-supervised methods in low-data regimes.
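A minimal sketch of the transformation-recognition pretext task described above; the five transforms here are generic signal augmentations chosen for illustration and are not necessarily the paper's exact choices:

```python
import numpy as np

def add_noise(x):  return x + 0.05 * np.random.randn(*x.shape)
def scale(x):      return x * np.random.uniform(0.7, 1.3)
def negate(x):     return -x
def time_flip(x):  return x[::-1].copy()
def permute(x):
    segments = np.array_split(x, 4)
    np.random.shuffle(segments)
    return np.concatenate(segments)

TRANSFORMS = [add_noise, scale, negate, time_flip, permute]

def make_pretext_batch(signals):
    """Apply a random transform to each unlabeled signal; the transform id
    becomes the self-supervised label."""
    X, y = [], []
    for s in signals:
        label = np.random.randint(len(TRANSFORMS))
        X.append(TRANSFORMS[label](s))
        y.append(label)
    return np.stack(X), np.array(y)

signals = [np.sin(np.linspace(0, 8 * np.pi, 256)) for _ in range(32)]
X, y = make_pretext_batch(signals)
print(X.shape, y[:8])
```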
Scalable High-Quality Hypergraph Partitioning
Authors: Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across $k$ different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well-studied in sequential and distributed settings, but not in the shared-memory setting. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers, without compromising solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening, which are used in a multilevel scheme with $\log(n)$ levels. As additional components, we parallelize the $n$-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners in terms of both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x ($\log(n)$-level), and 25.9x ($n$-level).
Mitigating Source Bias for Fairer Weak Supervision
Authors: Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
Abstract
Weak supervision overcomes the label bottleneck, enabling efficient development of training sets. Millions of models trained on such datasets have been deployed in the real world and interact with users on a daily basis. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown labels -- also ensure that the pseudolabels it produces are highly biased. Surprisingly, given everyday use and the potential for increased bias, weak supervision has not been studied from the point of view of fairness. This work begins such a study. Our departure point is the observation that even when a fair model can be built from a dataset with access to ground-truth labels, the corresponding dataset labeled via weak supervision can be arbitrarily unfair. Fortunately, not all is lost: we propose and empirically validate a model for source unfairness in weak supervision, then introduce a simple counterfactual fairness-based technique that can mitigate these biases. Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness metrics -- in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy over weak supervision baselines by as much as 32% while reducing the demographic parity gap by 82.5%.
BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware
Authors: Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava
Abstract
Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we address these challenges by introducing BOLT, a sparse deep learning library for training massive neural network models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of machine learning tasks drawn from recommendations, search, natural language processing, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer deployment case study in the field of e-commerce.
MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory
Authors: Samuel Riedel, Matheus Cavalcante, Renzo Andri, Luca Benini
Abstract
Shared L1 memory clusters are a common architectural pattern (e.g., in GPGPUs) for building efficient and flexible multi-processing-element (PE) engines. However, it is a common belief that these tightly-coupled clusters would not scale beyond a few tens of PEs. In this work, we tackle scaling shared L1 clusters to hundreds of PEs while supporting a flexible and productive programming model and maintaining high efficiency. We present MemPool, a manycore system with 256 RV32IMAXpulpimg "Snitch" cores featuring application-tunable functional units. We designed and implemented an efficient low-latency PE to L1-memory interconnect, an optimized instruction path to ensure each PE's independent execution, and a powerful DMA engine and system interconnect to stream data in and out. MemPool is easy to program, with all the cores sharing a global view of a large, multi-banked, L1 scratchpad memory, accessible within at most five cycles in the absence of conflicts. We provide multiple runtimes to program MemPool at different abstraction levels and illustrate its versatility with a wide set of applications. MemPool runs at 600 MHz (60 gate delays) in typical conditions (TT/0.80V/25{\deg}C) in 22 nm FDX technology and achieves a performance of up to 229 GOPS or 192 GOPS/W with less than 2% of execution stalls.
Solution of Real Cubic Equations without Cardano's Formula
Abstract
Building on a classification of the zeros of cubic equations due to the 12th-century Persian mathematician Sharaf al-Din Tusi, together with Smale's theory of {\it point estimation}, we derive an efficient recipe for computing a high-precision approximation to a real root of an arbitrary real cubic equation. First, via reversible transformations we reduce any real cubic equation to one of four canonical forms with $0$, $\pm 1$ coefficients, except for the constant term, which is $\pm q$, $q \geq 0$. Next, given any form, if $\rho_q$ is an approximation to $\sqrt[3]{q}$ to within a relative error of five percent, we prove that a {\it seed} $x_0$ in $\{ \rho_q, \pm 0.95 \rho_q, -\frac{1}{3}, 1 \}$ can be selected such that in $t$ Newton iterations $|x_t - \theta_q| \leq \sqrt[3]{q}\cdot 2^{-2^{t}}$ for some real root $\theta_q$. While computing a good seed, even for approximation of $\sqrt[3]{q}$, is considered to be ``somewhat of a black art'' (see Wikipedia), as we justify, $\rho_q$ is readily computable from the {\it mantissa} and {\it exponent} of $q$. It follows that the above approach gives a simple recipe for numerical approximation of solutions of real cubic equations, independent of Cardano's formula.
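A minimal sketch of the recipe's final step: plain Newton iteration on a reduced cubic $x^3 + bx + c$ with $b \in \{0, \pm 1\}$, seeded near $\sqrt[3]{q}$. The seed below is illustrative; the paper's exact selection rule among the four candidates is not reproduced here.

```python
def newton_cubic(b, c, x0, iters=8):
    """Newton's method for p(x) = x^3 + b*x + c."""
    x = x0
    for _ in range(iters):
        x -= (x**3 + b * x + c) / (3 * x**2 + b)
    return x

q = 10.0
rho_q = q ** (1.0 / 3.0)            # stand-in for the 5%-accurate cube-root seed
root = newton_cubic(b=1.0, c=-q, x0=rho_q)
print(root, root**3 + root - q)     # residual near machine precision
```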
MLGCN: An Ultra Efficient Graph Convolution Neural Model For 3D Point Cloud Analysis
Authors: Mohammad Khodadad, Morteza Rezanejad, Ali Shiraee Kasmaee, Kaleem Siddiqi, Dirk Walther, Hamidreza Mahyar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The analysis of 3D point clouds has diverse applications in robotics, vision and graphics. Processing them presents specific challenges since they are naturally sparse, can vary in spatial resolution and are typically unordered. Graph-based networks to abstract features have emerged as a promising alternative to convolutional neural networks for their analysis, but these can be computationally heavy as well as memory inefficient. To address these limitations we introduce a novel Multi-level Graph Convolution Neural (MLGCN) model, which uses Graph Neural Network (GNN) blocks to extract features from 3D point clouds at specific locality levels. Our approach employs precomputed KNN graphs, where each KNN graph is shared between GCN blocks inside a GNN block, making it both efficient and effective compared to existing models. We demonstrate the efficacy of our approach on point cloud based object classification and part segmentation tasks on benchmark datasets, showing that it produces comparable results to those of state-of-the-art models while requiring up to a thousand times fewer floating-point operations (FLOPs) and having significantly reduced storage requirements. Thus, our MLGCN model could be particularly relevant to point cloud based 3D shape analysis in industrial applications where computing resources are scarce.
Pacti: Scaling Assume-Guarantee Reasoning for System Analysis and Design
Authors: Inigo Incer, Apurva Badithela, Josefine Graebener, Piergiuseppe Mallozzi, Ayush Pandey, Sheng-Jung Yu, Albert Benveniste, Benoit Caillaud, Richard M. Murray, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia
Subjects: Logic in Computer Science (cs.LO); Systems and Control (eess.SY)
Abstract
Contract-based design is a method to facilitate modular system design. While there has been substantial progress on the theory of contracts, there has been less progress on scalable algorithms for the algebraic operations in this theory. In this paper, we present: 1) principles to implement a contract-based design tool at scale and 2) Pacti, a tool that can efficiently compute these operations. We then illustrate the use of Pacti in a variety of case studies.
Lattice-based kernel approximation and serendipitous weights for parametric PDEs in very high dimensions
Authors: Vesa Kaarnioja, Frances Y. Kuo, Ian H. Sloan
Abstract
We describe a fast method for solving elliptic partial differential equations (PDEs) with uncertain coefficients using kernel interpolation at a lattice point set. By representing the input random field of the system using the model proposed by Kaarnioja, Kuo, and Sloan (SIAM J.~Numer.~Anal.~2020), in which a countable number of independent random variables enter the random field as periodic functions, it was shown by Kaarnioja, Kazashi, Kuo, Nobile, and Sloan (Numer.~Math.~2022) that the lattice-based kernel interpolant can be constructed for the PDE solution as a function of the stochastic variables in a highly efficient manner using fast Fourier transform (FFT). In this work, we discuss the connection between our model and the popular ``affine and uniform model'' studied widely in the literature of uncertainty quantification for PDEs with uncertain coefficients. We also propose a new class of weights entering the construction of the kernel interpolant -- \emph{serendipitous weights} -- which dramatically improve the computational performance of the kernel interpolant for PDE problems with uncertain coefficients, and allow us to tackle function approximation problems up to very high dimensionalities. Numerical experiments are presented to showcase the performance of the serendipitous weights.
SOSR: Source-Free Image Super-Resolution with Wavelet Augmentation Transformer
Authors: Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Real-world images taken by different cameras with different degradation kernels often exhibit a cross-device domain gap in image super-resolution. A prevalent approach to this issue is unsupervised domain adaptation (UDA), which needs access to the source data. Considering privacy policies or transmission restrictions on data in many practical applications, we propose a SOurce-free image Super-Resolution framework (SOSR) to address this issue, i.e., to adapt a model pre-trained on labeled source data to a target domain with only unlabeled target data. SOSR leverages the source model to generate refined pseudo-labels for teacher-student learning. To better utilize the pseudo-labels, this paper proposes a novel wavelet-based augmentation method, named Wavelet Augmentation Transformer (WAT), which can be flexibly incorporated into existing networks to implicitly produce useful augmented data. WAT learns low-frequency information of varying levels across diverse samples, which is aggregated efficiently via deformable attention. Furthermore, an uncertainty-aware self-training mechanism is proposed to improve the accuracy of the pseudo-labels, with inaccurate predictions rectified by uncertainty estimation. To acquire better SR results and avoid overfitting to the pseudo-labels, several regularization losses are proposed to constrain the frequency information between the target LR and SR images. Experiments show that, without accessing source data, SOSR achieves superior results to state-of-the-art UDA methods.
Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
Authors: Yasmen Wahba, Nazim Madhavji, John Steinbacher
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
For large-scale IT corpora with hundreds of classes organized in a hierarchy, accurate classification at the higher levels of the hierarchy is crucial to avoid errors propagating to the lower levels. In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal. A current trend in the Natural Language Processing (NLP) community is to employ huge pre-trained language models (PLMs), also known as self-attention models (e.g., BERT), for almost any kind of NLP task (e.g., question answering, sentiment analysis, text classification). Despite the widespread use of PLMs and their impressive performance on a broad range of NLP tasks, there is no clear, well-justified rationale for employing these models in domain-specific text classification (TC) tasks: the monosemic nature of specialized words (i.e., jargon) found in domain-specific text renders contextualized embeddings (e.g., PLMs) of little benefit. In this paper, we compare the accuracies of some state-of-the-art (SOTA) models reported in the literature against a Linear SVM classifier with a TFIDF vectorization model on three TC datasets. The results show comparable performance for the Linear SVM. The findings of this study show that for domain-specific TC tasks, a linear model can provide a comparable, cheap, reproducible, and interpretable alternative to attention-based models.
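The linear baseline is straightforward to reproduce; a minimal sketch with scikit-learn, where the documents and labels below are hypothetical placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["reset the VPN token", "disk quota exceeded on the shared drive",
        "cannot reset my password", "increase the disk quota"]
labels = ["network", "storage", "account", "storage"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["password reset request"]))
```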
Neural Microfacet Fields for Inverse Rendering
Authors: Alexander Mai, Dor Verbin, Falko Kuester, Sara Fridovich-Keil
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
We present Neural Microfacet Fields, a method for recovering materials, geometry, and environment illumination from images of a scene. Our method uses a microfacet reflectance model within a volumetric setting by treating each sample along the ray as a (potentially non-opaque) surface. Using surface-based Monte Carlo rendering in a volumetric setting enables our method to perform inverse rendering efficiently by combining decades of research in surface-based light transport with recent advances in volume rendering for view synthesis. Our approach outperforms prior work in inverse rendering, capturing high fidelity geometry and high frequency illumination details; its novel view synthesis results are on par with state-of-the-art methods that do not recover illumination or materials.
LabelVizier: Interactive Validation and Relabeling for Technical Text Annotations
Abstract
With the rapid accumulation of text data produced by data-driven techniques, the task of extracting "data annotations"--concise, high-quality data summaries from unstructured raw text--has become increasingly important. Recent advances in weak supervision and crowd-sourcing techniques provide promising solutions for efficiently creating annotations (labels) for large-scale technical text data. However, such annotations may fail in practice because of changes in annotation requirements, application scenarios, and modeling goals, where label validation and relabeling by domain experts are required. To approach this issue, we present LabelVizier, a human-in-the-loop workflow that incorporates domain knowledge and user-specific requirements to reveal actionable insights into annotation flaws and then produce better-quality labels for large-scale multi-label datasets. We implement our workflow as an interactive notebook to facilitate flexible error profiling, in-depth annotation validation for three error types, and efficient annotation relabeling on different data scales. We evaluated the efficiency and generalizability of our workflow with two use cases and four expert reviews. The results indicate that LabelVizier is applicable in various application scenarios and assists domain experts with different knowledge backgrounds in efficiently improving the quality of technical text annotations.
AI-Oriented Two-Phase Multi-Factor Authentication in SAGINs: Prospects and Challenges
Authors: Bin Yang, Shanyun Liu, Tao Xu, Chuyu Li, Yongdong Zhu, Zipeng Li, Zhifeng Zhao
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Space-air-ground integrated networks (SAGINs), which have emerged as an expansion of terrestrial networks, provide flexible access, ubiquitous coverage, high-capacity backhaul, and emergency/disaster recovery for mobile users (MUs). While the massive benefits brought by SAGIN may improve the quality of service, unauthorized access to SAGIN entities is potentially dangerous. At present, conventional crypto-based authentication faces challenges, such as the inability to provide continuous and transparent protection for MUs. In this article, we propose an AI-oriented two-phase multi-factor authentication scheme (ATMAS) that introduces intelligence into authentication. The satellite and the network control center collaborate on continuous authentication, while unique spatial-temporal features, including service features and geographic features, are utilized to enhance system security. Our security analysis and performance evaluations show that ATMAS has appropriate security characteristics and can meet various security requirements. Moreover, we shed light on lightweight and efficient authentication mechanism design through a proper combination of spatial-temporal factors.
Vision-Assisted mmWave Beam Management for Next-Generation Wireless Systems: Concepts, Solutions and Open Challenges
Authors: Kan Zheng, Haojun Yang, Ziqiang Ying, Pengshuo Wang, Lajos Hanzo
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Beamforming techniques have been widely used in the millimeter wave (mmWave) bands to mitigate the path loss of mmWave radio links, as narrow, straight beams directionally concentrate the signal energy. However, traditional mmWave beam management algorithms usually require excessive channel state information overhead, leading to extremely high computational and communication costs. This hinders the widespread deployment of mmWave communications. By contrast, the revolutionary vision-assisted beam management concept, employed at base stations (BSs), selects the optimal beam for the target user equipment (UE) based on its location as determined by machine learning (ML) algorithms applied to visual data, without requiring channel information. In this paper, we present a comprehensive framework for a vision-assisted mmWave beam management system, its typical deployment scenarios, and the specifics of the framework. Then, some of the challenges faced by this system and their efficient solutions are discussed from the perspective of ML. Next, a new simulation platform is conceived to provide both visual and wireless data for model validation and performance evaluation. Our simulation results indicate that vision-assisted beam management is indeed attractive for next-generation wireless systems.
Exploring the Limits of Deep Image Clustering using Pretrained Models
Authors: Nikolas Adaloglou, Felix Michels, Hamza Kalisch, Markus Kollmann
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We present a general methodology that learns to classify images without labels by leveraging pretrained feature extractors. Our approach involves self-distillation training of clustering heads, based on the fact that nearest neighbors in the pretrained feature space are likely to share the same label. We propose a novel objective to learn associations between images by introducing a variant of pointwise mutual information together with instance weighting. We demonstrate that the proposed objective is able to attenuate the effect of false positive pairs while efficiently exploiting the structure in the pretrained feature space. As a result, we improve the clustering accuracy over $k$-means on $17$ different pretrained models by $6.1$\% and $12.2$\% on ImageNet and CIFAR100, respectively. Finally, using self-supervised pretrained vision transformers we push the clustering accuracy on ImageNet to $61.6$\%. The code will be open-sourced.
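A sketch of the core assumption (nearest neighbors in the pretrained feature space are treated as positive pairs); the features are mocked with random vectors here, and the PMI-based objective itself is omitted:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

feats = np.random.randn(1000, 512)                # stand-in for pretrained features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

nn = NearestNeighbors(n_neighbors=6, metric="cosine").fit(feats)
_, idx = nn.kneighbors(feats)                     # idx[:, 0] is the point itself

pairs = [(i, j) for i in range(len(feats)) for j in idx[i, 1:]]
print(len(pairs), "candidate positive pairs for the clustering heads")
```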
FP8 versus INT8 for efficient deep learning inference
Authors: Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort
Abstract
Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8, sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance of the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50% and 180% less efficient, in terms of compute in dedicated hardware, than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.
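To make the INT8 side of the comparison concrete, here is textbook symmetric per-tensor post-training quantization (a generic scheme, not the whitepaper's exact pipeline):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print("mean abs quantization error:", np.abs(w - dequantize(q, s)).mean())
```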
How to measure research performance of single scientists? A proposal for an index based on scientific prizes: The Prize Winner Index (PWI)
Authors: Lutz Bornmann, Robin Haunschild
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
Abstract
In this study, we propose a new index for measuring excellence in science based on collaborations (co-authorship distances). The index builds on the well-known Erd\H{o}s number. We propose to focus the new index on laureates of prestigious prizes in a certain field and to measure co-authorship distances between the laureates and other scientists. To exemplify and explain our proposal, we computed the proposed index for the field of quantitative science studies (PWIPM). The Derek de Solla Price Memorial Award (Price Medal, PM) is awarded to outstanding scientists in the field. We tested the convergent validity of the PWIPM: we were interested in whether the indicator is related to an established bibliometric indicator, P(top 10%). The results show that the coefficients for the correlation between PWIPM and P(top 10%) are high (in cases where a sufficient number of papers were available for a reliable assessment of performance). Therefore, measured by an established indicator of research excellence, the new PWI indicator seems to be convergently valid and might therefore be a possible alternative to established (bibliometric) indicators, with a focus on prizes.
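The index's main ingredient, the co-authorship distance to a laureate, is a shortest-path computation; a sketch on a hypothetical co-authorship graph:

```python
import networkx as nx

G = nx.Graph()                       # nodes: authors, edges: co-authorships
G.add_edges_from([("laureate", "a"), ("a", "b"), ("b", "c"), ("laureate", "d")])

dist = nx.single_source_shortest_path_length(G, "laureate")
print(dist)  # e.g. {'laureate': 0, 'a': 1, 'd': 1, 'b': 2, 'c': 3}
```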
The Many Qualities of a New Directly Accessible Compression Scheme
Abstract
We present a new variable-length computation-friendly encoding scheme, named SFDC (Succinct Format with Direct aCcessibility), that supports direct and fast access to any element of the compressed sequence and achieves compression ratios often higher than those offered by other solutions in the literature. The SFDC scheme provides a flexible and simple representation geared towards either practical efficiency or compression ratio, as required. For a text of length $n$ over an alphabet of size $\sigma$ and a fixed parameter $\lambda$, the access time of the proposed encoding is proportional to the length of the character's code-word, plus an expected $\mathcal{O}((F_{\sigma - \lambda + 3} - 3)/F_{\sigma+1})$ overhead, where $F_j$ is the $j$-th Fibonacci number. Overall, it uses $N+\mathcal{O}\big(n \big(\lambda - (F_{\sigma+3}-3)/F_{\sigma+1}\big)\big) = N + \mathcal{O}(n)$ bits, where $N$ is the length of the encoded string. Experimental results show that the performance of our scheme is, in some respects, comparable with that of DACs and Wavelet Trees, which are among the most efficient schemes. In addition, our scheme qualifies as a \emph{computation-friendly compression} scheme, as it offers several features that make it very effective in text processing tasks. For the string matching problem, which we take as a case study, we experimentally show that the new scheme enables results that are up to 29 times faster than standard string-matching techniques on plain texts.
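The $F_j$ notation points to Fibonacci-based codes; as background (an illustration of the ingredient, not the SFDC format itself), classic Fibonacci coding of a positive integer looks like this:

```python
def fib_encode(n):
    """Classic Fibonacci (Zeckendorf) code of a positive integer n."""
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    bits = []
    for f in reversed(fibs):
        if f <= n:
            bits.append("1")
            n -= f
        elif bits:
            bits.append("0")
    return "".join(reversed(bits)) + "1"  # the trailing "11" terminates the code

for k in [1, 2, 3, 7, 12]:
    print(k, fib_encode(k))
```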
A data-driven method for parametric PDE Eigenvalue Problems using Gaussian Process with different covariance functions
Authors: Moataz Alghamdi, Fleurianne Bertrand, Daniele Boffi, Abdul Halim
Abstract
We propose a non-intrusive, reduced-basis, data-driven method for approximating both eigenvalues and eigenvectors in parametric eigenvalue problems. We generate the basis of the reduced space by applying the proper orthogonal decomposition (POD) approach to a collection of pre-computed, full-order snapshots at a chosen set of parameters. Then, we use Bayesian linear regression (a.k.a. Gaussian Process Regression, GPR) in the online phase to predict both eigenvalues and eigenvectors at new parameters. A split of the data generated in the offline phase into training and test sets is utilized in the numerical experiments, following standard practice in the field of supervised machine learning. Furthermore, we discuss the connection between Gaussian Process Regression and spline methods, and compare the performance of the GPR method against linear and cubic spline methods. We show that GPR outperforms the other methods for functions with a certain regularity, and to this end we discuss various covariance functions that influence the performance of GPR. The proposed method is shown to be accurate and efficient for the approximation of multiple 1D and 2D affine and non-affine parameter-dependent eigenvalue problems that exhibit crossing of eigenvalues.
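A minimal sketch of the online phase under the stated setup: GPR with different covariance functions mapping parameters to (here synthetic) eigenvalues; in the full method the same regression is applied to the POD coefficients of the eigenvectors as well:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
mu_train = rng.uniform(0, 1, size=(40, 2))           # parameter samples
eig_train = np.sin(mu_train @ np.array([1.0, 2.0]))  # stand-in eigenvalues

for kernel in [RBF(length_scale=0.5), Matern(nu=2.5)]:
    gpr = GaussianProcessRegressor(kernel=kernel).fit(mu_train, eig_train)
    print(type(kernel).__name__, gpr.predict([[0.3, 0.7]]))
```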
Dictionary-based Online-adaptive Structure-preserving Model Order Reduction for Parametric Hamiltonian Systems
Authors: Robin Herkert, Patrick Buchfink, Bernard Haasdonk
Abstract
Classical model order reduction (MOR) for parametric problems may become computationally inefficient due to the large sizes of the required projection bases, especially for problems with slowly decaying Kolmogorov n-widths. Additionally, the Hamiltonian structure of dynamical systems may be available and should be preserved during the reduction. In this work, we address these two aspects by proposing a corresponding dictionary-based, online-adaptive MOR approach. The method requires dictionaries for the state variable, the non-linearities, and the discrete empirical interpolation (DEIM) points. During the online simulation, local basis extensions/simplifications are performed in an online-efficient way, i.e., the runtime complexity of basis modifications and of the online simulation of the reduced models does not depend on the full state dimension. Experiments on a linear wave equation and a non-linear Sine-Gordon example demonstrate the efficiency of the approach.
Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks
Authors: Abdoulaye Koroko, Ani Anciaux-Sedrakian, Ibtihel Ben Gharbia, Valérie Garès, Mounir Haddou, Quang Huy Tran
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
As a second-order method, Natural Gradient Descent (NGD) has the ability to accelerate training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been attempted. The most sophisticated of these is KFAC, which approximates the FIM as a block-diagonal matrix, where each block corresponds to a layer of the neural network. By doing so, KFAC ignores the interactions between different layers. In this work, we investigate the benefit of restoring some low-frequency interactions between the layers by means of two-level methods. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The obtained results show that incorporating the layer interactions in this fashion does not really improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate, and economical in computation time.
RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving
Authors: Chenghao Shi, Xieyuanli Chen, Huimin Lu, Wenbang Deng, Junhao Xiao, Bin Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Point cloud registration is an important task in robotics and autonomous driving for estimating the ego-motion of the vehicle. Recent advances following the coarse-to-fine paradigm show promising potential for point cloud registration. However, existing methods rely on good superpoint correspondences, which are hard to obtain reliably and efficiently, resulting in less robust and accurate point cloud registration. In this paper, we propose a novel network, named RDMNet, that finds dense point correspondences in a coarse-to-fine manner and improves final pose estimation based on such reliable correspondences. Our RDMNet uses a devised 3D-RoFormer mechanism to first extract distinctive superpoints and generate reliable superpoint matches between two point clouds. The proposed 3D-RoFormer fuses 3D position information into the transformer network, efficiently exploiting the point clouds' contextual and geometric information to generate robust superpoint correspondences. RDMNet then propagates the sparse superpoint matches to dense point matches using neighborhood information for accurate point cloud registration. We extensively evaluate our method on multiple datasets from different environments. The experimental results demonstrate that our method outperforms existing state-of-the-art approaches on all tested datasets, with a strong generalization ability.
Direct Data-Driven Computation of Polytopic Robust Control Invariant Sets and State-Feedback Controllers
Abstract
This paper presents a direct data-driven approach for computing robust control invariant (RCI) sets and their associated state-feedback control laws. The proposed method utilizes a single state-input trajectory generated from the system to compute a polytopic RCI set of desired complexity, together with an invariance-inducing feedback controller, without the need to identify a model of the system. The problem is formulated in terms of a set of sufficient LMI conditions that are then combined in a semi-definite program to maximize the volume of the RCI set while respecting the state and input constraints. We demonstrate through a numerical case study that the proposed data-driven approach can generate RCI sets comparable in size to those obtained by a model-based method in which exact knowledge of the system matrices is assumed. Under the assumption of persistency of excitation of the data, the proposed algorithm guarantees robust invariance even with a small number of data samples. Overall, the direct data-driven approach presented in this paper offers a reliable and efficient counterpart to model-based methods for RCI set computation and state-feedback controller design.
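For orientation, here is the simplest model-based relative of this computation: an invariant ellipsoid and feedback gain for a known linear system via the standard LMI with the substitution $Y = KP$. This is a hedged sketch only; the paper computes polytopic sets directly from data, which this snippet does not do.

```python
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # known dynamics (the paper avoids this)
B = np.array([[0.0], [0.1]])

P = cp.Variable((2, 2), symmetric=True)  # ellipsoid shape matrix
Y = cp.Variable((1, 2))                  # Y = K @ P
M = cp.Variable((4, 4), symmetric=True)  # Schur-complement block matrix
constraints = [
    M[:2, :2] == P,
    M[2:, 2:] == P,
    M[:2, 2:] == A @ P + B @ Y,          # invariance: (A+BK) P (A+BK)^T << P
    M >> 0,
    cp.trace(P) <= 20,                   # crude proxy for state constraints
]
prob = cp.Problem(cp.Maximize(cp.log_det(P)), constraints)
prob.solve(solver=cp.SCS)
K = Y.value @ np.linalg.inv(P.value)
print("invariance-inducing gain K =", K)
```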
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural-network-based single image depth prediction (SIDP) is a challenging task, where the goal is to predict the scene's per-pixel depth at test time. Since the problem is, by definition, ill-posed, the fundamental goal is to come up with an approach that can reliably model the scene depth from a set of training examples. In the pursuit of perfect depth estimation, most existing state-of-the-art learning techniques predict a single scalar depth value per pixel. Yet, it is well-known that the trained model has accuracy limits and can predict imprecise depth. Therefore, an SIDP approach must be mindful of the expected depth variations in the model's prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution. To this end, we model per-pixel scene depth using a multivariate Gaussian distribution. Moreover, contrary to existing uncertainty modeling methods in the same spirit, which assume per-pixel depths to be independent, we introduce per-pixel covariance modeling that encodes each pixel's depth dependency w.r.t. all the scene points. Unfortunately, per-pixel depth covariance modeling leads to a computationally expensive continuous loss function, which we solve efficiently using a learned low-rank approximation of the overall covariance matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows state-of-the-art results. Our method (named MG) ranks among the top methods on the KITTI depth-prediction benchmark leaderboard.
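PyTorch ships a distribution that matches this construction; a sketch of the low-rank Gaussian negative log-likelihood over per-pixel depths (a network would predict loc, cov_factor, and cov_diag; random tensors stand in here):

```python
import torch
from torch.distributions import LowRankMultivariateNormal

n_pixels, rank = 1024, 8
loc = torch.randn(n_pixels)                     # predicted per-pixel mean depth
cov_factor = 0.1 * torch.randn(n_pixels, rank)  # low-rank covariance factor
cov_diag = 0.01 * torch.ones(n_pixels)          # per-pixel noise variance

dist = LowRankMultivariateNormal(loc, cov_factor, cov_diag)
depth_gt = torch.randn(n_pixels)
loss = -dist.log_prob(depth_gt)                 # negative log-likelihood loss
print(loss.item())
```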
How Efficient Are Today's Continual Learning Algorithms?
Authors: Md Yousuf Harun, Jhair Gallardo, Tyler L. Hayes, Christopher Kanan
Abstract
Supervised continual learning involves updating a deep neural network (DNN) from an ever-growing stream of labeled data. While most work has focused on overcoming catastrophic forgetting, one of the major motivations behind continual learning is being able to efficiently update a network with new information, rather than retraining from scratch on the training dataset as it grows over time. Despite recent continual learning methods largely solving the catastrophic forgetting problem, little attention has been paid to the efficiency of these algorithms. Here, we study recent methods for incremental class learning and illustrate that many are highly inefficient in terms of compute, memory, and storage. Some methods even require more compute than training from scratch! We argue that for continual learning to have real-world applicability, the research community cannot ignore the resources used by these algorithms. There is more to continual learning than mitigating catastrophic forgetting.
A Closer Look at Parameter-Efficient Tuning in Diffusion Models
Authors: Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications, but customizing such models by fine-tuning is both memory- and time-inefficient. Motivated by recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position, and the function form -- and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete variables (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing downstream-task performance. We then carefully study the choice of input position and find that placing the input position after the cross-attention block leads to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable, if not superior, to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75\% extra parameters, across various customization tasks.
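A generic bottleneck adapter of the kind studied here (dimensions illustrative); per the analysis, the key design choice is feeding it the output of the cross-attention block:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, nonlinearity, up-project, residual connection."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

x = torch.randn(2, 77, 320)              # (batch, tokens, channels)
print(Adapter(320)(x).shape)
```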
GVP: Generative Volumetric Primitives
Authors: Mallikarjun B R, Xingang Pan, Mohamed Elgharib, Christian Theobalt
Abstract
Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution is still an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real-time. GVP jointly models a number of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. The training of such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate superior efficiency and 3D consistency of GVP over the state-of-the-art.
Aerostack2: A Software Framework for Developing Multi-robot Aerial Systems
Authors: Miguel Fernandez-Cortizas, Martin Molina, Pedro Arias-Perez, Rafael Perez-Segui, David Perez-Saura, Pascual Campoy
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
In recent years, the robotics community has witnessed the development of several software stacks for ground and articulated robots, such as Navigation2 and MoveIt. However, the same level of collaboration and standardization is yet to be achieved in the field of aerial robotics, where each research group has developed its own framework. This work presents Aerostack2, a framework for the development of autonomous aerial robotics systems that aims to address the lack of standardization and the fragmentation of efforts in the field. Built on the ROS 2 middleware and featuring an efficient modular software architecture and multi-robot orientation, Aerostack2 is a versatile and platform-independent environment that covers a wide range of robot capabilities for autonomous operation. Its major contributions include providing a logical level for specifying missions, reusable components and sub-systems for aerial robotics, and support for the development of complete control architectures. All major contributions have been tested in simulation and in real flights with multiple heterogeneous swarms. Aerostack2 is open source and community oriented, democratizing access to its technology for autonomous drone system developers.
Keyword: faster
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Authors: Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models between 3.8 and 24.3 times faster without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings and improved model initialization to improve knowledge distillation and deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and find it less amenable to compression during fine-tuning. We explore the use of oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERTBASE and exceed the performance of Prune OFA Large on the SQUAD V1.1 question answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models for broad use to encourage experimentation.
Scalable High-Quality Hypergraph Partitioning
Authors: Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across $k$ different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well-studied in sequential and distributed settings, but not in the shared-memory setting. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers, without compromising solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening, which are used in a multilevel scheme with $\log(n)$ levels. As additional components, we parallelize the $n$-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners in terms of both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x ($\log(n)$-level), and 25.9x ($n$-level).
Task Oriented Conversational Modelling With Subjective Knowledge
Authors: Raja Kumar
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Existing conversational models are handled by database (DB) and API based systems. However, very often users' questions require information that such systems cannot handle. Nonetheless, answers to these questions are available in the form of customer reviews and FAQs. DSTC-11 proposes a three-stage pipeline consisting of knowledge-seeking turn detection, knowledge selection, and response generation to create a conversational model grounded on this subjective knowledge. In this paper, we focus on improving the knowledge selection module to enhance overall system performance. In particular, we propose entity retrieval methods that result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in a 7x faster search compared to the baseline model. Additionally, we explore a potential keyword extraction method that can improve the accuracy of knowledge selection. Preliminary results show a 4\% improvement in exact match score on the knowledge selection task. The code is available at https://github.com/raja-kumar/knowledge-grounded-TODS
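A sketch of NER-based entity retrieval: extract entities from the user turn and use them to filter candidate knowledge snippets. The tiny knowledge base is hypothetical, and the snippet assumes the `en_core_web_sm` model is installed (`python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
knowledge = {"Hilton Downtown": ["Is parking free?", "Are pets allowed?"],
             "Joe's Diner": ["Do you serve vegan food?"]}

turn = "Does the Hilton Downtown allow pets?"
entities = [ent.text for ent in nlp(turn).ents]
candidates = [faq for name, faqs in knowledge.items()
              if any(e in name or name in e for e in entities)
              for faq in faqs]
print(entities, candidates)
```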
BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware
Authors: Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels, David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger, Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava
Abstract
Efficient large-scale neural network training and inference on commodity CPU hardware is of immense practical significance in democratizing deep learning (DL) capabilities. Presently, the process of training massive models consisting of hundreds of millions to billions of parameters requires the extensive use of specialized hardware accelerators, such as GPUs, which are only accessible to a limited number of institutions with considerable financial resources. Moreover, there is often an alarming carbon footprint associated with training and deploying these models. In this paper, we address these challenges by introducing BOLT, a sparse deep learning library for training massive neural network models on standard CPU hardware. BOLT provides a flexible, high-level API for constructing models that will be familiar to users of existing popular DL frameworks. By automatically tuning specialized hyperparameters, BOLT also abstracts away the algorithmic details of sparse network training. We evaluate BOLT on a number of machine learning tasks drawn from recommendations, search, natural language processing, and personalization. We find that our proposed system achieves competitive performance with state-of-the-art techniques at a fraction of the cost and energy consumption and an order-of-magnitude faster inference time. BOLT has also been successfully deployed by multiple businesses to address critical problems, and we highlight one customer deployment case study in the field of e-commerce.
Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principal Component Analysis
Authors: Yanjie Dong, Luya Wang, Yuanfang Chi, Jia Wang, Haijun Zhang, Fei Richard Yu, Victor C. M. Leung, Xiping Hu
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Abstract
A wireless federated learning system is investigated in which a server and workers exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server via bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principal component analysis (PCA) is leveraged to reduce the dimension of the uploaded gradients, relieving the communication bottleneck. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (i.e., PCA-AWFL) are proposed based on the low-dimensional gradients and Nesterov's momentum. For non-convex loss functions, a finite-time analysis is performed to quantify the impact of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is proven to converge faster than the PCA-WFL algorithm. Besides, the convergence rates of the PCA-WFL and PCA-AWFL algorithms quantitatively reveal a linear speedup with respect to the number of workers over the vanilla gradient descent algorithm. Numerical results demonstrate the improved convergence rates of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
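A sketch of the PCA idea: workers project gradients onto a shared k-dimensional basis and upload only the coefficients. The one-shot distributed construction of the basis is omitted; a plain SVD of stand-in gradient samples is used instead:

```python
import numpy as np

d, k, n_workers = 10_000, 64, 8
history = np.random.randn(256, d)          # stand-in for gradient samples
_, _, Vt = np.linalg.svd(history, full_matrices=False)
basis = Vt[:k]                             # shared k-dimensional basis

grads = np.random.randn(n_workers, d)      # current local gradients
coeffs = grads @ basis.T                   # upload: n_workers x k floats only
recon = coeffs @ basis                     # server-side reconstruction
print("upload compression ratio:", d / k)
```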
IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-based Object Detection
Authors: Hu Haotian, Wang Fanyi, Su Jingwen, Gao Shiyu, Zhang Zhiwang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D object detection is one of the most important tasks in autonomous driving and robotics. Our research focuses on tackling the low-efficiency issue of point-based methods on large-scale point clouds. Existing point-based methods adopt a farthest point sampling (FPS) strategy for downsampling, which is computationally expensive in terms of inference time and memory consumption as the number of points increases. In order to improve efficiency, we propose a novel Instance-Centroid Faster Point Sampling Module (IC-FPS), which effectively replaces the first Set Abstraction (SA) layer, which is extremely time-consuming. The IC-FPS module comprises two methods: a local feature diffusion based background point filter (LFDBF) and a Centroid-Instance Sampling Strategy (CISS). LFDBF is constructed to exclude most invalid background points, while CISS substitutes the FPS strategy by fast sampling of centroids and instance points. The IC-FPS module can be inserted into almost every point-based model. Extensive experiments on multiple public benchmarks demonstrate the superiority of IC-FPS. On the Waymo dataset, the proposed module significantly improves the performance of the baseline model and accelerates inference speed by 3.8 times. For the first time, real-time detection with point-based models in large-scale point cloud scenarios is realized.
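For reference, the baseline that IC-FPS replaces is vanilla farthest point sampling, whose O(mn) distance updates dominate runtime on large point clouds:

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Iteratively pick the point farthest from the set chosen so far."""
    n = len(points)
    chosen = np.zeros(m, dtype=int)
    dists = np.full(n, np.inf)
    chosen[0] = np.random.randint(n)
    for i in range(1, m):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dists = np.minimum(dists, d)       # distance to the nearest chosen point
        chosen[i] = dists.argmax()
    return chosen

pts = np.random.rand(20_000, 3)
print(farthest_point_sampling(pts, 512).shape)
```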
The Many Qualities of a New Directly Accessible Compression Scheme
Abstract
We present a new variable-length computation-friendly encoding scheme, named SFDC (Succinct Format with Direct aCcessibility), that supports direct and fast access to any element of the compressed sequence and achieves compression ratios often higher than those offered by other solutions in the literature. The SFDC scheme provides a flexible and simple representation geared towards either practical efficiency or compression ratio, as required. For a text of length $n$ over an alphabet of size $\sigma$ and a fixed parameter $\lambda$, the access time of the proposed encoding is proportional to the length of the character's code-word, plus an expected $\mathcal{O}((F_{\sigma - \lambda + 3} - 3)/F_{\sigma+1})$ overhead, where $F_j$ is the $j$-th Fibonacci number. Overall, it uses $N+\mathcal{O}\big(n \big(\lambda - (F_{\sigma+3}-3)/F_{\sigma+1}\big)\big) = N + \mathcal{O}(n)$ bits, where $N$ is the length of the encoded string. Experimental results show that the performance of our scheme is, in some respects, comparable with that of DACs and Wavelet Trees, which are among the most efficient schemes. In addition, our scheme qualifies as a \emph{computation-friendly compression} scheme, as it offers several features that make it very effective in text processing tasks. For the string matching problem, which we take as a case study, we experimentally show that the new scheme enables results that are up to 29 times faster than standard string-matching techniques on plain texts.
TPMCF: Temporal QoS Prediction using Multi-Source Collaborative Features
Abstract
Recently, with the rapid deployment of service APIs, personalized service recommendations have played a paramount role in the growth of the e-commerce industry. Quality-of-Service (QoS) parameters, which determine the service performance and are often used for recommendation, fluctuate over time. Thus, QoS prediction is essential to identify a suitable service among functionally equivalent services over time. Contemporary temporal QoS prediction methods hardly achieve the desired accuracy due to various limitations, such as the inability to handle data sparsity and outliers and to capture higher-order temporal relationships among user-service interactions. Even though some recent recurrent neural-network-based architectures can model temporal relationships among QoS data, prediction accuracy degrades due to the absence of other features (e.g., collaborative features) that capture the relationships among the user-service interactions. This paper addresses these challenges and proposes a scalable strategy for Temporal QoS Prediction using Multi-source Collaborative Features (TPMCF), achieving high prediction accuracy and fast responsiveness. TPMCF combines the collaborative features of users/services, obtained by exploiting the user-service relationship, with spatio-temporal auto-extracted features, employing graph convolution and a transformer encoder with multi-head self-attention. We validated our proposed method on the WS-DREAM-2 datasets. Extensive experiments showed that TPMCF outperforms major state-of-the-art approaches in prediction accuracy while ensuring high scalability and reasonably fast responsiveness.
Shipper collaboration matching: fast enumeration of triangular transports with high cooperation effects
Abstract
The logistics industry in Japan is facing a severe shortage of labor. Therefore, there is an increasing need for joint transportation, which allows large amounts of cargo to be transported using fewer trucks. In recent years, the use of artificial intelligence and other new technologies has gained wide attention for improving matching efficiency. However, it is difficult to develop a system that can instantly respond to requests, because browsing through the enormous combinations of two transport lanes is time-consuming. In this study, we focus on a form of joint transportation called triangular transportation and enumerate the combinations with high cooperation effects. The proposed algorithm makes good use of hidden inequalities, such as the distance axiom, to narrow down the search range without sacrificing accuracy. Numerical experiments show that the proposed algorithm is thousands of times faster than simple brute force. With this technology as the core engine, we developed a joint transportation matching system. The system had already been in use by over 150 companies as of October 2022, and was featured in a collection of logistics digital transformation cases published by Japan's Ministry of Land, Infrastructure, Transport and Tourism.
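To make the inequality-based pruning concrete, here is a small Python sketch that enumerates triangular combinations of one-way lanes. The lane data, the cooperation effect (empty mileage saved, with Euclidean deadheading), and the bound are simplifying assumptions for illustration; the production engine and its exact bounds are more elaborate.

```python
# Sketch: enumerate triangular transports with an inequality-based prune.
# Assumptions: one-way lanes, Euclidean deadheading, cooperation effect
# = empty mileage saved by combining three round trips into one cycle.
from math import dist

lanes = [((0, 0), (5, 0)), ((6, 1), (1, 2)), ((0, 3), (4, 4)),
         ((9, 9), (9, 5)), ((8, 4), (9, 8))]
lengths = [dist(o, d) for o, d in lanes]
max_len = max(lengths)
threshold = 1.0

def effect(i, j, k):
    """Mileage saved by the cycle o_i -> d_i -> o_j -> d_j -> o_k -> d_k -> o_i."""
    deadhead = (dist(lanes[i][1], lanes[j][0]) + dist(lanes[j][1], lanes[k][0])
                + dist(lanes[k][1], lanes[i][0]))
    return lengths[i] + lengths[j] + lengths[k] - deadhead

results = []
n = len(lanes)
for i in range(n):
    for j in range(i + 1, n):          # lane i is the canonical cycle start
        # Deadhead legs are non-negative (distance axiom), so no choice of
        # third lane can lift the effect above this optimistic bound.
        bound = lengths[i] + lengths[j] + max_len - dist(lanes[i][1], lanes[j][0])
        if bound < threshold:
            continue
        for k in range(i + 1, n):
            if k == j:
                continue
            e = effect(i, j, k)
            if e >= threshold:
                results.append(((i, j, k), round(e, 2)))

print(sorted(results, key=lambda r: -r[1]))
```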
Keyword: mobile
A CI-based Auditing Framework for Data Collection Practices
Authors: Athina Markopoulou, Rahmadi Trimananda, Hao Cui
Abstract
Apps and devices (mobile devices, web browsers, IoT, VR, voice assistants, etc.) routinely collect user data and send them to first- and third-party servers through the network. Recently, there has been a lot of interest in (1) auditing the actual data collection practices of those systems, and (2) checking the consistency of those practices against the statements made in the corresponding privacy policies. In this paper, we argue that the contextual integrity (CI) tuple can be the basic building block for defining and implementing such an auditing framework. We elaborate on the special case where the tuple is partially extracted from the network traffic generated by the end-device of interest, and partially from the corresponding privacy policies using natural language processing (NLP) techniques. Along the way, we discuss related bodies of work and representative examples that fit into that framework. More generally, we believe that CI can be the building block not only for auditing at the edge, but also for specifying privacy policies and system APIs. We also discuss limitations and directions for future work.
Abstract
Given a 3D object, kinematic motion prediction aims to identify the mobile parts as well as the corresponding motion parameters. Due to the large variations in both topological structure and geometric details of 3D objects, this remains a challenging task, and the lack of large-scale labeled data also constrains the performance of deep learning based approaches. In this paper, we tackle the problem of object kinematic motion prediction in a semi-weakly supervised manner. Our key observations are two-fold. First, although 3D datasets with fully annotated motion labels are limited, there are existing datasets and methods for object part semantic segmentation at large scale. Second, semantic part segmentation and mobile part segmentation are not always consistent, but it is possible to detect the mobile parts from the underlying 3D structure. Towards this end, we propose a graph neural network to learn the map between hierarchical part-level segmentation and mobile part parameters, which are further refined based on geometric alignment. This network can first be trained on the PartNet-Mobility dataset with fully labeled mobility information and then applied to the PartNet dataset with fine-grained and hierarchical part-level segmentation. The network predictions yield a large set of 3D objects with pseudo-labeled mobility information, which can further be used for weakly-supervised learning with pre-existing segmentation. Our experiments show significant performance boosts with the augmented data for a previous method designed for kinematic motion prediction on partial 3D scans.
Rethinking Local Perception in Lightweight Vision Transformer
Authors: Qihang Fan, Huaibo Huang, Jiyang Guan, Ran He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, resizing them to a mobile-friendly size leads to significant performance degradation. Therefore, developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between the globally shared weights often used in vanilla convolutional operators and the token-specific context-aware weights appearing in attention, and then proposes an effective and straightforward module to capture high-frequency local information. In CloFormer, we introduce AttnConv, a convolution operator in attention's style. The proposed AttnConv uses shared weights to aggregate local information and deploys carefully designed context-aware weights to enhance local features. In CloFormer, the combination of AttnConv and vanilla attention, which uses pooling to reduce FLOPs, enables the model to perceive both high-frequency and low-frequency information. Extensive experiments were conducted on image classification, object detection, and semantic segmentation, demonstrating the superiority of CloFormer.
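As a rough sketch of how shared and context-aware weights can be combined, the hypothetical PyTorch module below pairs a depthwise convolution (globally shared weights) with a token-specific gating branch; the real AttnConv design differs in its details.

```python
# Hypothetical blend of shared-weight local aggregation and
# token-specific context-aware gating -- an illustration of the idea
# in the abstract, not CloFormer's actual AttnConv.
import torch
import torch.nn as nn

class AttnConvSketch(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Globally shared weights, as in a vanilla convolution.
        self.local = nn.Conv2d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)
        # Context-aware weights (our assumption: a 1x1-conv gating branch).
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Tanh())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.local(x) * self.gate(x)

x = torch.randn(2, 64, 56, 56)
print(AttnConvSketch(64)(x).shape)     # torch.Size([2, 64, 56, 56])
```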
AI-Oriented Two-Phase Multi-Factor Authentication in SAGINs: Prospects and Challenges
Authors: Bin Yang, Shanyun Liu, Tao Xu, Chuyu Li, Yongdong Zhu, Zipeng Li, Zhifeng Zhao
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Space-air-ground integrated networks (SAGINs), which have emerged as an expansion of terrestrial networks, provide flexible access, ubiquitous coverage, high-capacity backhaul, and emergency/disaster recovery for mobile users (MUs). While the massive benefits brought by SAGINs may improve the quality of service, unauthorized access to SAGIN entities is potentially dangerous. At present, conventional crypto-based authentication is facing challenges, such as the inability to provide continuous and transparent protection for MUs. In this article, we propose an AI-oriented two-phase multi-factor authentication scheme (ATMAS) that introduces intelligence to authentication. The satellite and the network control center collaborate on continuous authentication, while unique spatial-temporal features, including service features and geographic features, are utilized to enhance system security. Our security analysis and performance evaluations show that ATMAS exhibits security characteristics that meet various security requirements. Moreover, we shed light on lightweight and efficient authentication mechanism design through a proper combination of spatial-temporal factors.
Adaptive Model Prediction Control-Based Multi-Terrain Trajectory Tracking Framework for Mobile Spherical Robots
Authors: Yifan Liu, Tao Hu, Xiaoqing Guan, Yixu Wang, Bixuan Zhang, You Wang, Guang Li
Abstract
Owing to uncertainties in both kinematics and dynamics, current trajectory tracking frameworks for mobile robots such as spherical robots cannot function effectively on multiple terrains, especially uneven and unknown ones. Since this capability is a prerequisite for robots to execute tasks in the wild, we enhance our previous hierarchical trajectory tracking framework to handle this issue. First, a modified adaptive RBF neural network (RBFNN) is proposed to represent all uncertainties in the kinodynamics. The Lyapunov function is then utilized to design its adaptive law, and a variable step-size algorithm is employed in the weight update procedure to accelerate convergence and improve stability. On this basis, a new adaptive model prediction control-based instruction planner (VAN-MPC) is proposed. Without modifying the bottom-level controllers, we finally develop the multi-terrain trajectory tracking framework by employing the new instruction planner VAN-MPC. Practical experiments demonstrate its effectiveness and robustness.
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Abstract
We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous manipulation, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of pre-training data scale and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) and ImageNet to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Finally, we show that task- or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving performance competitive with or superior to the best known results on all of the benchmarks in CortexBench. These models required over 10,000 GPU-hours to train and can be found on our website for the benefit of the research community.
Keyword: pruning
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Authors: Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models which allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings to improve knowledge distillation and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and find it less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERTBASE and exceed the performance of Prune OFA Large on the SQUAD V1.1 Question Answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models for broad usage to encourage experimentation.
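As one self-contained ingredient of such a pipeline, the sketch below applies standard magnitude pruning to a single linear layer with PyTorch's built-in utilities; oBERTa's actual pruning-during-pre-training regime is considerably more involved.

```python
# Magnitude pruning of one linear layer with PyTorch's pruning utilities.
# This is a generic illustration, not oBERTa's training recipe.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(768, 768)
# Zero out the 90% of weights with the smallest magnitude (L1 criterion).
prune.l1_unstructured(layer, name="weight", amount=0.9)
prune.remove(layer, "weight")          # make the sparsity permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2%}")     # ~90.00%
```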
Keyword: voxel
There is no result
Keyword: lidar
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
Abstract
We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features. Inspired by diffusion models, our method uses a novel iterative refinement process that gradually shifts the embedding spaces from different sources to a single canonical space for better metric learning. In addition, we present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D have representation gaps and other features like different views, point densities, and noise patterns. We show that our CrossLoc3D algorithm can achieve an improvement of 4.74% - 15.37% in terms of the top-1 average recall on our CS-Campus3D benchmark and achieves performance comparable to state-of-the-art 3D place recognition methods on the Oxford RobotCar dataset. We will release the code and the CS-Campus3D benchmark.
EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection
Abstract
In recent years, great progress has been made in Lift-Splat-Shoot-based (LSS-based) 3D object detection methods, which convert features from the 2D camera view and the 3D lidar view to the Bird's-Eye-View (BEV) for feature fusion. However, inaccurate depth estimation (e.g., the 'depth jump' problem) remains an obstacle to developing LSS-based methods. To alleviate the 'depth jump' problem, we propose the Edge-Aware Bird's-Eye-View (EA-BEV) projector. By coupling the proposed edge-aware depth fusion module with a depth estimation module, the EA-BEV projector addresses the problem and enforces refined supervision on depth. Besides, we propose sparse depth supervision and gradient edge depth supervision to constrain learning of global depth and local marginal depth information. Our EA-BEV projector is a plug-and-play module for any LSS-based 3D object detection model and effectively improves baseline performance. We demonstrate its effectiveness on the nuScenes benchmark: the proposed EA-BEV projector boosts several state-of-the-art LSS-based baselines on the nuScenes 3D object detection and BEV map segmentation benchmarks with a negligible increase in inference time.
CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions
Authors: Ming Yan, Xin Wang, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Motion capture is a long-standing research problem. Although it has been studied for decades, the majority of research focuses on ground-based movements such as walking, sitting, dancing, etc. Off-ground actions such as climbing are largely overlooked. As an important type of action in sports and firefighting, climbing movements are challenging to capture because of their complex back poses, intricate human-scene interactions, and difficult global localization. The research community lacks an in-depth understanding of the climbing action due to the lack of specific datasets. To address this limitation, we collect CIMI4D, a large rock \textbf{C}l\textbf{I}mbing \textbf{M}ot\textbf{I}on dataset from 12 persons climbing 13 different climbing walls. The dataset consists of around 180,000 frames of pose inertial measurements, LiDAR point clouds, RGB videos, high-precision static point cloud scenes, and reconstructed scene meshes. Moreover, we annotate the touched rock holds frame by frame to facilitate a detailed exploration of human-scene interaction. The core of this dataset is a blending optimization process, which corrects the pose as it drifts and is affected by magnetic conditions. To evaluate the merit of CIMI4D, we perform four tasks, including human pose estimation (with/without scene constraints), pose prediction, and pose generation. The experimental results demonstrate that CIMI4D presents great challenges to existing methods and enables extensive research opportunities. We share the dataset with the research community in this http URL
Keyword: diffusion
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
Abstract
We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges of developing 3D place recognition methods that account for the representation gap between points captured by different sources. Our method handles cross-source data by utilizing multi-grained features and selecting convolution kernel sizes that correspond to the most prominent features. Inspired by diffusion models, our method uses a novel iterative refinement process that gradually shifts the embedding spaces from different sources to a single canonical space for better metric learning. In addition, we present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans. The point clouds in CS-Campus3D have representation gaps and other features like different views, point densities, and noise patterns. We show that our CrossLoc3D algorithm can achieve an improvement of 4.74% - 15.37% in terms of the top-1 average recall on our CS-Campus3D benchmark and achieves performance comparable to state-of-the-art 3D place recognition methods on the Oxford RobotCar dataset. We will release the code and the CS-Campus3D benchmark.
GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently
Authors: Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate coherent text within images, particularly for complex glyph structures like Chinese characters. To address this problem, we introduce GlyphDraw, a general learning framework aimed at endowing image generation models with the capacity to generate images embedded with coherent text. To the best of our knowledge, this is the first work in the field of image synthesis to address the generation of Chinese characters. We first carefully design the construction strategy of the image-text dataset, then build our model specifically on a diffusion-based image generator and carefully modify the network structure to allow the model to learn to draw Chinese characters with the help of glyph and position information. Furthermore, we maintain the model's open-domain image synthesis capability by using a variety of training techniques to prevent catastrophic forgetting. Extensive qualitative and quantitative experiments demonstrate that our method not only produces accurate Chinese characters as specified in the prompts, but also naturally blends the generated text into the background. Please refer to https://1073521013.github.io/glyph-draw.github.io
3D-aware Image Generation using 2D Diffusion Models
Abstract
In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. We formulate the 3D-aware image generation task as multiview 2D image set generation, and further as a sequential unconditional-conditional multiview image generation process. This allows us to utilize 2D diffusion models to boost the generative modeling power of the method. Additionally, we incorporate depth information from monocular depth estimators to construct the training data for the conditional diffusion model using only still images. We train our method on a large-scale dataset, i.e., ImageNet, which is not addressed by previous methods. It produces high-quality images that significantly outperform prior methods. Furthermore, our approach showcases the capability to generate instances with large view angles, even though the training images are diverse and unaligned, gathered from "in-the-wild" real-world environments.
Pay Attention: Accuracy Versus Interpretability Trade-off in Fine-tuned Diffusion Models
Authors: Mischa Dombrowski, Hadrien Reynaud, Johanna P. Müller, Matthew Baugh, Bernhard Kainz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The recent progress of diffusion models in terms of image quality has led to a major shift in research related to generative models. Current approaches often fine-tune pre-trained foundation models using domain-specific text-to-image pairs. This approach is straightforward for X-ray image generation due to the high availability of radiology reports linked to specific images. However, current approaches hardly ever look at attention layers to verify whether the models understand what they are generating. In this paper, we discover an important trade-off between image fidelity and interpretability in generative diffusion models. In particular, we show that fine-tuning text-to-image models with a learnable text encoder leads to a lack of interpretability of diffusion models. Finally, we demonstrate the interpretability of diffusion models by showing that keeping the language encoder frozen enables diffusion models to achieve state-of-the-art phrase grounding performance on certain diseases for a challenging multi-label segmentation task, without any additional training. Code and models will be available at https://github.com/MischaD/chest-distillation.
IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-based Object Detection
Authors: Hu Haotian, Wang Fanyi, Su Jingwen, Gao Shiyu, Zhang Zhiwang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D object detection is one of the most important tasks in autonomous driving and robotics. Our research focuses on tackling the low-efficiency issue of point-based methods on large-scale point clouds. Existing point-based methods adopt the farthest point sampling (FPS) strategy for downsampling, which becomes expensive in both inference time and memory consumption as the number of points increases. To improve efficiency, we propose a novel Instance-Centroid Faster Point Sampling Module (IC-FPS), which effectively replaces the first Set Abstraction (SA) layer, the most time-consuming component. The IC-FPS module comprises two components: a local feature diffusion based background point filter (LFDBF) and a Centroid-Instance Sampling Strategy (CISS). LFDBF excludes most invalid background points, while CISS replaces the FPS strategy by quickly sampling centroids and instance points. The IC-FPS module can be inserted into almost any point-based model. Extensive experiments on multiple public benchmarks demonstrate the superiority of IC-FPS. On the Waymo dataset, the proposed module significantly improves the performance of the baseline model and accelerates inference by 3.8 times. For the first time, real-time detection with point-based models is realized in large-scale point cloud scenes.
Abstract
Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. Our paper proposes an essentially different framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. In this framework, action predictions are progressively generated from random noise with input video features as conditions. To enhance the modeling of three striking characteristics of human actions, including the position prior, the boundary ambiguity, and the relational dependency, we devise a unified masking strategy for the conditioning inputs in our framework. Extensive experiments on three benchmark datasets, i.e., GTEA, 50Salads, and Breakfast, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action segmentation. Our codes will be made available.
Abstract
Information diffusion in GCNs and their variants is constrained by the adjacency matrix, which can limit their performance. Therefore, we introduce a new framework for graph convolutional networks called Hybrid Diffusion-based Graph Convolutional Network (HD-GCN) to address these limitations. In the HD-GCN framework, we first utilize diffusion maps to facilitate the diffusion of information among nodes that are adjacent to each other in the feature space, allowing information to spread between similar points that lack an adjacency relationship in the graph. Next, we utilize graph convolution to further propagate information among adjacent nodes after the diffusion maps, thereby enabling the spread of information among similar nodes that are adjacent in the graph. Finally, we employ the diffusion distances obtained through the diffusion maps to regularize and constrain the predicted labels of training nodes; applying this regularization during HD-GCN training results in a smoother classification surface. By combining information diffusion between neighboring nodes in the feature space and adjacent nodes in the adjacency matrix, HD-GCN achieves more comprehensive information propagation among nodes and thus improved performance. We evaluated HD-GCN on three well-known citation network datasets, and the results showed that the proposed framework is more effective than several graph-based semi-supervised learning methods.
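A minimal NumPy sketch of the hybrid-diffusion idea, as we read it from the abstract, is given below: information is propagated once through a feature-space diffusion operator (a row-normalized Gaussian kernel, as in diffusion maps) and once through the normalized graph adjacency. This is a simplified reading, not the full HD-GCN model.

```python
# Hybrid diffusion sketch: feature-space kernel diffusion followed by
# graph-space propagation.  Toy data; our simplified reading of HD-GCN.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))             # node features
A = np.eye(6) + np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)  # chain graph

# Feature-space diffusion operator (diffusion-maps style kernel).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / d2.mean())
P = K / K.sum(axis=1, keepdims=True)    # row-stochastic: diffuse among
                                        # feature-space neighbours

# Graph-space propagation (GCN-style symmetric normalization).
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))

H = A_hat @ (P @ X)                     # diffuse in feature space, then on graph
print(H.shape)                          # (6, 4)
```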
One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models
Abstract
Adapting a segmentation model from a labeled source domain to a target domain, where a single unlabeled datum is available, is one of the most challenging problems in domain adaptation and is otherwise known as one-shot unsupervised domain adaptation (OSUDA). Most of the prior works have addressed the problem by relying on style transfer techniques, where the source images are stylized to have the appearance of the target domain. Departing from the common notion of transferring only the target ``texture'' information, we leverage text-to-image diffusion models (e.g., Stable Diffusion) to generate a synthetic target dataset with photo-realistic images that not only faithfully depict the style of the target domain, but are also characterized by novel scenes in diverse contexts. The text interface in our method Data AugmenTation with diffUsion Models (DATUM) endows us with the possibility of guiding the generation of images towards desired semantic concepts while respecting the original spatial context of a single training image, which is not possible in existing OSUDA methods. Extensive experiments on standard benchmarks show that our DATUM surpasses the state-of-the-art OSUDA methods by up to +7.1%. The implementation is available at https://github.com/yasserben/DATUM
A Closer Look at Parameter-Efficient Tuning in Diffusion Models
Authors: Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Large-scale diffusion models like Stable Diffusion are powerful and find various real-world applications, while customizing such models via fine-tuning is inefficient in both memory and time. Motivated by recent progress in natural language processing, we investigate parameter-efficient tuning in large diffusion models by inserting small learnable modules (termed adapters). In particular, we decompose the design space of adapters into orthogonal factors -- the input position, the output position, and the function form -- and perform Analysis of Variance (ANOVA), a classical statistical approach for analyzing the correlation between discrete variables (design options) and continuous variables (evaluation metrics). Our analysis suggests that the input position of adapters is the critical factor influencing the performance of downstream tasks. We then carefully study the choice of the input position and find that placing the input position after the cross-attention block leads to the best performance, validated by additional visualization analyses. Finally, we provide a recipe for parameter-efficient tuning in diffusion models, which is comparable, if not superior, to the fully fine-tuned baseline (e.g., DreamBooth) with only 0.75\% extra parameters, across various customized tasks.
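For orientation, here is a generic bottleneck adapter and the standard residual way of applying it to a block's output; the tensor sizes and the placement after cross-attention are illustrative assumptions following the paper's finding, not its exact module.

```python
# Generic residual bottleneck adapter, zero-initialized so that
# training starts from the unmodified pre-trained model.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))

# Hypothetical placement: feed the adapter with the output of a
# cross-attention block (here a random stand-in tensor).
dim = 320
cross_attn_out = torch.randn(2, 77, dim)
print(Adapter(dim)(cross_attn_out).shape)  # torch.Size([2, 77, 320])
```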
$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States
Authors: Sam Bond-Taylor, Chris G. Willcocks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
We introduce $\infty$-Diff, a generative diffusion model which operates directly on infinite-resolution data. By randomly sampling subsets of coordinates during training and learning to denoise the content at those coordinates, a continuous function is learned that allows sampling at arbitrary resolutions. In contrast to other recent infinite-resolution generative models, our approach operates directly on the raw data, without requiring latent vector compression for context, using hypernetworks, or relying on discrete components. As such, our approach achieves significantly higher sample quality, as evidenced by lower FID scores, and is able to scale effectively to higher resolutions than the training data while retaining detail.
Keyword: dynamic
Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version]
Authors: Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains and use it as a loss function. Our technique is well-suited to data-driven frameworks, but is not restricted to them. We also study properties of the above-mentioned metric between Markov chains, which we believe could find wider application. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than classical linear programming techniques.
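As background on the LP baseline that the paper improves upon, the sketch below computes a discounted, bisimulation-style Kantorovich metric between the states of two small labelled Markov chains by fixed-point iteration, solving one optimal-transport LP per state pair with scipy. The discount factor and label-mismatch base cost are our assumptions, and this is the classical technique, not the authors' faster algorithm.

```python
# Classical LP route to a Kantorovich-style metric between Markov chains.
import numpy as np
from scipy.optimize import linprog

def ot_cost(p, q, C):
    """Optimal transport cost between distributions p and q under cost C."""
    n, m = len(p), len(q)
    A_eq = []
    for i in range(n):                 # row marginals must equal p
        row = np.zeros((n, m)); row[i, :] = 1; A_eq.append(row.ravel())
    for j in range(m):                 # column marginals must equal q
        col = np.zeros((n, m)); col[:, j] = 1; A_eq.append(col.ravel())
    res = linprog(C.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([p, q]), bounds=(0, None), method="highs")
    return res.fun

P = np.array([[0.9, 0.1], [0.2, 0.8]])     # chain 1 transition matrix
Q = np.array([[0.7, 0.3], [0.3, 0.7]])     # chain 2 transition matrix
base = np.array([[0.0, 1.0], [1.0, 0.0]])  # cost of mismatched state labels
gamma = 0.9                                # discount: makes the iteration contract

d = base.copy()
for _ in range(60):                        # Kantorovich fixed-point iteration
    d = base + gamma * np.array([[ot_cost(P[i], Q[j], d) for j in range(2)]
                                 for i in range(2)])
print(np.round(d, 3))
```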
Data-Driven Covariance Steering Control Design
Authors: Joshua Pilipovsky, Panagiotis Tsiotras
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
This paper studies the problem of steering the distribution of a linear time-invariant system from an initial normal distribution to a terminal normal distribution under no knowledge of the system dynamics. This data-driven control framework uses data collected from the input and the state and utilizes the seminal work by Willems et al. to construct a data-based parametrization of the mean and the covariance control problems. These problems are then solved to optimality as convex programs using standard techniques from the covariance control literature. We also discuss the equivalence of indirect and direct data-driven covariance steering designs, as well as a regularized version of the problem that provides a balance between the two. We illustrate the proposed framework through a set of randomized trials on a double integrator system and show that the results match up almost exactly with the corresponding model-based method in the noiseless case. We then analyze the robustness properties of the data-free and data-driven covariance steering methods and demonstrate the trade-offs between performance and optimality among these methods in the presence of data corrupted with exogenous noise.
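The core data-based parametrization from Willems et al. can be sketched in a few lines: with a persistently exciting input, every input/state trajectory of the unknown LTI system lies in the column span of Hankel matrices built from a single experiment. The double-integrator example below checks exactly that; the covariance steering optimization itself is omitted.

```python
# Sketch of Willems et al.'s fundamental lemma on a double integrator.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # true dynamics (used only to
B = np.array([[0.5], [1.0]])             # generate data, never by the method)

T, L = 60, 10
u = rng.normal(size=T)                   # persistently exciting input
x = np.zeros((T, 2))
for t in range(T - 1):
    x[t + 1] = A @ x[t] + B.ravel() * u[t]

def block_hankel(w, L):
    """Columns are length-L windows of the signal w, flattened."""
    return np.array([np.asarray(w[j:j + L]).ravel()
                     for j in range(len(w) - L + 1)]).T

H = np.vstack([block_hankel(u, L), block_hankel(x, L)])

# A fresh trajectory of the same (unknown) system...
x_new = np.zeros((L, 2)); x_new[0] = [1.0, -0.5]
u_new = rng.normal(size=L)
for t in range(L - 1):
    x_new[t + 1] = A @ x_new[t] + B.ravel() * u_new[t]

# ...is reproduced exactly by a linear combination of the recorded data.
w = np.concatenate([u_new, x_new.ravel()])
g, *_ = np.linalg.lstsq(H, w, rcond=None)
print(np.linalg.norm(H @ g - w))         # ~0 up to numerical precision
```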
Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets
Authors: Aiqing Zhu, Tom Bertalan, Beibei Zhu, Yifa Tang, Ioannis G. Kevrekidis
Abstract
We focus on learning hidden dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers. First, we perform Inverse Modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation. It is shown that training an ODE-net using an unrolled implicit scheme returns a close approximation of an Inverse Modified Differential Equation (IMDE). In addition, we establish a theoretical basis for hyper-parameter selection when training such ODE-nets, whereas current strategies usually treat numerical integration of ODE-nets as a black box. We thus formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations during the training process, so that the error of the unrolled approximation is less than the current learning loss. This helps accelerate training while maintaining accuracy. Several numerical experiments are performed to demonstrate the advantages of the proposed algorithm compared to nonadaptive unrollings and to validate the theoretical analysis. We also note that this approach naturally allows for incorporating partially known physical terms in the equations, giving rise to what is termed ``gray box'' identification.
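A toy version of the adaptive unrolling loop is sketched below: one implicit Euler step is approximated by M unrolled fixed-point iterations, and M is increased until the unrolling residual drops below a stand-in for the current learning loss. The dynamics model and tolerances are illustrative assumptions, not the paper's algorithm.

```python
# Adaptive unrolling sketch for an implicit Euler ODE-net step.
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
h = 0.1                                  # step size

def implicit_euler_unrolled(x, M):
    """Approximate the implicit step y = x + h*f(y) by M fixed-point iterations."""
    y = x
    for _ in range(M):
        y = x + h * f(y)
    return y

x = torch.randn(16, 2)
learning_loss = 1e-3                     # stand-in for the current training loss
M = 1
with torch.no_grad():
    while M < 50:
        y = implicit_euler_unrolled(x, M)
        if (y - (x + h * f(y))).norm() < learning_loss:
            break                        # unrolled approximation accurate enough
        M += 1
print("unrolled iterations used:", M)
```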
MapFormer: Boosting Change Detection by Using Pre-change Information
Authors: Maximilian Bernhard, Niklas Strauß, Matthias Schubert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Change detection in remote sensing imagery is essential for a variety of applications, such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of Conditional Change Detection, where pre-change semantic information is used as input next to bi-temporal images. To fully exploit the extra information, we propose MapFormer, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and to the absence of pre-change imagery. The code will be made publicly available.
The Blockchain Imitation Game
Authors: Kaihua Qin, Stefanos Chaliasos, Liyi Zhou, Benjamin Livshits, Dawn Song, Arthur Gervais
Abstract
The use of blockchains for automated and adversarial trading has become commonplace. However, due to the transparent nature of blockchains, an adversary is able to observe any pending, not-yet-mined transactions, along with their execution logic. This transparency further enables a new type of adversary, which copies and front-runs profitable pending transactions in real-time, yielding significant financial gains. Shedding light on such "copy-paste" malpractice, this paper introduces the Blockchain Imitation Game and proposes a generalized imitation attack methodology called Ape. Leveraging dynamic program analysis techniques, Ape supports the automatic synthesis of adversarial smart contracts. Over a timeframe of one year (1st of August, 2021 to 31st of July, 2022), Ape could have yielded 148.96M USD in profit on Ethereum, and 42.70M USD on BNB Smart Chain (BSC). Beyond its use as a malicious attack, we further show the potential of transaction and contract imitation as a defensive strategy. Within one year, we find that Ape could have successfully imitated 13 and 22 known Decentralized Finance (DeFi) attacks on Ethereum and BSC, respectively. Our findings suggest that blockchain validators can imitate attacks in real-time to prevent intrusions in DeFi.
Information-Theoretic Study of Time-Domain Energy-Saving Techniques in Radio Access
Authors: François Rottenberg
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Reduction of wireless network energy consumption is becoming increasingly important to reduce environmental footprint and operational costs. A key concept for achieving this is the use of lean transmission techniques that dynamically (de)activate hardware resources as a function of the load. In this paper, we propose a pioneering information-theoretic study of time-domain energy-saving techniques, relying on a practical hardware power consumption model of sleep and active modes. By minimizing the power consumption under a quality of service constraint (rate, latency), we propose simple yet powerful techniques to allocate power and choose which resources to activate or to put in sleep mode. Power consumption scaling regimes are identified. We show that a ``rush-to-sleep'' approach (maximal power in the fewest symbols followed by sleep) is only optimal in a high-noise regime. It is shown how consumption can be made linear with the load, achieving massive energy reduction (a factor of 10) at low-to-medium load. The trade-off between energy efficiency (EE) and spectral efficiency (SE) is also characterized, followed by a multi-user study based on time division multiple access (TDMA).
How accurate does Newton have to be?
Authors: Carl Christian Kjelgaard Mikkelsen, Lorién López-Villellas, Pablo García-Risueño
Abstract
We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic. In particular, we derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. We briefly discuss implications for parallel solvers.
Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States
Authors: Robert Lefringhausen, Supitsana Srithasan, Armin Lederer, Sandra Hirche
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
As control engineering methods are applied to increasingly complex systems, data-driven approaches for system identification appear as a promising alternative to physics-based modeling. While many of these approaches rely on the availability of state measurements, the states of a complex system are often not directly measurable. It may then be necessary to jointly estimate the dynamics and a latent state, making it considerably more challenging to design controllers with performance guarantees. This paper proposes a novel method for the computation of an optimal input trajectory for unknown nonlinear systems with latent states. Probabilistic performance guarantees are derived for the resulting input trajectory, and an approach to validate the performance of arbitrary control laws is presented. The effectiveness of the proposed method is demonstrated in a numerical simulation.
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
Abstract
We propose VDN-NeRF, a method to train neural radiance fields (NeRFs) for better geometry under non-Lambertian surface and dynamic lighting conditions that cause significant variation in the radiance of a point when viewed from different angles. Instead of explicitly modeling the underlying factors that result in the view-dependent phenomenon, which could be complex yet not inclusive, we develop a simple and effective technique that normalizes the view-dependence by distilling invariant information already encoded in the learned NeRFs. We then jointly train NeRFs for view synthesis with view-dependence normalization to attain quality geometry. Our experiments show that even though shape-radiance ambiguity is inevitable, the proposed normalization can minimize its effect on geometry, which essentially aligns the optimal capacity needed for explaining view-dependent variations. Our method applies to various baselines and significantly improves geometry without changing the volume rendering pipeline, even if the data is captured under a moving light source. Code is available at: https://github.com/BoifZ/VDN-NeRF.
Abstract
Parallel robots are capable of high-speed manipulation and have become essential tools in the industry. The proximal placement of their motors and the low weight of their end effectors make them ideal for generating highly dynamic motion. Therefore, parallel robots can be adopted for motion platform designs, as long as end effector loads are low. Traditional motion platforms can be large and powerful enough to generate multiple-g acceleration. However, such designs tend to be expensive and bulky. Similar but smaller motion platforms feature a small work range with reduced degrees of freedom (DoFs) and a limited payload. Here we seek a medium-sized, affordable parallel robot capable of powerful and high-speed 6-DoF motion in a comparably large workspace. This work explores the concept of a quadruped robot flipped upside-down, with the motion platform fixed between its feet. In particular, we exploit the high-power dynamic brushless actuation and the four-leg redundancy when moving the motion platform. We characterize the resulting motion platform by tracking sinusoidal and circular trajectories with varying loads. Dynamic motions in 6 DoFs up to 10 Hz and ~10 mm amplitude are possible when moving a mass of 300 grams. We demonstrate single-axis end-effector translations up to ~20 mm at 10 Hz for higher loads of 1.2 kg. The motion platform can be replicated easily with 3D printing and off-the-shelf components. All motion platform-related hardware and the custom-written software required for replication are open-source.
Models as Agents: Optimizing Multi-Step Predictions of Interactive Local Models in Model-Based Multi-Agent Reinforcement Learning
Abstract
Research in model-based reinforcement learning has made significant progress in recent years. Compared to single-agent settings, the exponential dimension growth of the joint state-action space in multi-agent systems dramatically increases the complexity of the environment dynamics, which makes it infeasible to learn an accurate global model and thus necessitates the use of agent-wise local models. However, during multi-step model rollouts, the prediction of one local model can affect the predictions of other local models in the next step. As a result, local prediction errors can propagate to other localities and eventually give rise to considerably large global errors. Furthermore, since the models are generally used to predict for multiple steps, simply minimizing one-step prediction errors regardless of their long-term effect on other models may further aggravate the propagation of local errors. To this end, we propose Models as AGents (MAG), a multi-agent model optimization framework that reversely treats the local models as multi-step decision-making agents and the current policies as the dynamics during the model rollout process. In this way, the local models are able to consider the multi-step mutual effects among one another before making predictions. Theoretically, we show that the objective of MAG is approximately equivalent to maximizing a lower bound of the true environment return. Experiments on the challenging StarCraft II benchmark demonstrate the effectiveness of MAG.
Neural Network Entropy (NNetEn): EEG Signals and Chaotic Time Series Separation by Entropy Features, Python Package for NNetEn Calculation
Authors: Andrei Velichko, Maksim Belyaev, Yuriy Izotov, Murugappan Murugappan, Hanif Heidari
Abstract
Entropy measures are effective features for time series classification problems. Traditional entropy measures, such as Shannon entropy, are based on the probability distribution function. However, for effective separation of time series, new entropy estimation methods are required to characterize the chaotic dynamics of the system. Our concept of Neural Network Entropy (NNetEn) is based on the classification of special datasets (MNIST-10 and SARS-CoV-2-RBV1) in relation to the entropy of the time series recorded in the reservoir of the LogNNet neural network. NNetEn estimates the chaotic dynamics of time series in an original way. Based on the NNetEn algorithm, we propose two new classification metrics: R2 Efficiency and Pearson Efficiency. The efficiency of NNetEn is verified on the separation of two chaotic time series generated by the sine map using analysis of variance (ANOVA). For two close dynamic time series (r = 1.1918 and r = 1.2243), the F-ratio reaches a value of 124, reflecting the high efficiency of the introduced method in classification problems. The EEG signal classification for healthy persons and patients with Alzheimer's disease illustrates the practical application of the NNetEn features. Our computations demonstrate a synergistic effect of increased classification accuracy when applying traditional entropy measures and the NNetEn concept conjointly. An implementation of the algorithms in Python is presented.
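As a minimal illustration of the ANOVA evaluation step, the sketch below computes an F-ratio between scalar features extracted from the two close sine-map regimes quoted above; for brevity, the feature here is the sample standard deviation, standing in for NNetEn, so the resulting F-ratio is only illustrative.

```python
# One-way ANOVA F-ratio between features of two close sine-map regimes.
import numpy as np
from scipy.stats import f_oneway

def sine_map_series(r, n=400, x0=0.1):
    x = np.empty(n); x[0] = x0
    for i in range(n - 1):
        x[i + 1] = r * np.sin(np.pi * x[i])
    return x

rng = np.random.default_rng(0)

def features(r, trials=30):
    # One scalar feature per realization; NNetEn would be computed here.
    return [sine_map_series(r, x0=rng.uniform(0.05, 0.95))[100:].std()
            for _ in range(trials)]

F, p = f_oneway(features(1.1918), features(1.2243))
print(f"F-ratio: {F:.1f}  (p = {p:.2g})")
```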
A flatness-based saturated controller design for a quadcopter with experimental validation
Abstract
Using the properties of differential flatness, a controllable system, such as a quadcopter model, may be transformed into a linear equivalent system via a coordinate change and an input mapping. This is a straightforward advantage for the quadcopter's controller design and its real-time implementation. However, one significant hindrance is that, while the dynamics become linear in the new coordinates (the flat output space), the input constraints become convoluted. This paper presents an explicit pre-stabilization-based control scheme which handles the input constraints for the quadcopter in the flat output space with a saturation component. The system's stability is shown to hold by Lyapunov-stability arguments. Moreover, the practical viability of the proposed method is validated in both simulation and experiments on a nano-drone platform. Hence, the flatness-based saturated controller not only ensures stability and constraint satisfaction, but also requires very low computational effort, allowing for embedded implementations.
Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading
Authors: Minglei Lu, Ali Mohammadi, Zhaoxu Meng, Xuhui Meng, Gang Li, Zhen Li
Abstract
Additive manufacturing has been recognized as an industrial technological revolution for manufacturing, which allows fabrication of materials with complex three-dimensional (3D) structures directly from computer-aided design models. The mechanical properties of interpenetrating phase composites (IPCs), especially their response to dynamic loading, highly depend on their 3D structures. In general, for each specified structural design, it could take hours or days to perform either finite element analysis (FEA) or experiments to test the mechanical response of IPCs to a given dynamic load. To accelerate the physics-based prediction of mechanical properties of IPCs for various structural designs, we employ a deep neural operator (DNO) to learn the transient response of IPCs under dynamic loading as a surrogate for physics-based FEA models. We consider a 3D IPC beam formed by two metals with a ratio of Young's modulus of 2.7, wherein random blocks of constituent materials are used to demonstrate the generality and robustness of the DNO model. To obtain FEA results of IPC properties, 5,000 random time-dependent strain loads generated by a Gaussian process kernel are applied to the 3D IPC beam, and the reaction forces and stress fields inside the IPC beam under the various loadings are collected. Subsequently, the DNO model is trained using an incremental learning method with sequence-to-sequence training implemented in JAX, leading to a 100X speedup compared to widely used vanilla deep operator network models. After offline training, the DNO model can act as a surrogate for physics-based FEA to predict the transient mechanical response, in terms of reaction force and stress distribution, of the IPCs to various strain loads in one second at an accuracy of 98%. Also, the learned operator is able to provide extended predictions for the IPC beam subject to longer random strain loads with reasonably good accuracy.
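For orientation, a minimal DeepONet-style operator is sketched below: a branch net encodes the sampled strain load, a trunk net encodes the query time, and their dot product predicts the response. The sizes and architecture here are illustrative assumptions, not the paper's DNO.

```python
# Minimal DeepONet-style sketch of an operator surrogate.
import torch
import torch.nn as nn

class DeepONetSketch(nn.Module):
    def __init__(self, n_sensors: int = 100, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(),
                                    nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                                   nn.Linear(128, p))

    def forward(self, load, t):
        # load: (batch, n_sensors) strain history at fixed sensor times
        # t:    (batch, 1) query time
        return (self.branch(load) * self.trunk(t)).sum(-1, keepdim=True)

model = DeepONetSketch()
load = torch.randn(8, 100)              # sampled time-dependent strain loads
t = torch.rand(8, 1)
print(model(load, t).shape)             # torch.Size([8, 1]) predicted response
```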
Inferring networks from time series: a neural approach
Authors: Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Network structures underlie the dynamics of many complex phenomena, from gene regulation and food webs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of their emergent dynamics. In this work we present a powerful and fast computational method to infer large network adjacency matrices from time series data using a neural network. Using a neural network provides uncertainty quantification on the prediction in a manner that reflects both the non-convexity of the inference problem and the noise on the data. This is useful since network inference problems are typically underdetermined, and it is a feature that has hitherto been lacking from network inference methods. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from observations of its response to a power cut. Since the problem is underdetermined, many classical statistical tools (e.g. regression) will not be straightforwardly applicable. Our method, in contrast, provides probability densities on each edge, allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the power cut. We also demonstrate our method's ability to learn an entire cost matrix for a non-linear model from a dataset of economic activity in Greater London. Our method outperforms OLS regression on noisy data in terms of both speed and prediction accuracy, and scales as $N^2$ where OLS is cubic. Since our technique is not specifically engineered for network inference, it represents a general parameter estimation scheme that is applicable to any parameter dimension.
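The paper trains a neural network that outputs adjacency matrices and thereby obtains edge-wise uncertainty; the stripped-down sketch below keeps only the differentiable-simulation core, fitting an adjacency matrix by gradient descent so that simulated linear diffusion dynamics match an observed time series. The toy dynamics and thresholding are our assumptions, and no uncertainty quantification is attempted.

```python
# Toy differentiable network inference: fit an adjacency matrix so that
# simulated diffusion dynamics reproduce an observed time series.
import torch

torch.manual_seed(0)
N, T, eps = 8, 200, 0.1
A_true = (torch.rand(N, N) < 0.3).float()
A_true.fill_diagonal_(0)

x = torch.zeros(T, N); x[0] = torch.randn(N)
for t in range(T - 1):                  # dx_i = eps * sum_j A_ij (x_j - x_i)
    lap = A_true @ x[t] - A_true.sum(1) * x[t]
    x[t + 1] = x[t] + eps * lap

A = torch.zeros(N, N, requires_grad=True)
opt = torch.optim.Adam([A], lr=0.05)
for step in range(2000):
    W = torch.sigmoid(A)                # keep inferred edge weights in (0, 1)
    lap = x[:-1] @ W.T - x[:-1] * W.sum(1)
    loss = ((x[1:] - x[:-1] - eps * lap) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Fraction of entries where thresholded inferred edges agree with truth.
print(((torch.sigmoid(A) > 0.5) == A_true.bool()).float().mean())
```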
Dictionary-based Online-adaptive Structure-preserving Model Order Reduction for Parametric Hamiltonian Systems
Authors: Robin Herkert, Patrick Buchfink, Bernard Haasdonk
Abstract
Classical model order reduction (MOR) for parametric problems may become computationally inefficient due to large sizes of the required projection bases, especially for problems with slowly decaying Kolmogorov n-widths. Additionally, Hamiltonian structure of dynamical systems may be available and should be preserved during the reduction. In the current presentation, we address these two aspects by proposing a corresponding dictionary-based, online-adaptive MOR approach. The method requires dictionaries for the state-variable, non-linearities and discrete empirical interpolation (DEIM) points. During the online simulation, local basis extensions/simplifications are performed in an online-efficient way, i.e. the runtime complexity of basis modifications and online simulation of the reduced models do not depend on the full state dimension. Experiments on a linear wave equation and a non-linear Sine-Gordon example demonstrate the efficiency of the approach.
Robust LSTM-based Vehicle Velocity Observer for Regular and Near-limits Applications
Authors: Agapius Bou Ghosn, Marcus Nolte, Philip Polack, Arnaud de La Fortelle, Markus Maurer
Subjects: Robotics (cs.RO); Signal Processing (eess.SP)
Abstract
Accurate velocity estimation is key to vehicle control. While the literature describes how model-based and learning-based observers are able to estimate a vehicle's velocity in normal driving conditions, the challenge remains to estimate the velocity in near-limits maneuvers while using only conventional in-car sensors. In this paper, we introduce a novel neural network architecture based on Long Short-Term Memory (LSTM) networks to accurately estimate the vehicle's velocity in different driving conditions, including maneuvers at the limits of handling. The approach has been tested on real vehicle data and it provides more accurate estimations than state-of-the-art model-based and learning-based methods, for both regular and near-limits driving scenarios. Our approach is robust since the performance of the state-of-the-art observers deteriorates with higher dynamics, while our method adapts to different maneuvers, providing accurate estimations even at the vehicle's limits of handling.
Status Updating under Partial Battery Knowledge in Energy Harvesting IoT Networks
Authors: Mohammad Hatami, Markus Leinonen, Marian Codreanu
Abstract
We study status updating under inexact knowledge about the battery levels of the energy harvesting sensors in an IoT network, where users make on-demand requests to a cache-enabled edge node to send updates about various random processes monitored by the sensors. To serve the request(s), the edge node either commands the corresponding sensor to send an update or uses the aged data from the cache. We find a control policy that minimizes the average on-demand AoI subject to per-slot energy harvesting constraints under partial battery knowledge at the edge node. Namely, the edge node is informed about sensors' battery levels only via received status updates, leading to uncertainty about the battery levels for the decision-making. We model the problem as a POMDP which is then reformulated as an equivalent belief-MDP. The belief-MDP in its original form is difficult to solve due to the infinite belief space. However, by exploiting a specific pattern in the evolution of beliefs, we truncate the belief space and develop a dynamic programming algorithm to obtain an optimal policy. Moreover, we address a multi-sensor setup under a transmission limitation for which we develop an asymptotically optimal algorithm. Simulation results assess the performance of the proposed methods.
Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
Authors: Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. Existing methods suffer from two main drawbacks. First, they face challenges in estimating an accurate correction field due to the uniform velocity assumption, leading to significant image correction errors under complex motion. Second, the drastic occlusion in dynamic scenes prevents current solutions from achieving better image quality because of the inherent difficulties in aligning and aggregating multiple frames. To tackle these challenges, we model the curvilinear trajectory of pixels analytically and propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels. Besides, to reconstruct high-quality occlusion frames in dynamic scenes, we present a 3D video architecture that effectively Aligns and Aggregates multi-frame context, namely RSA^2-Net. We evaluate our method across a broad range of cameras and video sequences, demonstrating its significant superiority. Specifically, our method surpasses the state of the art by +4.98, +0.77, and +4.33 dB PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
Age of Incorrect Information With Hybrid ARQ Under a Resource Constraint for N-ary Symmetric Markov Sources
Authors: Konstantinos Bountrogiannis, Anthony Ephremides, Panagiotis Tsakalides, George Tzagkarakis
Abstract
The Age of Incorrect Information (AoII) is a recently proposed metric for real-time remote monitoring systems. In particular, AoII measures the time the information at the monitor is incorrect, weighted by the magnitude of this incorrectness, thereby combining the notions of freshness and distortion. This paper addresses the definition of an AoII-optimal transmission policy in a discrete-time communication scheme with a resource constraint and a hybrid automatic repeat request (HARQ) protocol. Considering an $N$-ary symmetric Markov source, the problem is formulated as an infinite-horizon average-cost constrained Markov decision process (CMDP). The source model is characterized by the cardinality of the state space and the probability of staying at the same state. Interestingly, it is proved that under some conditions, the optimal transmission policy is to never transmit. This reveals that there exists a region of the source dynamics where communication is inadequate in reducing the AoII. Elsewhere, there exists an optimal transmission policy, which is a randomized mixture of two discrete threshold-based policies that randomize at one state. The optimal threshold and the randomization component are derived analytically. Numerical results illustrate the impact of source dynamics, channel conditions, and the resource constraint on the average AoII.
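A small simulation sketch of the setting is given below: an N-ary symmetric Markov source is monitored under a single deterministic threshold policy, assuming an error-free channel and unit incorrectness weight. These are simplifications relative to the paper's HARQ/CMDP formulation and its randomized mixture of threshold policies.

```python
# Simulate average AoII for an N-ary symmetric source under a
# deterministic threshold policy (error-free channel assumption).
import random

N, p_stay = 4, 0.7                      # source cardinality, self-transition prob.
threshold = 3                           # transmit when AoII exceeds this
random.seed(0)

state = est = 0
aoii = 0
total, horizon = 0.0, 100_000
for _ in range(horizon):
    # Source evolves: stay w.p. p_stay, else jump uniformly to another state.
    if random.random() > p_stay:
        state = random.choice([s for s in range(N) if s != state])
    aoii = 0 if est == state else aoii + 1
    if aoii > threshold:                # threshold-based transmission decision
        est = state                     # update delivered without error
        aoii = 0
    total += aoii
print("average AoII:", total / horizon)
```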
BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection
Abstract
As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies rely solely on graph-based fraud detection approaches, we argue that they may not be well-suited to the highly repetitive, skew-distributed and heterogeneous nature of Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of the Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies: repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant improvements on phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.
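As one concrete (and purely illustrative) reading of the repetitiveness-reduction strategy, consecutive repeats of the same counterparty address in a transaction sequence could be collapsed into (address, count) pairs before tokenization, shortening the input without discarding frequency information; the paper's actual recipe may differ:

```python
def reduce_repetitiveness(address_seq):
    """Collapse consecutive repeats of the same counterparty address into
    (address, count) pairs before tokenization.  Hypothetical stand-in for
    the paper's repetitiveness-reduction strategy."""
    reduced = []
    for addr in address_seq:
        if reduced and reduced[-1][0] == addr:
            reduced[-1] = (addr, reduced[-1][1] + 1)
        else:
            reduced.append((addr, 1))
    return reduced

print(reduce_repetitiveness(["0xabc", "0xabc", "0xabc", "0xdef", "0xabc"]))
# [('0xabc', 3), ('0xdef', 1), ('0xabc', 1)]
```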
Adaptive Model Prediction Control-Based Multi-Terrain Trajectory Tracking Framework for Mobile Spherical Robots
Authors: Yifan Liu, Tao Hu, Xiaoqing Guan, Yixu Wang, Bixuan Zhang, You Wang, Guang Li
Abstract
Owing to uncertainties in both kinematics and dynamics, current trajectory tracking frameworks for mobile robots such as spherical robots cannot function effectively on multiple terrains, especially uneven and unknown ones. Since this capability is a prerequisite for robots executing tasks in the wild, we enhance our previous hierarchical trajectory tracking framework to handle this issue. First, a modified adaptive RBF neural network (RBFNN) is proposed to represent all uncertainties in the kinodynamics. A Lyapunov function is then used to design its adaptive law, and a variable step-size algorithm is employed in the weight update procedure to accelerate convergence and improve stability. On this basis, a new adaptive model prediction control-based instruction planner (VAN-MPC) is proposed. Without modifying the bottom-level controllers, we finally develop the multi-terrain trajectory tracking framework by employing the new instruction planner VAN-MPC. Practical experiments demonstrate its effectiveness and robustness.
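The variable step-size idea can be sketched with a generic normalized update rule in which the effective learning rate shrinks as the regressor grows; this is a plausible stand-in, not the paper's Lyapunov-derived adaptive law, and `eta0`, `c`, and the RBF layout are illustrative:

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian RBF activations for input x."""
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * width ** 2))

def adaptive_update(w, phi, error, eta0=0.5, c=1.0):
    """Weight update whose step size shrinks as the regressor grows --
    a generic normalized variable-step rule."""
    eta = eta0 / (c + phi @ phi)     # variable step size
    return w + eta * error * phi

centers = np.linspace(-1.0, 1.0, 5)[:, None]
w = np.zeros(5)
x, target = np.array([0.3]), 0.7
for _ in range(50):
    phi = rbf_features(x, centers, width=0.5)
    w = adaptive_update(w, phi, target - w @ phi)
print(w @ rbf_features(x, centers, 0.5))   # converges toward 0.7
```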
Learning Spiking Neural Systems with the Event-Driven Forward-Forward Process
Authors: Alexander Ororbia
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
We develop a novel credit assignment algorithm for information processing with spiking neurons that requires no feedback synapses. Specifically, we propose an event-driven generalization of the forward-forward and predictive forward-forward learning processes for a spiking neural system that iteratively processes sensory input over a stimulus window. As a result, the recurrent circuit computes the membrane potential of each neuron in each layer as a function of local bottom-up, top-down, and lateral signals, facilitating a dynamic, layer-wise parallel form of neural computation. Unlike spiking neural coding, which relies on feedback synapses to adjust neural electrical activity, our model operates purely online and forward in time, offering a promising way to learn distributed representations of sensory data patterns with temporal spike signals. Notably, our experimental results on several pattern datasets demonstrate that the event-driven forward-forward (ED-FF) framework works well for training a dynamic recurrent spiking system capable of both classification and reconstruction.
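For orientation, the non-spiking forward-forward objective that the paper generalizes trains each layer locally by pushing a "goodness" score (summed squared activity) above a threshold for positive data and below it for negative data; an event-driven variant would accumulate the activities from spike counts over the stimulus window. A minimal sketch with illustrative parameters:

```python
import numpy as np

def goodness(h):
    """Forward-forward 'goodness': summed squared activity of a layer."""
    return np.sum(h ** 2, axis=-1)

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    """Layer-local objective: drive the goodness of positive samples above
    the threshold theta and that of negative samples below it."""
    lp = np.log1p(np.exp(-(goodness(h_pos) - theta)))   # push above theta
    ln = np.log1p(np.exp(goodness(h_neg) - theta))      # push below theta
    return np.mean(lp) + np.mean(ln)

rng = np.random.default_rng(0)
print(ff_layer_loss(rng.normal(1.0, 0.1, (8, 16)),
                    rng.normal(0.0, 0.1, (8, 16))))
```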
Assessing Language Model Deployment with Risk Cards
Authors: Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad
Abstract
This paper introduces RiskCards, a framework for the structured assessment and documentation of risks associated with an application of language models. As with all language, text generated by language models can be harmful, or can be used to bring about harm. Automating language generation adds both an element of scale and more subtle or emergent undesirable tendencies to the generated text. Prior work establishes a wide variety of language model harms to many different actors: existing taxonomies identify categories of harms posed by language models; benchmarks establish automated tests of these harms; and documentation standards for models, tasks and datasets encourage transparent reporting. However, there is no risk-centric framework for documenting the complexity of a landscape in which some risks are shared across models and contexts, while others are specific, and where certain conditions may be required for risks to manifest as harms. RiskCards address this methodological gap by providing a generic framework for assessing the use of a given language model in a given scenario. Each RiskCard makes clear the routes by which the risk can manifest as harm, its placement in harm taxonomies, and example prompt-output pairs. While RiskCards are designed to be open-source, dynamic and participatory, we present a "starter set" of RiskCards drawn from a broad literature survey, each of which details a concrete risk presentation. Language model RiskCards initiate a community knowledge base which permits the mapping of risks and harms to a specific model or its application scenario, ultimately contributing to a better, safer and shared understanding of the risk landscape.
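A RiskCard can be pictured as a small structured record. The sketch below mirrors only the fields the abstract names and is a hypothetical container, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class RiskCard:
    """Hypothetical container mirroring the fields named in the abstract;
    the actual RiskCards schema may differ."""
    title: str
    description: str                       # route by which risk manifests harm
    harm_taxonomy: list[str]               # placement in existing taxonomies
    prompt_output_pairs: list[tuple[str, str]] = field(default_factory=list)

card = RiskCard(
    title="Toxicity amplification",
    description="The model escalates mildly hostile prompts into abuse.",
    harm_taxonomy=["representational harm"],
    prompt_output_pairs=[("<redacted prompt>", "<redacted output>")],
)
print(card.title, card.harm_taxonomy)
```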
Attributed Stream Hypergraphs: temporal modeling of node-attributed high-order interactions
Authors: Andrea Failla, Salvatore Citraro, Giulio Rossetti
Abstract
Recent advances in network science have resulted in two distinct research directions aimed at augmenting and enhancing representations for complex networks. The first direction, high-order modeling, focuses on connectivity between sets of nodes rather than pairs, whereas the second, feature-rich augmentation, incorporates into a network all those elements driven by information external to the structure, such as node properties or the flow of time. This paper proposes a novel toolbox, Attributed Stream Hypergraphs (ASHs), unifying both high-order and feature-rich elements for representing, mining, and analyzing complex networks. Applied to social network analysis, ASHs can characterize complex social phenomena along topological, dynamic and attributive dimensions. Experiments on real-world face-to-face and online social media interactions highlight that ASHs readily support analyses of, among others, high-order group homophily, node homophily with respect to the hyperedges in which nodes participate, and time-respecting paths between hyperedges.
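The core data structure can be approximated in a few lines: hyperedges indexed by the time slots in which they are active, plus time-stamped node attributes. This is an illustrative toy, not the interface of the authors' library:

```python
from collections import defaultdict

class ASH:
    """Toy attributed stream hypergraph: hyperedges carry the time slots in
    which they are active; nodes carry time-varying attributes."""
    def __init__(self):
        self.edge_times = defaultdict(set)     # hyperedge -> active slots
        self.node_attrs = defaultdict(dict)    # node -> {slot: attributes}

    def add_interaction(self, nodes, t):
        self.edge_times[frozenset(nodes)].add(t)

    def set_attr(self, node, t, **attrs):
        self.node_attrs[node][t] = attrs

    def active_edges(self, t):
        return [set(e) for e, ts in self.edge_times.items() if t in ts]

ash = ASH()
ash.add_interaction({"a", "b", "c"}, t=1)
ash.set_attr("a", 1, role="speaker")
print(ash.active_edges(1))                     # [{'a', 'b', 'c'}]
```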
3D Human Pose Estimation via Intuitive Physics
Authors: Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas
Abstract
Estimating 3D humans from images often produces implausible bodies that lean, float, or penetrate the floor. Such methods ignore the fact that bodies are typically supported by the scene. A physics engine can be used to enforce physical plausibility, but physics engines are not differentiable, rely on unrealistic proxy bodies, and are difficult to integrate into existing optimization and learning frameworks. In contrast, we exploit novel intuitive-physics (IP) terms that can be inferred from a 3D SMPL body interacting with the scene. Inspired by biomechanics, we infer the pressure heatmap on the body, the Center of Pressure (CoP) from the heatmap, and the SMPL body's Center of Mass (CoM). With these, we develop IPMAN to estimate a 3D body from a color image in a "stable" configuration by encouraging plausible floor contact and overlapping CoP and CoM. Our IP terms are intuitive, easy to implement, fast to compute, differentiable, and can be integrated into existing optimization and regression methods. We evaluate IPMAN on standard datasets and on MoYo, a new dataset with synchronized multi-view images and ground-truth 3D bodies with complex poses, body-floor contact, CoM and pressure. IPMAN produces more plausible results than the state of the art, improving accuracy for static poses while not hurting dynamic ones. Code and data are available for research at https://ipman.is.tue.mpg.de.
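The CoP/CoM stability intuition reduces to a simple geometric quantity: the horizontal offset between the two ground projections. The sketch below computes a discrete version of that gap; IPMAN's actual terms are differentiable analogues computed from SMPL meshes, and the weights here are illustrative:

```python
import numpy as np

def stability_gap(vertices, masses, pressure, up_axis=2):
    """Horizontal distance between the ground projections of the Center of
    Mass (mass-weighted mean) and the Center of Pressure (pressure-weighted
    mean); small values indicate a statically stable pose."""
    com = (masses[:, None] * vertices).sum(0) / masses.sum()
    cop = (pressure[:, None] * vertices).sum(0) / pressure.sum()
    horiz = [i for i in range(3) if i != up_axis]
    return np.linalg.norm(com[horiz] - cop[horiz])

rng = np.random.default_rng(0)
verts = rng.normal(size=(100, 3))              # stand-in for SMPL vertices
print(stability_gap(verts, np.ones(100), rng.random(100)))
```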
Adaptive Sparse Pairwise Loss for Object Re-Identification
Authors: Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Object re-identification (ReID) aims to find instances with the same identity as a given probe in a large gallery. Pairwise losses play an important role in training a strong ReID network. Existing pairwise losses densely exploit each instance as an anchor and sample its triplets in a mini-batch. This dense sampling mechanism inevitably introduces positive pairs that share few visual similarities, which can be harmful to training. To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that leverages only a few appropriate pairs per class in a mini-batch, and empirically demonstrate that this is sufficient for ReID tasks. Based on the proposed loss framework, we propose an adaptive positive mining strategy that can dynamically adapt to diverse intra-class variations. Extensive experiments show that SP loss and its adaptive variant AdaSP loss outperform other pairwise losses and achieve state-of-the-art performance across several ReID benchmarks. Code is available at https://github.com/Astaxanthin/AdaSP.
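The sparse-pair idea can be sketched as follows: per class in the mini-batch, keep only the hardest positive pair and the hardest negative rather than densely enumerating triplets. This is a minimal illustration of the paradigm, not AdaSP's adaptive positive mining:

```python
import numpy as np

def sparse_pairwise_loss(feats, labels, margin=0.3):
    """Per class, keep only the hardest positive pair (least similar) and
    the hardest negative (most similar), triplet-style."""
    sim = feats @ feats.T                      # cosine similarity (unit feats)
    loss, count = 0.0, 0
    for c in np.unique(labels):
        pos = np.where(labels == c)[0]
        neg = np.where(labels != c)[0]
        if len(pos) < 2 or len(neg) == 0:
            continue
        hard_pos = min(sim[i, j] for i in pos for j in pos if i < j)
        hard_neg = sim[np.ix_(pos, neg)].max()
        loss += max(0.0, margin + hard_neg - hard_pos)
        count += 1
    return loss / max(count, 1)

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 16))
f /= np.linalg.norm(f, axis=1, keepdims=True)
print(sparse_pairwise_loss(f, np.array([0, 0, 1, 1, 2, 2, 3, 3])))
```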
Keyword: efficient
Machine learning for discovering laws of nature
Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition
Scalable High-Quality Hypergraph Partitioning
Mitigating Source Bias for Fairer Weak Supervision
BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware
MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory
Solution of Real Cubic Equations without Cardano's Formula
MLGCN: An Ultra Efficient Graph Convolution Neural Model For 3D Point Cloud Analysis
Pacti: Scaling Assume-Guarantee Reasoning for System Analysis and Design
Lattice-based kernel approximation and serendipitous weights for parametric PDEs in very high dimensions
SOSR: Source-Free Image Super-Resolution with Wavelet Augmentation Transformer
Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
Neural Microfacet Fields for Inverse Rendering
LabelVizier: Interactive Validation and Relabeling for Technical Text Annotations
AI-Oriented Two-Phase Multi-Factor Authentication in SAGINs: Prospects and Challenges
Vision-Assisted mmWave Beam Management for Next-Generation Wireless Systems: Concepts, Solutions and Open Challenges
Exploring the Limits of Deep Image Clustering using Pretrained Models
FP8 versus INT8 for efficient deep learning inference
How to measure research performance of single scientists? A proposal for an index based on scientific prizes: The Prize Winner Index (PWI)
The Many Qualities of a New Directly Accessible Compression Scheme
A data-driven method for parametric PDE Eigenvalue Problems using Gaussian Process with different covariance functions
Dictionary-based Online-adaptive Structure-preserving Model Order Reduction for Parametric Hamiltonian Systems
Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks
RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving
Direct Data-Driven Computation of Polytopic Robust Control Invariant Sets and State-Feedback Controllers
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
How Efficient Are Today's Continual Learning Algorithms?
A Closer Look at Parameter-Efficient Tuning in Diffusion Models
GVP: Generative Volumetric Primitives
Aerostack2: A Software Framework for Developing Multi-robot Aerial Systems
Keyword: faster
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Scalable High-Quality Hypergraph Partitioning
Task Oriented Conversational Modelling With Subjective Knowledge
BOLT: An Automated Deep Learning Framework for Training and Deploying Large-Scale Neural Networks on Commodity CPU Hardware
Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis
IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-base Object Detection
The Many Qualities of a New Directly Accessible Compression Scheme
TPMCF: Temporal QoS Prediction using Multi-Source Collaborative Features
Shipper collaboration matching: fast enumeration of triangular transports with high cooperation effects
Keyword: mobile
A CI-based Auditing Framework for Data Collection Practices
Semi-Weakly Supervised Object Kinematic Motion Prediction
Rethinking Local Perception in Lightweight Vision Transformer
AI-Oriented Two-Phase Multi-Factor Authentication in SAGINs: Prospects and Challenges
Adaptive Model Prediction Control-Based Multi-Terrain Trajectory Tracking Framework for Mobile Spherical Robots
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Keyword: pruning
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Keyword: voxel
There is no result
Keyword: lidar
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection
CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions
Keyword: diffusion
CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition
GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently
3D-aware Image Generation using 2D Diffusion Models
Pay Attention: Accuracy Versus Interpretability Trade-off in Fine-tuned Diffusion Models
IC-FPS: Instance-Centroid Faster Point Sampling Module for 3D Point-base Object Detection
Diffusion Action Segmentation
HD-GCN: A Hybrid Diffusion Graph Convolutional Network
One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models
A Closer Look at Parameter-Efficient Tuning in Diffusion Models
$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States
Keyword: dynamic
Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version]
Data-Driven Covariance Steering Control Design
Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets
MapFormer: Boosting Change Detection by Using Pre-change Information
The Blockchain Imitation Game
Information-Theoretic Study of Time-Domain Energy-Saving Techniques in Radio Access
How accurate does Newton have to be?
Learning-Based Optimal Control with Performance Guarantees for Unknown Systems with Latent States
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
Upside down: affordable high-performance motion platform
Models as Agents: Optimizing Multi-Step Predictions of Interactive Local Models in Model-Based Multi-Agent Reinforcement Learning
Neural Network Entropy (NNetEn): EEG Signals and Chaotic Time Series Separation by Entropy Features, Python Package for NNetEn Calculation
A flatness-based saturated controller design for a quadcopter with experimental validation
Deep neural operator for learning transient response of interpenetrating phase composites subject to dynamic loading
Inferring networks from time series: a neural approach
Dictionary-based Online-adaptive Structure-preserving Model Order Reduction for Parametric Hamiltonian Systems
Robust LSTM-based Vehicle Velocity Observer for Regular and Near-limits Applications
Status Updating under Partial Battery Knowledge in Energy Harvesting IoT Networks
Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
Age of Incorrect Information With Hybrid ARQ Under a Resource Constraint for N-ary Symmetric Markov Sources
BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection
Adaptive Model Prediction Control-Based Multi-Terrain Trajectory Tracking Framework for Mobile Spherical Robots
Learning Spiking Neural Systems with the Event-Driven Forward-Forward Process
Assessing Language Model Deployment with Risk Cards
Attributed Stream Hypergraphs: temporal modeling of node-attributed high-order interactions
3D Human Pose Estimation via Intuitive Physics
Adaptive Sparse Pairwise Loss for Object Re-Identification