Abstract
A new approach to analyzing intrinsic properties of the Josephus function, $J_{k}$, is presented in this paper. The linear structure between extreme points of $J_{k}$ is fully revealed, leading to the design of an efficient algorithm for evaluating $J_{k}(n)$. Algebraic expressions that describe how to recursively compute extreme points, including fixed points, are derived. As a consequence, the existence of consecutive extreme points and consecutive fixed points is proven for all $k\geq 2$, which generalizes Knuth's result for $k=2$. Moreover, an extensive comparative numerical experiment is conducted to illustrate the performance of the proposed algorithm for evaluating the Josephus function compared to established algorithms. The results show that the proposed scheme is highly effective in computing $J_{k}(n)$ for large inputs.
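As a reference point, the sketch below evaluates $J_{k}(n)$ with the classical $O(n)$ recurrence, assuming the common convention that positions are numbered $1..n$ and every $k$-th person is eliminated. The recurrence already hints at the linear structure the abstract mentions: as long as no wrap-around occurs, $J_{k}(n) = J_{k}(n-1) + k$. The $k=2$ closed form is Knuth's well-known formula. This is a baseline for checking results, not the paper's algorithm.

```python
def josephus(n: int, k: int) -> int:
    """Survivor position J_k(n), 1-indexed, via J(1) = 1 and the mod-n recurrence."""
    pos = 0  # 0-indexed survivor for a circle of size 1
    for m in range(2, n + 1):
        pos = (pos + k) % m  # shift by k, wrapping around a circle of size m
    return pos + 1


def josephus_k2_closed_form(n: int) -> int:
    """Closed form for k = 2: J_2(n) = 2 * (n - 2**floor(log2 n)) + 1."""
    highest_power = 1 << (n.bit_length() - 1)
    return 2 * (n - highest_power) + 1


print(josephus(41, 3))  # the classical 41-person, k = 3 instance
```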
A Stochastic Method for Solving Time-Fractional Differential Equations
Authors: Nicolas L. Guidotti, Juan Acebrón, José Monteiro
Abstract
We present a stochastic method for efficiently computing the solution of time-fractional partial differential equations (fPDEs) that model anomalous diffusion problems of the subdiffusive type. After discretizing the fPDE in space, the ensuing system of fractional linear equations is solved by a Monte Carlo evaluation of the corresponding Mittag-Leffler matrix function. This is accomplished through the approximation of the expected value of a suitable multiplicative functional of a stochastic process, which consists of a Markov chain whose sojourn times in every state are Mittag-Leffler distributed. The resulting algorithm is able to calculate the solution at conveniently chosen points in the domain with high efficiency. In addition, we show how to generalize this algorithm to compute the complete solution. For several large-scale numerical problems, our method showed remarkable performance on both shared-memory and distributed-memory systems, achieving nearly perfect scalability up to 16,384 CPU cores.
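Two scalar building blocks behind such a method can be sketched as follows: the Mittag-Leffler function $E_\alpha(z) = \sum_k z^k / \Gamma(\alpha k + 1)$ via a truncated series, and a sampler for Mittag-Leffler distributed sojourn times using a standard inversion-type formula from the continuous-time random walk literature (assumed here, with survival function $E_\alpha(-(t/\gamma)^\alpha)$). The function names are hypothetical, and the matrix-function estimator of the paper is not reproduced.

```python
import math
import random


def mittag_leffler(z: float, alpha: float, tol: float = 1e-15, max_terms: int = 500) -> float:
    """E_alpha(z) by its power series. Fine for moderate |z|; for large negative z
    the alternating series loses accuracy to cancellation."""
    if z == 0.0:
        return 1.0
    log_abs_z = math.log(abs(z))
    total = 0.0
    for k in range(max_terms):
        # Work in log space so that Gamma(alpha*k + 1) never overflows.
        magnitude = math.exp(k * log_abs_z - math.lgamma(alpha * k + 1.0))
        term = -magnitude if (z < 0 and k % 2 == 1) else magnitude
        total += term
        if k > 0 and magnitude < tol * max(1.0, abs(total)):
            break
    return total


def sample_ml_sojourn(alpha: float, scale: float, rng: random.Random) -> float:
    """Draw T >= 0 with P(T > t) = E_alpha(-(t/scale)**alpha), for 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    u = 1.0 - rng.random()  # uniform on (0, 1]
    if alpha == 1.0:
        return -scale * math.log(u)  # alpha = 1 reduces to the exponential law
    v = 1.0 - rng.random()
    # sin(a*pi*(1-v)) / sin(a*pi*v) equals sin(a*pi)/tan(a*pi*v) - cos(a*pi), and is >= 0
    ratio = math.sin(alpha * math.pi * (1.0 - v)) / math.sin(alpha * math.pi * v)
    return -scale * math.log(u) * ratio ** (1.0 / alpha)
```

For $\alpha = 1$ both pieces collapse to the familiar Markovian case: $E_1(z) = e^z$ and the sojourn times become exponential.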
Uniform in time convergence of numerical schemes for stochastic differential equations via Strong Exponential stability: Euler methods, Split-Step and Tamed Schemes
Authors: Letizia Angeli, Dan Crisan, Michela Ottobre
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Abstract
We prove a general criterion providing sufficient conditions under which a time discretization of a given Stochastic Differential Equation (SDE) is a uniform-in-time approximation of the SDE. The criterion is also, to an extent discussed in the paper, necessary. Using this criterion, we then analyse the convergence properties of numerical methods for solutions of SDEs; we consider Explicit and Implicit Euler, split-step and (truncated) tamed Euler methods. In particular, we show that, under mild conditions on the coefficients of the SDE (locally Lipschitz and strictly monotonic), these methods produce approximations of the law of the solution of the SDE that converge uniformly in time. The theoretical results are verified by numerical examples.
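To make the schemes concrete, the sketch below applies explicit Euler, drift-implicit Euler and the standard tamed Euler step to a scalar SDE $dX = b(X)\,dt + \sigma\,dW$ with the example drift $b(x) = -x - x^3$, which is locally Lipschitz and strictly monotone. The drift, step size and function names are illustrative assumptions; split-step and truncated variants are omitted.

```python
import math
import random


def drift(x: float) -> float:
    """b(x) = -x - x**3, with (x - y)(b(x) - b(y)) <= -(x - y)**2."""
    return -x - x * x * x


def implicit_solve(c: float, h: float) -> float:
    """Solve y - h*b(y) = c, i.e. (1 + h)y + h*y**3 = c, by Newton's method.
    Starting from y = c, the iterates approach the unique real root monotonically."""
    y = c
    for _ in range(100):
        f = y - h * drift(y) - c
        df = 1.0 + h * (1.0 + 3.0 * y * y)
        step = f / df
        y -= step
        if abs(step) < 1e-14 * (1.0 + abs(y)):
            break
    return y


def euler_step(scheme: str, x: float, h: float, dw: float, sigma: float) -> float:
    if scheme == "explicit":
        return x + h * drift(x) + sigma * dw
    if scheme == "tamed":
        b = drift(x)
        return x + h * b / (1.0 + h * abs(b)) + sigma * dw  # drift increment bounded by 1
    if scheme == "implicit":
        return implicit_solve(x + sigma * dw, h)
    raise ValueError(f"unknown scheme: {scheme}")


def simulate(scheme: str, x0: float, h: float, n_steps: int, sigma: float = 0.0, seed: int = 0) -> float:
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # Brownian increment ~ N(0, h)
        x = euler_step(scheme, x, h, dw, sigma)
    return x
```

With a large initial condition, explicit Euler blows up within a few steps, while the tamed and implicit schemes stay bounded and decay towards the equilibrium.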
Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football
Authors: Chaoyi Gu, Varuna De Silva, Corentin Artaud, Rafael Pina
Abstract
Artificial Intelligence has been used to help humans complete difficult tasks in complicated environments by providing optimized decision-making strategies or by replacing manual labour. In environments with multiple agents, such as football, the most common methods for training agents are Imitation Learning and Multi-Agent Reinforcement Learning (MARL). However, agents trained by Imitation Learning cannot outperform the expert demonstrator, so humans can hardly gain new insights from the learnt policy. MARL, in turn, is prone to the credit assignment problem and can be inefficient in environments with a sparse reward signal. The objective of our research is to create a novel reward shaping method that embeds contextual information in the reward function to address these challenges. We demonstrate this in the Google Research Football (GRF) environment. We quantify the contextual information extracted from game state observations and combine this quantification with the original sparse reward to create the shaped reward. The experimental results in the GRF environment demonstrate that our reward shaping method is a useful addition to state-of-the-art MARL algorithms for training agents in environments with a sparse reward signal.
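The general pattern of adding a quantified context term to a sparse reward can be sketched as below. The features (`ball_progress`, `possession`), their equal weighting and the shaping weight are all hypothetical placeholders; the abstract does not specify how the paper quantifies context.

```python
from dataclasses import dataclass


@dataclass
class ContextFeatures:
    """Hypothetical quantities extracted from a game-state observation."""
    ball_progress: float  # e.g. ball position rescaled to [0, 1] toward the opponent goal
    possession: float     # 1.0 if our team controls the ball, else 0.0


def context_score(f: ContextFeatures) -> float:
    # Hypothetical equal weighting of the features into one scalar in [0, 1].
    return 0.5 * f.ball_progress + 0.5 * f.possession


def shaped_reward(sparse_reward: float, f: ContextFeatures, weight: float = 0.1) -> float:
    """Sparse environment reward (e.g. +1 for a goal) plus a small dense context term."""
    return sparse_reward + weight * context_score(f)
```

Keeping `weight` small is one way to let the dense term guide exploration without overriding the sparse objective.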
Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis
Authors: Eirik Fladmark, Muhammad Hamza Sajjad, Laura Brinkholm Justesen
Abstract
In this paper, we explore the performance of different pruning methods in the context of the lottery ticket hypothesis. We compare the performance of L1 unstructured pruning, Fisher pruning, and random pruning on different network architectures and pruning scenarios. The experiments include an evaluation of one-shot and iterative pruning, an examination of weight movement in the network during pruning, a comparison of the pruning methods on networks of varying widths, and an analysis of the performance of the methods when the network becomes very sparse. Additionally, we propose and evaluate a new method for efficient computation of Fisher pruning, known as batched Fisher pruning.
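Two of the baselines compared above, L1 (magnitude) unstructured pruning and random pruning, can be sketched on a flat list of weights, together with an iterative schedule that prunes a fixed fraction of the remaining weights per round. The function names are hypothetical; the retraining and weight rewinding of the lottery ticket procedure are omitted, and Fisher pruning would swap in a different saliency score (not shown).

```python
import random


def l1_unstructured_mask(weights, fraction):
    """Mask (1 = keep) that removes the `fraction` of weights with smallest |w|."""
    n_prune = int(round(fraction * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def random_mask(n, fraction, rng):
    """Same sparsity as above, but the pruned positions are chosen uniformly at random."""
    pruned = set(rng.sample(range(n), int(round(fraction * n))))
    return [0 if i in pruned else 1 for i in range(n)]


def iterative_l1_prune(weights, rate_per_round, rounds):
    """Prune `rate_per_round` of the *remaining* weights each round (no retraining shown)."""
    mask = [1] * len(weights)
    for _ in range(rounds):
        alive = [i for i, m in enumerate(mask) if m]
        alive.sort(key=lambda i: abs(weights[i]))
        for i in alive[: int(round(rate_per_round * len(alive)))]:
            mask[i] = 0
    return mask
```

Three rounds at 20% leave roughly $0.8^3 \approx 51\%$ of the weights, whereas one-shot pruning reaches a target sparsity in a single step.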
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
Authors: Xiuwei Xu, Ziwei Wang, Jie Zhou, Jiwen Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose binary sparse convolutional networks, called BSC-Net, for efficient point cloud analysis. We empirically observe that sparse convolution operations cause larger quantization errors than standard convolution. However, conventional network quantization methods directly binarize the weights and activations in sparse convolution, resulting in a performance drop due to the significant quantization loss. In contrast, we search for the optimal subset of convolution operations that activates the sparse convolution at various locations to alleviate quantization error, closing the performance gap between real-valued and binary sparse convolutional networks without complexity overhead. Specifically, we first present the shifted sparse convolution, which fuses the information in the receptive field for the active sites that match pre-defined positions. Then we employ differentiable search strategies to discover the optimal positions for active site matching in the shifted sparse convolution, which significantly alleviates the quantization errors for efficient point cloud analysis. For a fair evaluation of the proposed method, we empirically select recent advances that are beneficial for sparse convolutional network binarization to construct a strong baseline. The experimental results on ScanNet and NYU Depth v2 show that our BSC-Net achieves significant improvement over our strong baseline and outperforms state-of-the-art network binarization methods by a remarkable margin, without additional computation overhead for binarizing sparse convolutional networks.
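The "direct binarization" that conventional methods apply can be sketched as replacing each weight by its sign times a shared scaling factor. Using $\alpha = \mathrm{mean}|w|$ is the choice popularised by XNOR-Net and is assumed here; the quantization error below is the quantity BSC-Net aims to reduce, not BSC-Net itself.

```python
def binarize(weights):
    """Return (alpha, signs) such that w is approximated by alpha * sign(w)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs


def quantization_error(weights):
    """Squared L2 error between real-valued weights and their binarized approximation."""
    alpha, signs = binarize(weights)
    return sum((w - alpha * s) ** 2 for w, s in zip(weights, signs))
```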
A Novel Neural Network Approach for Predicting the Arrival Time of Buses for Smart On-Demand Public Transit
Abstract
Among the major public transportation systems in cities, bus transit has its problems, including a lack of accuracy and reliability in estimating bus arrival times for riders. This can lead to delays and decreased ridership, especially in cities where public transportation is heavily relied upon. A common issue is that bus arrival times do not match the schedules, resulting in delays relative to fixed schedules. According to the study in this paper on New York City bus data, there is an average mismatch of around eight minutes (491 seconds) between bus arrivals and the scheduled times. This research paper presents a novel AI-based, data-driven approach for estimating the arrival times of buses at each transit point (station). Our approach is based on a fully connected neural network and can predict arrival times collectively across all bus lines in large metropolitan areas. This neural-network approach provides a new way to estimate bus arrival times, which can make bus transit more efficient and smarter for the general public. Our evaluation on a bus network with more than 200 bus lines and 2 million data points demonstrates an estimation error of less than 40 seconds for arrival times. The inference time per validation-set data point is less than 0.006 ms.
Learning Harmonic Molecular Representations on Riemannian Manifold
Authors: Yiqun Wang, Yuning Shen, Shi Chen, Lihao Wang, Fei Ye, Hao Zhou
Abstract
Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the expressive power of the network. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on a 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows predictive power comparable to current models in small molecule property prediction, and outperforms state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.
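The idea of a harmonic, multi-resolution encoding can be sketched on a tiny closed mesh. As a simplifying assumption, the combinatorial graph Laplacian of the mesh's edge graph stands in for a proper (e.g. cotangent) Laplace-Beltrami discretisation: its low-eigenvalue eigenvectors form the coarse basis, and projecting a per-vertex feature onto the first few modes gives a low-resolution version of it. HMR's own operator and harmonic message passing are not reproduced.

```python
import numpy as np


def graph_laplacian(n_vertices, edges):
    """L = D - A for an undirected, unweighted mesh edge graph."""
    lap = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
        lap[i, i] += 1.0
        lap[j, j] += 1.0
    return lap


def harmonic_basis(lap, n_modes):
    """Lowest-frequency eigenpairs of the symmetric Laplacian (eigh sorts ascending)."""
    eigvals, eigvecs = np.linalg.eigh(lap)
    return eigvals[:n_modes], eigvecs[:, :n_modes]


def spectral_encode(signal, basis):
    """Project a per-vertex feature onto the basis; fewer modes = coarser resolution."""
    coeffs = basis.T @ signal
    return coeffs, basis @ coeffs


# Octahedron: 6 vertices, each joined to all others except its antipode (0-1, 2-3, 4-5).
edges = [(i, j) for i in range(6) for j in range(i + 1, 6) if j != (i ^ 1)]
vals, vecs = harmonic_basis(graph_laplacian(6, edges), 6)
```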
A New Index based on Power Splitting Indices for Predicting Proper Time of Controlled Islanding
Abstract
In the event of large disturbances, controlled islanding is used as a last resort to prevent cascading outages. Applying the strategy at the right time is crucial to maintaining system security. A controlled islanding strategy may be deployed efficiently at the right time by predicting the time of uncontrolled system splitting. The purpose of this study is to predict the appropriate islanding time to prevent catastrophic blackout and uncontrolled islanding based on existing relationships between coherent generator groups. A new instability index is derived from the proximity of inter-area oscillations to power splitting indices. Power splitting indices are derived using synchronization coefficients, which recognize the conditions in the system that warrant controlled islanding. The critical values of the indices are calculated offline using simulation data from the IEEE 39-bus system, and their online performance is evaluated following a controlled islanding strategy. Through the introduction of these indices, system degradation can be effectively evaluated, and blackouts can be predicted early and prevented by controlled islanding at the right time.
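For orientation, the textbook synchronizing power coefficient of a lossless link between two internal voltages is $K_s = \frac{E_i E_j}{X_{ij}} \cos\delta_{ij}$, the slope of the power-angle curve; it falls to zero as the angle difference approaches $90^\circ$. The sketch computes it and applies a hypothetical rule that flags a split when any inter-group coefficient drops below an offline-computed critical value. This rule is an illustration only, not the paper's index.

```python
import math


def sync_coefficient(e_i: float, e_j: float, x_ij: float, delta_ij_rad: float) -> float:
    """dP/d(delta) for P = (E_i E_j / X_ij) sin(delta), all quantities in per unit."""
    return e_i * e_j / x_ij * math.cos(delta_ij_rad)


def needs_islanding(coefficients, critical_value: float) -> bool:
    """Hypothetical rule: flag when the weakest inter-group coefficient falls below the threshold."""
    return min(coefficients) < critical_value
```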
Randomized rounding algorithms for large scale unsplittable flow problems
Abstract
Unsplittable flow problems cover a wide range of telecommunication and transportation problems, and their efficient resolution is key to a number of applications. In this work, we study algorithms that can scale up to large graphs and large numbers of commodities. We present and analyze in detail a heuristic based on the linear relaxation of the problem and randomized rounding. We provide empirical evidence that this approach is competitive with state-of-the-art resolution methods in either its scaling performance or the quality of its solutions. We provide a variation of the heuristic which has the same approximation factor as the state-of-the-art approximation algorithm. We also derive a tighter analysis for the approximation factor of both the variation and the state-of-the-art algorithm. We introduce a new objective function for the unsplittable flow problem and discuss its differences with the classical congestion objective function. Finally, we discuss the gap in practical performance and theoretical guarantees between all the aforementioned algorithms.
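The core rounding step can be sketched as follows, assuming the LP relaxation has already been solved and decomposed into weighted paths per commodity: each commodity keeps a single path, chosen with probability equal to its fractional flow share, and the result is scored with the classical congestion objective. The input layout and function names are hypothetical, and the paper's variant may differ in details.

```python
import random
from collections import defaultdict


def randomized_rounding(fractional_paths, rng):
    """fractional_paths: {commodity: [(path, fraction), ...]}, a path being a tuple of arcs.
    Pick exactly one path per commodity with probability proportional to its LP fraction."""
    chosen = {}
    for commodity, options in fractional_paths.items():
        paths = [p for p, _ in options]
        weights = [w for _, w in options]
        chosen[commodity] = rng.choices(paths, weights=weights, k=1)[0]
    return chosen


def max_congestion(chosen, demands, capacities):
    """Classical objective: largest load/capacity ratio over the arcs that carry flow."""
    load = defaultdict(float)
    for commodity, path in chosen.items():
        for arc in path:
            load[arc] += demands[commodity]
    return max(load[arc] / capacities[arc] for arc in load)
```

In practice the rounding is often repeated several times and the best unsplittable solution kept.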
Privacy-preserving machine learning for healthcare: open challenges and future perspectives
Authors: Alejandro Guerra-Manzanares, L. Julian Lechuga Lopez, Michail Maniatakos, Farah E. Shamout
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we conduct a review of recent literature concerning Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus on privacy-preserving training and inference-as-a-service, and perform a comprehensive review of existing trends, identify challenges, and discuss opportunities for future research directions. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospects of translating research efforts into real-world settings.
Core-Periphery Principle Guided Redesign of Self-Attention in Transformers
Authors: Xiaowei Yu, Lu Zhang, Haixing Dai, Yanjun Lyu, Lin Zhao, Zihao Wu, David Liu, Tianming Liu, Dajiang Zhu
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract
Designing more efficient, reliable, and explainable neural network architectures is critical to studies based on artificial intelligence (AI) techniques. Previous studies, through post-hoc analysis, have found that the best-performing artificial neural networks (ANNs) surprisingly resemble biological neural networks (BNNs), which indicates that ANNs and BNNs may share some common principles to achieve optimal performance in either machine learning or cognitive/behavior tasks. Inspired by this phenomenon, we proactively instill organizational principles of BNNs to guide the redesign of ANNs. We leverage the Core-Periphery (CP) organization, which is widely found in human brain networks, to guide the information communication mechanism in the self-attention of the vision transformer (ViT), and name this novel framework CP-ViT. In CP-ViT, the attention operation between nodes is defined by a sparse graph with a Core-Periphery structure (CP graph), where the core nodes are redesigned and reorganized to play an integrative role and serve as a center for other periphery nodes to exchange information. We evaluated the proposed CP-ViT on multiple public datasets, including medical image datasets (INbreast) and natural image datasets. Interestingly, by incorporating the BNN-derived principle (CP structure) into the redesign of ViT, our CP-ViT outperforms other state-of-the-art ANNs. In general, our work advances the state of the art in three aspects: 1) This work provides novel insights for brain-inspired AI: we can utilize the principles found in BNNs to guide and improve our ANN architecture design; 2) We show that there exist sweet spots of CP graphs that lead to CP-ViTs with significantly improved performance; and 3) The core nodes in CP-ViT correspond to task-related meaningful and important image patches, which can significantly enhance the interpretability of the trained deep model.
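One way to read "attention defined by a sparse CP graph" is as a mask on the attention scores: core tokens exchange information with every token, while periphery tokens communicate only through the core. The sketch below builds such a mask and applies a masked softmax. Letting each token also attend to itself is an assumption, and the names are hypothetical.

```python
import math


def cp_attention_mask(n_tokens, core):
    """allowed[i][j] is True when token i may attend to token j under a core-periphery graph."""
    core = set(core)
    return [[(i in core) or (j in core) or (i == j) for j in range(n_tokens)]
            for i in range(n_tokens)]


def masked_softmax(scores, allowed):
    """Row-wise softmax over the allowed entries only; masked entries get weight 0."""
    out = []
    for row_scores, row_mask in zip(scores, allowed):
        m = max(s for s, ok in zip(row_scores, row_mask) if ok)  # for numerical stability
        exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(row_scores, row_mask)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```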
Learning Expressive Prompting With Residuals for Vision Transformers
Abstract
Prompt learning is an efficient approach to adapting transformers by inserting a learnable set of parameters into the input and intermediate representations of a pre-trained model. In this work, we present Expressive Prompts with Residuals (EXPRES), which modifies the prompt learning paradigm specifically for effective adaptation of vision transformers (ViT). Our method constructs downstream representations via learnable ``output'' tokens that are akin to the learned class tokens of the ViT. Further, for better steering of the downstream representation processed by the frozen transformer, we introduce residual learnable tokens that are added to the output of various computations. We apply EXPRES to image classification, few-shot learning, and semantic segmentation, and show that our method achieves state-of-the-art prompt tuning on 3/3 categories of the VTAB benchmark. In addition to strong performance, we observe that our approach is an order of magnitude more prompt-efficient than existing visual prompting baselines. We analytically show the computational benefits of our approach over weight-space adaptation techniques like fine-tuning. Lastly, we systematically corroborate the architectural design of our method via a series of ablation experiments.
Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator
Abstract
This study proposes a novel framework for learning the underlying physics of phenomena with moving boundaries. The proposed approach combines Ensemble SINDy and the Peridynamic Differential Operator (PDDO) and imposes an inductive bias assuming that the moving boundary physics evolves in its own corotational coordinate system. The robustness of the approach is demonstrated by considering various levels of noise in the measured data using the 2D Fisher-Stefan model. The confidence intervals of the recovered coefficients are reported, and the uncertainties of the moving boundary positions are depicted by obtaining the solutions with the recovered coefficients. Although the main focus of this study is the Fisher-Stefan model, the proposed approach is applicable to any type of moving boundary problem with a smooth moving boundary front without a mushy region. The code and data for this framework are available at: https://github.com/alicanbekar/MB_PDDO-SINDy.
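The regression at the heart of (Ensemble) SINDy can be sketched as sequentially thresholded least squares (STLSQ) on a library of candidate terms, bagged over bootstrap resamples to obtain median coefficients and percentile intervals. Here the time derivatives are assumed to be given; in the paper they come from PDDO, which is not shown, and the threshold and model count are illustrative.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi


def ensemble_sindy(theta, dxdt, n_models=20, threshold=0.1, seed=0):
    """Bag STLSQ over bootstrap row resamples; return median coefficients and a 95% interval."""
    rng = np.random.default_rng(seed)
    n = theta.shape[0]
    coefs = np.array([
        stlsq(theta[idx], dxdt[idx], threshold)
        for idx in (rng.integers(0, n, size=n) for _ in range(n_models))
    ])
    return np.median(coefs, axis=0), np.percentile(coefs, [2.5, 97.5], axis=0)


# Library [1, x, x^2, x^3] for the toy law dx/dt = -2x + 0.5x^2 (noise-free data).
x = np.linspace(-2.0, 2.0, 200)
theta = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
coef, interval = ensemble_sindy(theta, -2.0 * x + 0.5 * x ** 2)
```

With noisy measurements the percentile interval widens, which is how bagging yields the kind of coefficient uncertainty the abstract reports.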
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Authors: Vladislav Lialin, Vijeta Deshpande, Anna Rumshisky
Abstract
This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023. These methods aim to resolve the infeasibility and impracticality of fine-tuning large language models by only training a small set of parameters. We provide a taxonomy that covers a broad range of methods and present a detailed method comparison with a specific focus on real-life efficiency and fine-tuning multibillion-scale language models.
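As one concrete example of "only training a small set of parameters", the sketch below shows a LoRA-style layer, a widely used reparameterization family: the frozen weight $W$ is augmented with a trainable low-rank product $BA$. The parameter-count helper makes the savings explicit. The names are hypothetical, and this is one method among the many the overview covers.

```python
import numpy as np


def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Trainable parameters for full fine-tuning of one matrix vs. a rank-r update."""
    return d_out * d_in, rank * (d_out + d_in)


def lora_forward(x, w_frozen, a, b, scale=1.0):
    """y = W x + scale * B (A x); only A (rank x d_in) and B (d_out x rank) are trained."""
    return w_frozen @ x + scale * (b @ (a @ x))
```

For a 4096 x 4096 projection with rank 8, this is 65,536 trainable parameters instead of about 16.8 million. Initializing `b` to zeros makes the adapted layer start out identical to the frozen one.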
Predicting Thermoelectric Power Factor of Bismuth Telluride During Laser Powder Bed Fusion Additive Manufacturing
Authors: Ankita Agarwal (1), Tanvi Banerjee (1), Joy Gockel (2), Saniya LeBlanc (3), Joe Walker (4), John Middendorf (4) ((1) Wright State University, (2) Colorado School of Mines, (3) The George Washington University, (4) Open Additive, LLC)
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Abstract
An additive manufacturing (AM) process, such as laser powder bed fusion, allows for the fabrication of objects by spreading and melting powder in layers until a freeform part shape is created. In order to improve the properties of the material involved in the AM process, it is important to predict material characterization properties as a function of the processing conditions. In thermoelectric materials, the power factor is a measure of how efficiently the material can convert heat to electricity. While earlier works have predicted the material characterization properties of different thermoelectric materials using various techniques, the use of machine learning models to predict the power factor of bismuth telluride (Bi2Te3) during the AM process has not been explored. This is important as Bi2Te3 is a standard material for low-temperature applications. Thus, we used manufacturing processing parameters and in-situ sensor monitoring data collected during AM of Bi2Te3 to train different machine learning models to predict its thermoelectric power factor. We implemented supervised machine learning techniques using an 80%/20% train/test split and further used the permutation feature importance method to identify the processing parameters and in-situ sensor features that were most predictive of the material's power factor. Ensemble-based methods such as random forest, AdaBoost classifier, and bagging classifier performed best in predicting power factor, with the highest accuracy of 90% achieved by the bagging classifier model. Additionally, we identified the top 15 processing parameters and in-situ sensor features that characterize material manufacturing properties such as power factor. These features could further be optimized to maximize the power factor of the thermoelectric material and improve the quality of products built using this material.
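The evaluation protocol described above (80/20 split, then permutation feature importance) can be sketched in plain Python: a feature's importance is the mean accuracy drop when that feature's column is shuffled on the test set. The toy model and data are hypothetical stand-ins; in practice scikit-learn's `sklearn.inspection.permutation_importance` implements the same idea.

```python
import random


def accuracy(y_true, y_pred):
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(test_fraction * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def permutation_importance(predict, rows, labels, n_repeats=5, seed=0):
    """Mean accuracy drop when one feature column is shuffled, breaking its link to the label."""
    rng = random.Random(seed)
    baseline = accuracy(labels, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(labels, predict(permuted)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a decisive feature scores close to the accuracy lost by guessing.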
DisWOT: Student Architecture Search for Distillation WithOut Training
Abstract
Knowledge distillation (KD) is an effective training strategy to improve lightweight student models under the guidance of cumbersome teachers. However, the large architecture difference across teacher-student pairs limits the distillation gains. In contrast to previous adaptive distillation methods that reduce the teacher-student gap, we explore a novel training-free framework to search for the best student architectures for a given teacher. Our work first empirically shows that the optimal model under vanilla training is not necessarily the winner in distillation. Secondly, we find that the similarity of feature semantics and sample relations between randomly initialized teacher-student networks correlates well with final distillation performance. Thus, we efficiently measure similarity matrices conditioned on the semantic activation maps to select the optimal student via an evolutionary algorithm without any training. In this way, our student architecture search for Distillation WithOut Training (DisWOT) significantly improves the performance of the model in the distillation stage with at least 180$\times$ training acceleration. Additionally, we extend the similarity metrics in DisWOT as new distillers and KD-based zero-proxies. Our experiments on CIFAR, ImageNet and NAS-Bench-201 demonstrate that our technique achieves state-of-the-art results on different search spaces. Our project and code are available at https://lilujunai.github.io/DisWOT-CVPR2023/.
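A generic version of the sample-relation similarity can be sketched as follows. For a batch, each network's features give a batch-by-batch Gram matrix, which is comparable across teacher and student even when their feature widths differ; a smaller distance between the normalized matrices suggests more similar sample relations. The row normalization and Frobenius distance are assumptions, and the semantic-activation-map variant and evolutionary search are not shown.

```python
import numpy as np


def sample_similarity(features):
    """Row-normalised Gram matrix over the batch: entry (i, j) relates samples i and j."""
    flat = features.reshape(features.shape[0], -1)
    g = flat @ flat.T
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def relation_distance(teacher_feats, student_feats):
    """Hypothetical training-free score: lower means more similar sample relations."""
    return float(np.linalg.norm(sample_similarity(teacher_feats) - sample_similarity(student_feats)))
```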
Efficient Deep Learning of Robust, Adaptive Policies using Tube MPC-Guided Data Augmentation
Authors: Tong Zhao, Andrea Tagliabue, Jonathan P. How
Abstract
The deployment of agile autonomous systems in challenging, unstructured environments requires adaptation capabilities and robustness to uncertainties. Existing robust and adaptive controllers, such as the ones based on MPC, can achieve impressive performance at the cost of heavy online onboard computations. Strategies that efficiently learn robust and onboard-deployable policies from MPC have emerged, but they still lack fundamental adaptation capabilities. In this work, we extend an existing efficient imitation learning (IL) algorithm for robust policy learning from MPC with the ability to learn policies that adapt to challenging model/environment uncertainties. The key idea of our approach is to modify the IL procedure by conditioning the policy on a learned lower-dimensional model/environment representation that can be efficiently estimated online. We tailor our approach to the task of learning an adaptive position and attitude control policy to track trajectories under challenging disturbances on a multirotor. Our evaluation is performed in a high-fidelity simulation environment and shows that a high-quality adaptive policy can be obtained in about $1.3$ hours. We additionally empirically demonstrate rapid adaptation to in- and out-of-training-distribution uncertainties, achieving a $6.1$ cm average position error under a wind disturbance that corresponds to about $50\%$ of the weight of the robot and that is $36\%$ larger than the maximum wind seen during training.
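The data-augmentation idea suggested by the title can be sketched as follows, under stated assumptions: a box-shaped tube around one nominal MPC state, and a linear ancillary feedback $u = \bar u + K(x - \bar x)$ that labels extra states sampled inside the tube. The policy input simply concatenates the state with a low-dimensional model/environment estimate `z`. All names, the tube shape and the gain are hypothetical.

```python
import numpy as np


def augment_with_tube(x_nom, u_nom, k_gain, tube_halfwidth, n_samples, rng):
    """Sample states inside a box tube around x_nom and label them with u_nom + K (x - x_nom)."""
    xs = x_nom + rng.uniform(-tube_halfwidth, tube_halfwidth, size=(n_samples, x_nom.size))
    us = u_nom + (xs - x_nom) @ k_gain.T
    return xs, us


def policy_input(x, z):
    """Adaptive policy input: state concatenated with a learned model/environment estimate z."""
    return np.concatenate([x, z])
```

Each nominal MPC sample thus yields many labelled state-action pairs at no extra MPC cost, which is where the data efficiency comes from.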
Distributed Graph Embedding with Information-Oriented Random Walks
Authors: Peng Fang, Arijit Khan, Siqiang Luo, Fang Wang, Dan Feng, Zhenli Li, Wei Yin, Yuchao Cao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract
Graph embedding maps graph nodes to low-dimensional vectors and is widely adopted in machine learning tasks. The increasing availability of billion-edge graphs underscores the importance of learning efficient and effective embeddings on large graphs, such as link prediction on Twitter with over one billion edges. Most existing graph embedding methods fall short of reaching high data scalability. In this paper, we present a general-purpose, distributed, information-centric random walk-based graph embedding framework, DistGER, which can scale to embed billion-edge graphs. DistGER incrementally computes information-centric random walks. It further leverages a multi-proximity-aware, streaming, parallel graph partitioning strategy, simultaneously achieving high local partition quality and excellent workload balancing across machines. DistGER also improves the distributed Skip-Gram learning model to generate node embeddings by optimizing the access locality, CPU throughput, and synchronization efficiency. Experiments on real-world graphs demonstrate that, compared to state-of-the-art distributed graph embedding frameworks, including KnightKing, DistDGL, and PyTorch-BigGraph, DistGER exhibits 2.33x-129x acceleration, a 45% reduction in cross-machine communication, and > 10% effectiveness improvement in downstream tasks.
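The walk-then-Skip-Gram pipeline that frameworks like DistGER distribute can be sketched in its simplest form: fixed-length uniform random walks (DeepWalk-style) and the (center, context) pairs a Skip-Gram model trains on. DistGER's information-centric walks choose walk length adaptively, which this baseline does not attempt; the names are hypothetical.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of up to `length` nodes; stops early at a node with no neighbours."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


def skipgram_pairs(walk, window):
    """(center, context) training pairs within `window` positions of each center node."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs
```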
Design Space Exploration for PCM-based Photonic Memory
Abstract
The integration of silicon photonics (SiPh) and phase change materials (PCMs) has created a unique opportunity to realize adaptable and reconfigurable photonic systems. In particular, the nonvolatile programmability of PCMs has made them a promising candidate for implementing optical memory systems. In this paper, we describe the design of an optical memory cell based on PCMs while exploring the design space of the cell in terms of PCM material choice (e.g., GST, GSST, Sb2Se3), cell bit capacity, latency, and power consumption. Leveraging this design-space exploration for the design of efficient optical memory cells, we present the design and implementation of an optical memory array and explore its scalability and power consumption when using different optical memory cells. We also identify performance bottlenecks that need to be alleviated to further scale optical memory arrays with competitive latency and energy consumption, compared to their electronic counterparts.
HISSbot: Sidewinding with a Soft Snake Robot
Authors: Farhan Rozaidi, Emma Waters, Olivia Dawes, Jennifer Yang, Joseph R. Davidson, Ross L. Hatton
Abstract
Snake robots are characterized by their ability to navigate through small spaces and loose terrain by utilizing efficient cyclic forms of locomotion. Soft snake robots are a subset of these robots that utilize soft, compliant actuators to produce movement. Prior work on soft snake robots has primarily focused on planar gaits, such as undulation. More efficient spatial gaits, such as sidewinding, remain unexplored for soft snake robots. We propose a novel means of constructing a soft snake robot capable of sidewinding, and introduce the Helical Inflating Soft Snake Robot (HISSbot). We validate this actuation with the physical HISSbot and demonstrate its ability to sidewind across various surfaces. Our tests show robust locomotion on low-friction surfaces and in granular media.
Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection
Authors: Tao He, Sheng Huang, Wenhao Tang, Bo Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Scene text detection is a challenging computer vision task due to the high variation in text shapes and aspect ratios. In this work, we propose a scene text detector named Deformable Kernel Expansion (DKE), which incorporates the merits of both segmentation-based and contour-based detectors. DKE employs a segmentation module to segment the shrunken text region as the text kernel, then expands the text kernel contour to obtain the text boundary by regressing vertex-wise offsets. Generating the text kernel by segmentation enables DKE to inherit the arbitrary-shaped text region modeling capability of segmentation-based detectors. Regressing the kernel contour with some sampled vertices enables DKE to avoid complicated pixel-level post-processing and, like contour-based detectors, to better learn contour deformation. Moreover, we propose an Optimal Bipartite Graph Matching Loss (OBGML) that measures the matching error between the predicted contour and the ground truth, which efficiently minimizes the global contour matching distance. Extensive experiments on CTW1500, Total-Text, MSRA-TD500, and ICDAR2015 demonstrate that DKE achieves a good tradeoff between accuracy and efficiency in scene text detection.
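Two of the geometric steps above can be sketched on plain point lists: expanding a kernel contour by per-vertex offsets, and the minimum-cost one-to-one matching between predicted and ground-truth vertices that a bipartite matching loss is built on. Brute force over permutations keeps the sketch dependency-free and is only viable for a handful of vertices; `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently. This is not the paper's OBGML formulation, and the names are hypothetical.

```python
import itertools
import math


def expand_contour(kernel, offsets):
    """Text boundary = kernel contour vertices shifted by regressed (dx, dy) offsets."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(kernel, offsets)]


def optimal_matching_cost(pred, gt):
    """Minimum total Euclidean distance over one-to-one assignments of pred to gt vertices."""
    best = math.inf
    for perm in itertools.permutations(range(len(gt))):
        best = min(best, sum(math.dist(pred[i], gt[j]) for i, j in enumerate(perm)))
    return best
```

Because the cost is minimized over assignments, it does not penalize a prediction merely for listing the correct vertices in a different order.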
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
Authors: Xinyi Ye, Weiyue Zhao, Hao Lu, Zhiguo Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Correspondence pruning aims to find consistent correspondences (inliers) in a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers. It is even more challenging to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention, which focuses on feature-consistent context, second-order attention focuses on the attention weights themselves and provides an additional source to encode consistent context from the attention map. For efficiency, we derive two approximate formulations of the naive implementation of second-order attention that reduce its cubic complexity to linear complexity, so that second-order attention can be used with negligible computational overhead. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining competitive accuracy.
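Why reordering can turn cubic cost into linear cost is easiest to see without the softmax. If the first-order map is $A = QK^\top$ (with $n$ correspondences and feature width $d$), then $(AA^\top)V = Q\,(K^\top(K\,(Q^\top V)))$, and every intermediate on the right is $n \times d$ or $d \times d$, so the cost is $O(nd^2)$ rather than $O(n^3)$. The softmax-free attention is a simplifying assumption; the paper's approximate formulations are not reproduced.

```python
import numpy as np


def second_order_naive(q, k, v):
    a = q @ k.T            # n x n first-order attention map (unnormalised, for illustration)
    return (a @ a.T) @ v   # forming A A^T costs O(n^3)


def second_order_reordered(q, k, v):
    # Same product associated right-to-left; intermediates are d x d_v or n x d_v.
    return q @ (k.T @ (k @ (q.T @ v)))
```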
A Generalized Ray Formulation For Wave-Optics Rendering
Authors: Shlomi Steinberg, Ravi Ramamoorthi, Benedikt Bitterli, Eugene d'Eon, Ling-Qi Yan, Matt Pharr
Abstract
Under ray-optical light transport, the classical ray serves as a local and linear "point query" of light's behaviour. Such point queries are useful, and sophisticated path tracing and sampling techniques enable efficiently computing solutions to light transport problems in complex, real-world settings and environments. However, such formulations are firmly confined to the realm of ray optics, while many applications of interest, in computer graphics and computational optics, demand a more precise understanding of light. We rigorously formulate the generalized ray, which enables local and linear point queries of the wave-optical phase space. Furthermore, we present sample-solve: a simple method that serves as a novel link between path tracing and computational optics. We show that this link enables the application of modern path tracing techniques to wave-optical rendering, improving upon the state of the art in terms of the generality and accuracy of the formalism, ease of application, and performance. Sampling using generalized rays enables interactive rendering under rigorous wave optics, with orders-of-magnitude faster performance compared to existing techniques.
Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference
Abstract
Throughput-oriented computing via co-running multiple applications on the same machine has been widely adopted to achieve high hardware utilization and energy savings in modern supercomputers and data centers. However, efficiently co-running applications raises new design challenges, mainly because applications with diverse requirements can stress shared hardware resources (IO, network, and cache) at various levels. These disparities in resource usage can cause interference, which in turn can lead to unpredictable co-running behavior. To better understand application interference, prior work has provided detailed execution characterizations. However, these characterization approaches either emphasize traditional benchmarks or are confined to a single application domain. To address this issue, we study 25 up-to-date applications and benchmarks from various application domains and form 625 consolidation pairs to thoroughly analyze the execution interference caused by application co-running. Moreover, we leverage mini-benchmarks and real applications to pinpoint the provenance of co-running interference in both hardware and software aspects.
TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation
Authors: Xiangyun Meng, Nathan Hatch, Alexander Lambert, Anqi Li, Nolan Wagener, Matthew Schmittle, JoonHo Lee, Wentao Yuan, Zoey Chen, Samuel Deng, Greg Okopal, Dieter Fox, Byron Boots, Amirreza Shaban
Abstract
Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
Authors: Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction priors for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance in few-/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder that extracts informative regions in the visual feature map of CLIP via a cross-attention mechanism; these are then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in the CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin in various settings, e.g., +4.04 mAP on HICO-Det. The source code is available at https://github.com/Artanic30/HOICLIP.
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
Abstract
Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location by following natural language instructions in real scenes. Most previous approaches use entire-view features or object-centric features to represent navigable candidates. However, these representations are not efficient enough for an agent to perform actions to arrive at the target location. Since knowledge provides crucial information that is complementary to visible content, in this paper we propose a Knowledge Enhanced Reasoning Model (KERM) that leverages knowledge to improve the agent's navigation ability. Specifically, we first retrieve facts (i.e., knowledge described by language descriptions) for the navigation views based on local regions from the constructed knowledge base. The retrieved facts range from properties of a single object (e.g., color, shape) to relationships between objects (e.g., action, spatial position), providing crucial information for VLN. We further present KERM, which contains purification, fact-aware interaction, and instruction-guided aggregation modules to integrate visual, history, instruction, and fact features. The proposed KERM can automatically select and gather crucial and relevant cues, yielding more accurate action prediction. Experimental results on the REVERIE, R2R, and SOON datasets demonstrate the effectiveness of the proposed method.
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
Authors: Xiao Yang, Chang Liu, Longlong Xu, Yikai Wang, Yinpeng Dong, Ning Chen, Hang Su, Jun Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before deployment. However, most existing physical attacks are either readily detectable or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. This requires a technique that can simultaneously deceive black-box recognition models and evade defensive mechanisms. To this end, we design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses. However, the mesh-based optimization regime calculates gradients in a high-dimensional mesh space and can become trapped in local optima with unsatisfactory transferability. To move away from the mesh-based space, we propose to perturb the low-dimensional coefficient space of a 3D Morphable Model, which significantly improves black-box transferability while also enjoying faster search and better visual quality. Extensive experiments in digital and physical scenarios show that our method effectively reveals security vulnerabilities of multiple popular commercial services, including three recognition APIs, four anti-spoofing APIs, two prevailing mobile phones, and two automated access control systems.
One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization
Abstract
As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on the recent models UniXcoder and CodeT5. To alleviate potential catastrophic forgetting in multilingual models, we freeze all pre-trained model parameters, insert a parameter-efficient adapter structure, and fine-tune only the adapter. Updating only 0.6% of the parameters that full-model fine-tuning updates for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios. On code summarization, multilingual fine-tuning with 200 samples per programming language approaches the results of fine-tuning on the entire dataset. Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
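The abstract does not describe the adapter architecture, so the following is a minimal sketch assuming the common residual bottleneck design: a down-projection, a ReLU, an up-projection, and a skip connection, inserted into an otherwise frozen model. The function names are hypothetical, and the parameter-count helper only shows how such a module's size scales; it does not reproduce the paper's 0.6% figure.

```python
def relu(x):
    return [max(0.0, t) for t in x]


def matvec(w, x):
    """Matrix (list of rows) times vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Residual bottleneck adapter: h + W_up relu(W_down h + b_down) + b_up.

    Only w_down, b_down, w_up, b_up would be trained; the backbone that
    produced the hidden state h stays frozen.
    """
    z = relu([s + b for s, b in zip(matvec(w_down, h), b_down)])
    up = [s + b for s, b in zip(matvec(w_up, z), b_up)]
    return [hv + uv for hv, uv in zip(h, up)]


def adapter_param_count(d_model, bottleneck):
    """Trainable parameters in one adapter: two weight matrices plus their biases."""
    return 2 * d_model * bottleneck + d_model + bottleneck
```

Initializing the up-projection to zero makes the adapter start as the identity, so tuning begins exactly at the pre-trained model's behavior; this is a widespread convention, not necessarily the one used in the paper.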
Automated wildlife image classification: An active learning tool for ecological applications
Authors: Ludwig Bothmann, Lisa Wimmer, Omid Charrakh, Tobias Weber, Hendrik Edelhoff, Wibke Peters, Hien Nguyen, Caryl Benjamin, Annette Menzel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
Abstract
Wildlife camera trap images are being used extensively to investigate animal abundance, habitat associations, and behavior, which is complicated by the fact that experts must first classify the images manually. Artificial intelligence systems can take over this task but usually need a large number of already-labeled training images to achieve sufficient performance. This requirement necessitates human expert labor and poses a particular challenge for projects with few cameras or short durations. We propose a label-efficient learning strategy that enables researchers with small or medium-sized image databases to leverage the potential of modern machine learning, thus freeing crucial resources for subsequent analyses. Our methodological proposal is two-fold: (1) We improve current strategies of combining object detection and image classification by tuning the hyperparameters of both models. (2) We provide an active learning (AL) system that allows training deep learning models very efficiently in terms of required human-labeled training images. We supply a software package that enables researchers to use these methods directly and thereby ensure the broad applicability of the proposed framework in ecological practice. We show that our tuning strategy improves predictive performance. We demonstrate how the AL pipeline reduces the amount of pre-labeled data needed to achieve a specific predictive performance and that it is especially valuable for improving out-of-sample predictive performance. We conclude that the combination of tuning and AL increases predictive performance substantially. Furthermore, we argue that our work can broadly impact the community through the ready-to-use software package provided. Finally, the publication of our models tailored to European wildlife data enriches existing model bases mostly trained on data from Africa and North America.
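The abstract does not name the query strategy used by the active learning system. As an assumed illustration only, least-confidence uncertainty sampling is one standard way to decide which unlabeled camera trap images to send to a human expert in each round: rank the pool by 1 minus the top predicted class probability and label the most uncertain images first.

```python
def least_confidence(probs):
    """Uncertainty of one prediction: 1 minus the highest class probability."""
    return 1.0 - max(probs)


def select_for_labeling(pool_probs, budget):
    """Return indices of the `budget` pool images the model is least sure about.

    pool_probs: one list of class probabilities per unlabeled image
    (hypothetical input format; any classifier's softmax output would do).
    """
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: least_confidence(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]
```

In a full loop, the selected images are labeled, added to the training set, the model is retrained, and the pool is rescored; the sketch shows only the selection step.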
Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes
Abstract
We investigate different natural language processing (NLP) approaches based on contextualised word representations for the problem of early prediction of lung cancer using free-text patient medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how soft prompt-tuning (an NLP technique used to adapt PLMs using small amounts of training data) compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust compared to PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. We find that 1) soft-prompt tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration compared to simpler static word embedding models as the classification problem becomes more imbalanced; and 3) results when training models on a small number of patients are mixed and show no clear differences between PLMs and WEMs. All our code is available open source at https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/.
GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs
Authors: Yuling Jiao, Di Li, Xiliang Lu, Jerry Zhijian Yang, Cheng Yuan
Abstract
With the recent study of deep learning in scientific computing, the PINN method has drawn widespread attention for solving PDEs. Compared with traditional methods, PINNs can efficiently handle high-dimensional problems, but their accuracy is relatively low, especially for highly irregular problems. Inspired by adaptive finite element methods and incremental learning, we propose GAS, a Gaussian mixture distribution-based adaptive sampling method for PINNs. During training, GAS uses the current residual information to generate a Gaussian mixture distribution for sampling additional points, which are then trained together with the historical data to speed up the convergence of the loss and achieve higher accuracy. Several numerical simulations on 2D to 10D problems show that GAS is a promising method that achieves state-of-the-art accuracy among deep solvers, while being comparable with traditional numerical solvers.
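The sampling step described above can be sketched as follows, under assumptions that the abstract does not confirm: one Gaussian component is centered at each of the highest-residual collocation points, components are weighted by their residuals, all components share a fixed isotropic standard deviation `sigma`, and samples are clipped to a box domain. The function and parameter names are hypothetical.

```python
import random


def gaussian_mixture_samples(points, residuals, n_components, sigma, n_new,
                             bounds, seed=0):
    """Sample n_new points from a residual-driven Gaussian mixture.

    points: current collocation points (tuples); residuals: PDE residual
    magnitude at each point; bounds: (lo, hi) of a box domain.
    """
    rng = random.Random(seed)
    # Centers: the n_components points with the largest residuals.
    top = sorted(range(len(points)), key=lambda i: residuals[i],
                 reverse=True)[:n_components]
    means = [points[i] for i in top]
    weights = [residuals[i] for i in top]  # larger residual, more samples
    lo, hi = bounds
    new_points = []
    for _ in range(n_new):
        mean = rng.choices(means, weights=weights, k=1)[0]
        # Isotropic Gaussian around the chosen center, clipped to the domain.
        new_points.append(tuple(min(hi, max(lo, rng.gauss(m, sigma)))
                                for m in mean))
    return new_points
```

In a training loop, these points would be appended to the existing collocation set before the next round of optimization, which matches the abstract's description of training new points together with historical data.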
The Wyner Variational Autoencoder for Unsupervised Multi-Layer Wireless Fingerprinting
Authors: Teng-Hui Huang, Thilini Dahanayaka, Kanchana Thilakarathna, Philip H.W. Leong, Hesham El Gamal
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
Wireless fingerprinting refers to device identification methods that leverage hardware imperfections and wireless channel variations as signatures. Beyond physical-layer characteristics, recent studies demonstrated that user behaviours can be identified through network traffic, e.g., packet length, without decrypting the payload. Inspired by these results, we propose a multi-layer fingerprinting framework that jointly considers the multi-layer signatures for improved identification performance. In contrast to previous works, by leveraging the recent multi-view machine learning paradigm (i.e., data with multiple forms), our method can cluster the device information shared among the multi-layer features without supervision. Our information-theoretic approach can be extended to supervised and semi-supervised settings with straightforward derivations. In solving the formulated problem, we obtain a tight surrogate bound using variational inference for efficient optimization. To extract the shared device information, we develop an algorithm based on the Wyner common information method, with reduced computational complexity compared to existing approaches. The algorithm can be applied to data distributions belonging to the exponential family. Empirically, we evaluate the algorithm on a synthetic dataset with real-world video traffic and simulated physical-layer characteristics. Our empirical results show that the proposed method outperforms the state-of-the-art baselines in both supervised and unsupervised settings.
Accelerating exponential integrators to efficiently solve advection-diffusion-reaction equations
Authors: Marco Caliari, Fabio Cassini, Lukas Einkemmer, Alexander Ostermann
Abstract
In this paper we consider an approach to improve the performance of exponential integrators/Lawson schemes in cases where the solution of a related, but usually much simpler, problem can be computed efficiently. While for implicit methods such an approach is common (e.g., by using preconditioners), for exponential integrators this has proven more challenging. Here we propose to extract a constant-coefficient differential operator from advection-diffusion-reaction equations for which we are then able to compute the required matrix functions efficiently. Both a linear stability analysis and numerical experiments show that the resulting schemes can be unconditionally stable. In fact, we find that exponential integrators and Lawson schemes can have better stability properties than similarly constructed implicit-explicit schemes. We also propose new Lawson-type integrators that further improve on these stability properties. The effectiveness of the approach is highlighted by a number of numerical examples in two and three space dimensions.
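Why treating the extracted constant-coefficient operator exactly helps stability can be seen on a scalar toy problem u' = λu + g(u), where the stiff number λ stands in for the extracted operator. This is an illustrative assumption: the paper works with matrix functions of a differential operator, not a scalar. The first-order Lawson scheme (Lawson-Euler) advances with u_{n+1} = exp(hλ)(u_n + h g(u_n)), so the linear part is propagated exactly, while explicit Euler multiplies by 1 + hλ and blows up once |1 + hλ| > 1.

```python
import math


def lawson_euler(lam, g, u0, h, steps):
    """Lawson-Euler for u' = lam*u + g(u): u_{n+1} = exp(h*lam) * (u_n + h*g(u_n))."""
    u = u0
    for _ in range(steps):
        u = math.exp(h * lam) * (u + h * g(u))
    return u


def explicit_euler(lam, g, u0, h, steps):
    """Forward Euler on the full right-hand side, for comparison."""
    u = u0
    for _ in range(steps):
        u = u + h * (lam * u + g(u))
    return u
```

With λ = -1000 and h = 0.1, the Euler amplification factor is 1 + hλ = -99, so ten steps grow the solution by 99^10, whereas the Lawson iterate decays towards zero as the true solution does. With g ≡ 0, the Lawson scheme reproduces exp(λt) up to rounding.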
Efficient Alternating Minimization Solvers for Wyner Multi-View Unsupervised Learning
Authors: Teng-Hui Huang, Hesham El Gamal
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
In this work, we adopt the Wyner common information framework for unsupervised multi-view representation learning. Within this framework, we propose two novel formulations that enable the development of computationally efficient solvers based on the alternating minimization principle. The first formulation, referred to as the variational form, has complexity that grows linearly with the number of views and is based on a tight variational-inference surrogate bound coupled with a Lagrangian optimization objective. The second formulation, the representational form, is shown to include known results as special cases. For it, we develop a tailored version of the alternating direction method of multipliers (ADMM) algorithm for solving the resulting non-convex optimization problem. In both cases, the convergence of the proposed solvers is established in certain relevant regimes. Furthermore, our empirical results demonstrate the effectiveness of the proposed methods compared with state-of-the-art solvers. In a nutshell, the proposed solvers offer computational efficiency, theoretical convergence guarantees, complexity that scales with the number of views, and exceptional accuracy compared with state-of-the-art techniques. Here we focus on the discrete case; our results for continuous distributions are reported elsewhere.
STMixer: A One-Stage Sparse Action Detector
Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traditional video action detectors typically adopt a two-stage pipeline, in which a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and cannot capture context information outside the bounding box. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, and thus suffer from inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows STMixer with the flexibility to mine a set of discriminative features from the entire spatiotemporal domain. Second, we devise a dual-branch feature mixing module, which allows STMixer to dynamically attend to and mix video features along the spatial and temporal dimensions, respectively, for better feature decoding. Coupling these two designs with a video backbone yields an efficient end-to-end action detector. Without bells and whistles, STMixer obtains state-of-the-art results on the AVA, UCF101-24, and JHMDB datasets.
Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation
Authors: Yuhao Cheng, Yichao Yan, Wenhan Zhu, Ye Pan, Bowen Pan, Xiaokang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Head generation with diverse identities is an important task in computer vision and computer graphics, widely used in multimedia applications. However, current full head generation methods require a large number of 3D scans or multi-view images for training, resulting in expensive data acquisition. To address this issue, we propose Head3D, a method to generate full 3D heads from limited multi-view images. Specifically, our approach first extracts facial priors represented by tri-planes learned in EG3D, a 3D-aware generative model, and then applies feature distillation to extend the 3D frontal faces into complete heads without compromising head integrity. To mitigate the domain gap between the face and head models, we present dual discriminators to guide the frontal and back head generation, respectively. Our model achieves cost-efficient and diverse complete head generation with photo-realistic renderings and high-quality geometry representations. Extensive experiments demonstrate the effectiveness of the proposed Head3D, both qualitatively and quantitatively.
Efficient Quality Diversity Optimization of 3D Buildings through 2D Pre-optimization
Authors: Alexander Hagg, Martin L. Kliemank, Alexander Asteroth, Dominik Wilde, Mario C. Bedrunka, Holger Foysi, Dirk Reith
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
Quality diversity algorithms can be used to efficiently create a diverse set of solutions that inform engineers' intuition. But quality diversity is not efficient for very expensive problems, needing hundreds of thousands of evaluations. Even with the assistance of surrogate models, quality diversity needs hundreds or even thousands of evaluations, which can make its use infeasible. In this study, we try to tackle this problem by using a pre-optimization strategy on a lower-dimensional optimization problem and then mapping the solutions to a higher-dimensional case. In a use case of designing buildings that minimize wind nuisance, we show that we can predict flow features around 3D buildings from 2D flow features around building footprints. For a diverse set of building designs, by sampling the space of 2D footprints with a quality diversity algorithm, a predictive model can be trained that is more accurate than one trained on a set of footprints selected with a space-filling algorithm like the Sobol sequence. By simulating only 16 buildings in 3D, a set of 1024 building designs with low predicted wind nuisance is created. We show that we can produce better machine learning models by generating training data with quality diversity instead of common sampling techniques. The method can bootstrap generative design in a computationally expensive 3D domain and allow engineers to sweep the design space, understanding wind nuisance in early design phases.
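The abstract does not name the quality diversity algorithm it uses. MAP-Elites is a widely used one, so here is a minimal sketch of it, assumed purely for illustration. The archive keeps the best solution found in each cell of a discretized behavior descriptor, so it fills up with solutions that are both good and different. The fitness and descriptor below are toy placeholders, not the wind-nuisance or footprint features from the paper.

```python
import random


def map_elites(fitness, descriptor, n_bins, n_iter, dim, seed=0):
    """Minimal MAP-Elites over [0, 1]^dim with a 1-D descriptor in [0, 1].

    Returns an archive mapping cell index -> (fitness, solution), keeping
    only the best (elite) solution per cell.
    """
    rng = random.Random(seed)
    archive = {}

    def try_insert(x):
        cell = min(n_bins - 1, int(descriptor(x) * n_bins))
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)

    # Seed the archive with a few random solutions (10 is an arbitrary choice).
    for _ in range(10):
        try_insert([rng.random() for _ in range(dim)])
    # Mutate randomly chosen elites with clipped Gaussian noise.
    for _ in range(n_iter):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in parent]
        try_insert(child)
    return archive
```

In the paper's setting, the solutions would be 2D footprints, and the resulting diverse archive would serve as training data for the 2D-to-3D flow predictor instead of a Sobol sample.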
Mask-Free Video Instance Segmentation
Authors: Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at https://github.com/SysCV/MaskFreeVis.
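The TK-Loss pipeline described above (patch matching across frames, K-nearest-neighbor selection, then a consistency loss on the matches) can be sketched on a toy 1D "video" of two frames. Everything here is a simplifying assumption rather than the paper's formulation: frames are 1D intensity lists, patches are 3-pixel windows, candidate matches lie within a small `radius`, matches must have a mean squared patch distance at most `max_dist`, and the consistency term is a plain L1 difference between mask probabilities. The names and defaults are hypothetical.

```python
def patch(frame, i, half):
    """Intensities in a window centered at i, with edge padding."""
    n = len(frame)
    return [frame[min(n - 1, max(0, i + o))] for o in range(-half, half + 1)]


def tk_loss_1d(frame_a, frame_b, mask_a, mask_b, radius=2, k=2,
               max_dist=0.1, half=1):
    """Toy temporal KNN-patch consistency loss between two 1D frames.

    For each pixel in frame_a, find pixels in frame_b within `radius`
    whose patches are close (one-to-many matches), keep the k nearest,
    and penalize mask disagreement on those matches.
    """
    total, count = 0.0, 0
    for i in range(len(frame_a)):
        pa = patch(frame_a, i, half)
        candidates = []
        for j in range(max(0, i - radius), min(len(frame_b), i + radius + 1)):
            pb = patch(frame_b, j, half)
            dist = sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)
            if dist <= max_dist:
                candidates.append((dist, j))
        for _, j in sorted(candidates)[:k]:
            total += abs(mask_a[i] - mask_b[j])
            count += 1
    return total / count if count else 0.0
```

Note that no mask labels appear anywhere: the loss only asks the predicted masks to agree where the appearance matches, which is the property the abstract relies on for mask-free supervision.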
When Brain-inspired AI Meets AGI
Authors: Lin Zhao, Lu Zhang, Zihao Wu, Yuzhong Chen, Haixing Dai, Xiaowei Yu, Zhengliang Liu, Tuo Zhang, Xintao Hu, Xi Jiang, Xiang Li, Dajiang Zhu, Dinggang Shen, Tianming Liu
Abstract
Artificial General Intelligence (AGI) has been a long-standing goal of humanity, with the aim of creating machines capable of performing any intellectual task that humans can do. To achieve this, AGI researchers draw inspiration from the human brain and seek to replicate its principles in intelligent machines. Brain-inspired artificial intelligence is a field that has emerged from this endeavor, combining insights from neuroscience, psychology, and computer science to develop more efficient and powerful AI systems. In this article, we provide a comprehensive overview of brain-inspired AI from the perspective of AGI. We begin with the current progress in brain-inspired AI and its extensive connection with AGI. We then cover the important characteristics for both human intelligence and AGI (e.g., scaling, multimodality, and reasoning). We discuss important technologies toward achieving AGI in current AI systems, such as in-context learning and prompt tuning. We also investigate the evolution of AGI systems from both algorithmic and infrastructural perspectives. Finally, we explore the limitations and future of AGI.
A source separation approach to temporal graph modelling for computer networks
Authors: Corentin Larroche
Subjects: Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
Abstract
Detecting malicious activity within an enterprise computer network can be framed as a temporal link prediction task: given a sequence of graphs representing communications between hosts over time, the goal is to predict which edges should, or should not, occur in the future. However, standard temporal link prediction algorithms are ill-suited for computer network monitoring because they do not take into account the peculiar short-term dynamics of computer network activity, which exhibits sharp seasonal variations. To build a better model, we propose a source separation-inspired description of computer network activity: at each time step, the observed graph is a mixture of subgraphs representing various sources of activity, and short-term dynamics result from changes in the mixing coefficients. Both qualitative and quantitative experiments demonstrate the validity of our approach.
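The mixture description above can be made concrete with a small sketch. The abstract does not say how sources are parameterized or learned, so the following are assumptions: each source is represented as a dictionary of edge probabilities, the mixing coefficients for one time step are given, and an observed edge's "surprise" is scored as its negative log-probability under the mixture. Short-term dynamics then come entirely from changing `weights` between time steps while the sources stay fixed.

```python
import math


def mixed_edge_probs(sources, weights):
    """Edge probabilities at one time step as a weighted mixture of source subgraphs.

    sources: list of dicts mapping (src_host, dst_host) -> edge probability
    weights: mixing coefficients for this time step (non-negative, summing to 1)
    """
    probs = {}
    for w, source in zip(weights, sources):
        for edge, p in source.items():
            probs[edge] = probs.get(edge, 0.0) + w * p
    return probs


def edge_surprise(observed_edges, probs, eps=1e-6):
    """Negative log-probability of each observed edge; high values flag unusual links."""
    return {e: -math.log(probs.get(e, 0.0) + eps) for e in observed_edges}
```

For example, a hypothetical "workday" source and a "nightly backup" source can be mixed with different weights during office hours and at night, so that a backup connection is unsurprising at night but an edge absent from every source stays highly surprising at all times.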
Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks
Abstract
Increasingly deep neural networks hinder the democratization of privacy-enhancing distributed learning, such as federated learning (FL), to resource-constrained devices. To overcome this challenge, in this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL), allowing multiple client devices to offload substantial training workloads to an edge server via layer-wise model splitting. Observing that existing PSL schemes incur excessive training latency and large volumes of data transmission, we propose an innovative PSL framework, namely efficient parallel split learning (EPSL), to accelerate model training. Specifically, EPSL parallelizes client-side model training and reduces the dimension of local gradients for back propagation (BP) via last-layer gradient aggregation, leading to a significant reduction in server-side training and communication latency. Moreover, considering the heterogeneous channel conditions and computing capabilities of client devices, we jointly optimize subchannel allocation, power control, and cut layer selection to minimize the per-round latency. Simulation results show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy compared with state-of-the-art benchmarks, and that the tailored resource management and layer split strategy reduces latency considerably compared with the counterpart without optimization.
A Survey on Malware Detection with Graph Representation Learning
Authors: Tristan Bilot, Nour El Madhoun, Khaldoun Al Agha, Anis Zouaoui
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Malware detection has become a major concern due to the increasing number and complexity of malware. Traditional detection methods based on signatures and heuristics are used for malware detection, but unfortunately, they generalize poorly to unknown attacks and can be easily circumvented using obfuscation techniques. In recent years, Machine Learning (ML) and notably Deep Learning (DL) have achieved impressive results in malware detection by learning useful representations from data and have become a solution preferred over traditional methods. More recently, the application of such techniques to graph-structured data has achieved state-of-the-art performance in various domains and shows promising results in learning more robust representations from malware. Yet, no literature review focusing on graph-based deep learning for malware detection exists. In this survey, we provide an in-depth literature review to summarize and unify existing works under the common approaches and architectures. We notably demonstrate that Graph Neural Networks (GNNs) reach competitive results in learning robust embeddings from malware represented as expressive graph structures, enabling efficient detection by downstream classifiers. This paper also reviews adversarial attacks that are used to fool graph-based detection methods. Challenges and future research directions are discussed at the end of the paper.
Understanding and Exploring the Whole Set of Good Sparse Generalized Additive Models
Abstract
In real applications, interaction between machine learning models and domain experts is critical; however, the classical machine learning paradigm, which usually produces only a single model, does not facilitate such interaction. Approximating and exploring the Rashomon set, i.e., the set of all near-optimal models, addresses this practical challenge by providing the user with a searchable space containing a diverse set of models from which domain experts can choose. We present a technique to efficiently and accurately approximate the Rashomon set of sparse generalized additive models (GAMs). We present algorithms to approximate the Rashomon set of GAMs with ellipsoids for fixed support sets and use these ellipsoids to approximate Rashomon sets for many different support sets. The approximated Rashomon set serves as a cornerstone for solving practical challenges such as (1) studying the variable importance for the model class; (2) finding models under user-specified constraints (monotonicity, direct editing); and (3) investigating sudden changes in the shape functions. Experiments demonstrate the fidelity of the approximated Rashomon set and its effectiveness in solving practical challenges.
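Why an ellipsoid is a natural shape for a Rashomon set on a fixed support can be seen in the squared-loss case. There, the loss around the optimum w* is exactly quadratic: L(w) = L(w*) + (w - w*)ᵀ H (w - w*) with H = XᵀX / n. So the set {w : L(w) ≤ L(w*) + budget} is exactly the ellipsoid (w - w*)ᵀ H (w - w*) ≤ budget. For other losses, a second-order expansion at the optimum gives an approximate ellipsoid. The sketch below, assuming a two-feature least-squares model (not the paper's GAM algorithms), checks that the ellipsoid test agrees with evaluating the loss directly.

```python
def fit_2d(x_rows, y):
    """Least-squares w* and H = X^T X / n for a two-column design (Cramer's rule)."""
    n = len(y)
    h = [[sum(r[i] * r[j] for r in x_rows) / n for j in range(2)] for i in range(2)]
    g = [sum(r[i] * t for r, t in zip(x_rows, y)) / n for i in range(2)]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    # Solve the normal equations H w = g.
    w = [(g[0] * h[1][1] - h[0][1] * g[1]) / det,
         (h[0][0] * g[1] - h[1][0] * g[0]) / det]
    return w, h


def mse(x_rows, y, w):
    """Mean squared error of the linear model w on the data."""
    return sum((t - (r[0] * w[0] + r[1] * w[1])) ** 2
               for r, t in zip(x_rows, y)) / len(y)


def in_rashomon_ellipsoid(w, w_star, h, budget):
    """True if (w - w*)^T H (w - w*) <= budget, i.e. L(w) <= L(w*) + budget."""
    d = [w[0] - w_star[0], w[1] - w_star[1]]
    q = sum(d[i] * h[i][j] * d[j] for i in range(2) for j in range(2))
    return q <= budget
```

Once the set is an ellipsoid, questions such as "what range of weights does this feature take across all near-optimal models?" reduce to optimizing linear functions over an ellipsoid, which is much cheaper than refitting many models.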
Simulation-based Inference for Model Parameterization on Analog Neuromorphic Hardware
Authors: Jakob Kaiser, Raphael Stock, Eric Müller, Johannes Schemmel, Sebastian Schmitt
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
The BrainScaleS-2 (BSS-2) system implements physical models of neurons and synapses and aims for energy-efficient and fast emulation of biological neurons. When replicating neuroscientific experimental results, a major challenge is finding suitable model parameters. This study investigates the suitability of the sequential neural posterior estimation (SNPE) algorithm for parameterizing a multi-compartmental neuron model emulated on the BSS-2 analog neuromorphic hardware system. In contrast to other optimization methods such as genetic algorithms or stochastic searches, the SNPE algorithm belongs to the class of approximate Bayesian computation (ABC) methods and estimates the posterior distribution of the model parameters; access to the posterior allows assessing the confidence in parameter estimates and unveiling correlations between model parameters. In previous applications, the SNPE algorithm showed higher computational efficiency than traditional ABC methods. For our multi-compartmental model, we show that the approximated posterior agrees with experimental observations and that the identified correlations between parameters agree with theoretical expectations. Furthermore, we show that the algorithm can deal with high-dimensional observations and parameter spaces. These results suggest that the SNPE algorithm is a promising approach for automating the parameterization of complex models, especially when dealing with characteristic properties of analog neuromorphic substrates, such as trial-to-trial variations or limited parameter ranges.
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Abstract
Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporally sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens but selectively align the unmasked tokens with an IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks, including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performance on various video tasks. The code and models will be released at https://github.com/OpenGVLab/unmasked_teacher.
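The abstract does not spell out how low-semantics tokens are identified. A minimal sketch, assuming each video token has a per-token semantic score (for instance, an attention weight taken from the image teacher) and using cosine distance as a stand-in alignment loss, shows the keep-and-align idea:

```python
import numpy as np

def select_unmasked(scores, keep_ratio):
    """Indices of tokens left visible: the top `keep_ratio` fraction by semantic score."""
    k = max(1, int(round(keep_ratio * len(scores))))
    return np.sort(np.argsort(scores)[::-1][:k])

def alignment_loss(student_visible, teacher_all, idx):
    """Mean cosine distance between the student's outputs for the visible tokens
    and the teacher's features at the same token positions."""
    s = student_visible / np.linalg.norm(student_visible, axis=1, keepdims=True)
    t = teacher_all[idx] / np.linalg.norm(teacher_all[idx], axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))
```

Because the student only processes the kept tokens, most of the compute of a full-video forward pass is avoided, which is the source of the data and training efficiency the abstract describes.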
Efficient solutions to the relative pose of three calibrated cameras from four points using virtual correspondences
Authors: Charalambos Tzamos, Daniel Barath, Torsten Sattler, Zuzana Kukelova
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We study the challenging problem of estimating the relative pose of three calibrated cameras. We propose two novel solutions to the notoriously difficult configuration of four points in three views, known as the 4p3v problem. Our solutions are based on the simple idea of generating one additional virtual point correspondence in two views by using the information from the locations of the four input correspondences in the three views. For the first solver, we train a network to predict this point correspondence. The second solver uses a much simpler and more efficient strategy based on the mean points of three corresponding input points. The new solvers are efficient and easy to implement since they are based on the existing efficient minimal solvers, i.e., the well-known 5-point relative pose and the P3P solvers. The solvers achieve state-of-the-art results on real data. The idea of solving minimal problems using virtual correspondences is general and can be applied to other problems, e.g., the 5-point relative pose problem. In this way, minimal problems can be solved using simpler non-minimal solvers or even using sub-minimal samples inside RANSAC. In addition, we compare different variants of 4p3v solvers with the baseline solver for the minimal configuration consisting of three triplets of points and two points visible in two views. We discuss which configuration of points is potentially the most practical in real applications.
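The second solver's idea can be sketched as follows, assuming two views with four matched points each, stored as (4, 2) arrays of image coordinates, and a hypothetical choice of which three points to average. Because perspective projection is not affine, the averaged image points are generally not exact projections of one 3D point, so the fifth match is only approximate:

```python
import numpy as np

def add_virtual_correspondence(pts1, pts2, triplet=(0, 1, 2)):
    """Append one virtual match: the mean of three corresponding points in each view.

    The returned five-point sets can be fed to a standard 5-point relative pose solver.
    """
    idx = list(triplet)
    v1 = pts1[idx].mean(axis=0)
    v2 = pts2[idx].mean(axis=0)
    return np.vstack([pts1, v1]), np.vstack([pts2, v2])
```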
Abstract
Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can enable natural language processing (NLP) inference on mobile systems-on-chip that house custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, each tailored to one of the targeted tasks. This approach leads to either prohibitive on-chip memory requirements or costly off-chip memory accesses. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.
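The abstract does not detail adapter-ALBERT's layer structure. As a hedged illustration of why adapters enable data reuse, the sketch below assumes a generic bottleneck adapter (down-projection, ReLU, up-projection, residual connection, no biases) placed after a frozen weight matrix shared by all tasks; only the small per-task adapter weights would change between tasks. The task keys are hypothetical:

```python
import numpy as np

class Adapter:
    """Bottleneck adapter h + up(relu(down(h))); these are the only task-specific weights."""

    def __init__(self, d_model, bottleneck, rng):
        self.w_down = 0.01 * rng.standard_normal((d_model, bottleneck))
        # Zero-initialised up-projection, so a fresh adapter starts as the identity map.
        self.w_up = np.zeros((bottleneck, d_model))

    def __call__(self, h):
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up

    def num_params(self):
        return self.w_down.size + self.w_up.size

def multi_task_layer(h, w_shared, adapters, task):
    """w_shared is reused by every task; only the selected adapter differs."""
    return adapters[task](h @ w_shared)
```

With d_model = 64 and a bottleneck of 8, each adapter holds 2 * 64 * 8 = 1024 weights versus 4096 in the shared matrix, so the shared weights can stay on chip while adapters are swapped.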
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Authors: Minsoo Kang, Doyup Lee, Jiseob Kim, Saehoon Kim, Bohyung Han
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
We propose a text-to-image generation algorithm based on deep neural networks for the setting in which text captions for images are unavailable during training. Instead of simply generating pseudo-ground-truth sentences for training images using existing image captioning methods, we employ a pretrained CLIP model, which properly aligns embeddings of images and corresponding texts in a joint space and, consequently, works well on zero-shot recognition tasks. We optimize a text-to-image generation model by maximizing the data log-likelihood conditioned on pairs of image-text CLIP embeddings. To better align data in the two domains, we employ a principled approach based on variational inference, which efficiently estimates an approximate posterior of the hidden text embedding given an image and its CLIP feature. Experimental results show that the proposed framework outperforms existing approaches by large margins in unsupervised and semi-supervised text-to-image generation settings.
Multimodal Manoeuvre and Trajectory Prediction for Autonomous Vehicles Using Transformer Networks
Abstract
Predicting the behaviour (i.e. manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a. automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction, enabling AVs to perform better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using two public benchmark highway driving datasets, namely NGSIM and highD. The results show that the proposed framework outperforms the state-of-the-art multimodal methods in the literature in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.
DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets
Abstract
Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp candidates for grasp planning typically requires accurate but slow 3D finite element method (FEM) simulation. Sampling-based grasp planning is often impractical, as it requires evaluation of a large number of grasp candidates. Gradient-based grasp planning can be more efficient but requires a differentiable model to synthesize optimal grasps from initial candidates. Differentiable FEM simulators may fill this role but are typically no faster than standard FEM. In this work, we propose learning a predictive graph neural network (GNN), DefGraspNets, to act as our differentiable model. We train DefGraspNets to predict 3D stress and deformation fields based on FEM-based grasp simulations. DefGraspNets not only runs up to 1500 times faster than the FEM simulator, but also enables fast gradient-based grasp optimization over 3D stress and deformation metrics. We design DefGraspNets to align with real-world grasp planning practices and demonstrate generalization across multiple test sets, including real-world experiments.
Dias: Dynamic Rewriting of Pandas Code
Authors: Stefanos Baziotis, Daniel Kang, Charith Mendis
Abstract
In recent years, dataframe libraries such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions that can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify program rewriting as a lightweight technique that can offer substantial speedups while also avoiding slowdowns. We implemented our techniques in Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including dynamic checking of preconditions under which rewrites are correct and just-in-time rewrites for notebook environments. We show that Dias can rewrite individual cells to be 57$\times$ faster compared to pandas and 1909$\times$ faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6$\times$ compared to pandas and 26.4$\times$ compared to modin.
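Dias's actual rewrite rules are not listed in the abstract. A minimal, hypothetical illustration of the underlying pattern, a rewrite guarded by a cheap dynamic precondition with a fallback to the original code, might look like this in pandas (the function names and the specific rewrite are assumptions):

```python
import pandas as pd

def double_original(df, col):
    """Cell as a user might write it: a Python-level lambda applied per element."""
    return df[col].apply(lambda x: x * 2)

def double_rewritten(df, col):
    """Rewritten cell: take the vectorized path only when a cheap runtime check
    (an O(1) dtype test) says it should match the original; otherwise fall back."""
    if pd.api.types.is_numeric_dtype(df[col]):
        return df[col] * 2          # vectorized, no per-element Python call
    return double_original(df, col)
```

The precondition restricts the rewrite to numeric columns, where elementwise `x * 2` and vectorized multiplication are expected to agree; for any other dtype the original code runs unchanged, so the rewrite should not slow down or alter the cell.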
What Writing Assistants Can Learn from Programming IDEs
Abstract
With the development of artificial intelligence, writing assistants (WAs) are changing the way people interact with text, creating lengthy outputs that can be overwhelming for users. The programming field has long addressed this issue, and Integrated Development Environments (IDEs) have been created for efficient software development, helping programmers reduce the cognitive load. This experience could be employed in the development of WAs. IDEs can also be used to test assumptions about interventions that help people interact with WAs efficiently. Previous works have successfully used self-written IDE plugins to test hypotheses in the field of human-computer interaction. The lessons learned can be applied to the building of WAs.
Learning Federated Visual Prompt in Null Space for MRI Reconstruction
Authors: Chun-Mei Feng, Bangjun Li, Xinxing Xu, Yong Liu, Huazhu Fu, Wangmeng Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Federated Magnetic Resonance Imaging (MRI) reconstruction enables multiple hospitals to collaborate in a distributed manner without aggregating local data, thereby protecting patient privacy. However, the data heterogeneity caused by different MRI protocols, insufficient local training data, and limited communication bandwidth inevitably impair global model convergence and updating. In this paper, we propose a new algorithm, FedPR, to learn federated visual prompts in the null space of the global prompt for MRI reconstruction. FedPR is a new federated paradigm that adopts a powerful pre-trained model while only learning and communicating prompts with few learnable parameters, thereby significantly reducing communication costs and achieving competitive performance on limited local data. Moreover, to deal with catastrophic forgetting caused by data heterogeneity, FedPR also updates efficient federated visual prompts that project the local prompts into an approximate null space of the global prompt, thereby suppressing the interference of gradients with the server's performance. Extensive experiments on federated MRI show that FedPR significantly outperforms state-of-the-art FL algorithms with less than 6% of the communication cost when given a limited amount of local training data.
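How FedPR constructs the approximate null space is not specified in the abstract. One common way to build such a projector, assumed here, is a singular value decomposition of the global prompt (prompt tokens by embedding dimension) that treats the dominant right singular vectors as the occupied subspace; projecting a local update onto the complement keeps it from interfering with what the global prompt already encodes. The `energy` threshold is a hypothetical parameter:

```python
import numpy as np

def null_space_projector(global_prompt, energy=0.99):
    """Projector onto the approximate null space of the global prompt (rows = prompt tokens).

    Right singular vectors that together capture `energy` of the squared singular values
    span the directions the global prompt uses; the projector removes those directions.
    """
    _, s, vt = np.linalg.svd(global_prompt, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    v_r = vt[:r].T                                   # (d, r) dominant directions
    return np.eye(global_prompt.shape[1]) - v_r @ v_r.T

def project_local_update(update, projector):
    """Keep only the component of a local prompt update lying in the approximate null space."""
    return update @ projector
```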
VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis
Abstract
With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached an unprecedented level. Compared to traditional mesh-based assets, this volumetric representation is more powerful in expressing scene geometry but inevitably suffers from high rendering costs and can hardly be used in further processes such as editing, which makes it difficult to combine with the existing graphics pipeline. In this paper, we present a hybrid volume-mesh representation, VMesh, which depicts an object with a textured mesh along with an auxiliary sparse volume. VMesh retains the advantages of mesh-based assets, such as efficient rendering, compact storage, and easy editing, while also incorporating the ability to represent subtle geometric structures provided by the volumetric counterpart. VMesh can be obtained from multi-view images of an object and renders at 2K 60FPS on common consumer devices with high fidelity, unleashing new opportunities for real-time immersive applications.
Large-scale Training Data Search for Object Re-identification
Authors: Yue Yao, Huan Lei, Tom Gedeon, Liang Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We consider a scenario where we have access to the target domain but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities that exhibit distributions similar to the target domain. The second stage, subject to a budget, then selects identities and their images from the Stage I output to control the size of the resulting training set for efficient training. The two steps provide training sets 80% smaller than the source pool while achieving similar or even higher re-ID accuracy. These training sets are also shown to be superior to those produced by existing search methods such as random sampling and greedy sampling under the same budget on training data size. If the budget is lifted, training sets resulting from the first stage alone allow even higher re-ID accuracy. We also discuss the specificity of our method to the re-ID problem and, in particular, its role in bridging the re-ID domain gap. The code is available at https://github.com/yorkeyao/SnP.
Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection
Abstract
Anomaly detectors are widely used in industrial production to detect and localize unknown defects in query images. These detectors are trained on nominal images and have shown success in distinguishing anomalies from most normal samples. However, because hard-nominal examples are scattered and far from most normal samples, existing anomaly detectors often mistake them for anomalies. To address this problem, we propose a simple yet efficient method: Hard Nominal Example-aware Template Mutual Matching (HETMM). Specifically, HETMM aims to construct a robust prototype-based decision boundary that can precisely distinguish between hard-nominal examples and anomalies, yielding lower false-positive and missed-detection rates. Moreover, HETMM mutually explores anomalies in two directions, between the queries and the template set, and is thus capable of capturing logical anomalies. This is a significant advantage over most anomaly detectors, which frequently fail to detect logical anomalies. Additionally, to meet speed-accuracy demands, we further propose Pixel-level Template Selection (PTS) to streamline the original template set. PTS selects cluster centres and hard-nominal examples to form a tiny set that maintains the original decision boundaries. Comprehensive experiments on five real-world datasets demonstrate that our methods outperform existing approaches at real-time inference speed. Furthermore, HETMM can be hot-updated by inserting novel samples, which may promptly address some incremental learning issues.
When to be critical? Performance and evolvability in different regimes of neural Ising agents
Authors: Sina Khajehabdollahi, Jan Prosi, Georg Martius, Anna Levina
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems as well as for their evolution. We put this hypothesis to the test in a system of evolving foraging agents controlled by neural networks that can adapt the agents' dynamical regime throughout evolution. Surprisingly, we find that all populations that discover solutions evolve to be subcritical. Through a resilience analysis, we find that there are still benefits to starting evolution in the critical regime. Namely, initially critical agents maintain their fitness level under environmental changes (for example, in the lifespan) and degrade gracefully when their genome is perturbed. At the same time, initially subcritical agents, even when evolved to the same fitness, are often unable to withstand changes in the lifespan and degrade catastrophically under genetic perturbations. Furthermore, we find that the optimal distance to criticality depends on the task complexity. To test this, we introduce a hard and a simple task: for the hard task, agents evolve closer to criticality, whereas more subcritical solutions are found for the simple task. We verify that our results are independent of the selected evolutionary mechanism by testing them on two fundamentally different approaches: a genetic algorithm and an evolutionary strategy. In summary, our study suggests that although optimal behaviour in the simple task is obtained in a subcritical regime, initializing near criticality is important for efficiently finding optimal solutions to new tasks of unknown complexity.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Abstract
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters on top of the frozen LLaMA 7B model and takes less than one hour to fine-tune on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA while effectively preserving its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA. We release our code at https://github.com/ZrrSkywalker/LLaMA-Adapter.
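The zero-init gating can be illustrated with a single-head NumPy sketch, assuming no causal mask, no positional embeddings, and a plain scalar gate (simplifications relative to the released code). Softmax is taken separately over the adaption-prompt scores and the word-token scores, and only the prompt branch is scaled by the gate, so a gate initialized to zero leaves the frozen model's attention output unchanged at the start of training:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_adapter_attention(q, k_words, v_words, k_prompt, v_prompt, gate):
    """Attention over word tokens plus a gated, separately normalised prompt branch."""
    d = q.shape[-1]
    s_words = softmax(q @ k_words.T / np.sqrt(d))    # original (frozen) attention
    s_prompt = softmax(q @ k_prompt.T / np.sqrt(d))  # attention to adaption prompts
    return s_words @ v_words + gate * (s_prompt @ v_prompt)
```

During fine-tuning the gate is learned, so the instructional cues from the prompts are injected gradually instead of disturbing the pre-trained attention from the first step.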
Keyword: faster
A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics
Authors: Zhuoying Zhao, Ziling Tan, Pinghui Mo, Xiaonan Wang, Dan Zhao, Xin Zhang, Ming Tao, Jie Liu
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
Abstract
This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of a field-programmable gate array (FPGA) and an application-specific integrated circuit (ASIC) working in heterogeneous parallelization. Specifically, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other MD calculations are performed on the FPGA (Xilinx XC7Z100). It is shown that, at a similar level of accuracy, the proposed NvN-based system built with low-end fabrication technology (180 nm) is 1.6x faster and 10^2-10^3x more energy-efficient than state-of-the-art vN-based MLMD using graphics processing units (GPUs) built with much more advanced technology (12 nm), indicating the superiority of the proposed NvN-based heterogeneous parallel architecture.
Switched Moving Boundary Modeling of Phase Change Thermal Energy Storage Systems
Abstract
Thermal Energy Storage (TES) devices, which leverage the constant-temperature thermal capacity of the latent heat of a Phase Change Material (PCM), benefit a variety of thermal management systems by decoupling the absorption and rejection of thermal energy. Because a TES performs a role similar to a battery in an electrical system, it is critical to know when to charge (freeze) and discharge (melt) it to maximize the capabilities and efficiency of the overall system. Therefore, control-oriented models of TES are needed to predict the behavior of the TES and make informed control decisions. While existing modeling approaches divide the TES into multiple sections using a Fixed Grid (FG) approach, this paper proposes a switched Moving Boundary (MB) model that captures the key dynamics of the TES with significantly fewer dynamic states. Specifically, a graph-based modeling approach is used to model the heat flow through the TES, and an MB approach is used to model the time-varying liquid and solid regions of the TES. Additionally, a Finite State Machine (FSM) is used to switch between four different modes of operation based on the State-of-Charge (SOC) of the TES. Numerical simulations comparing the proposed approach with a more traditional FG approach show that the MB model accurately reproduces the behavior of the FG model while using far fewer states, leading to five times faster simulations.
Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
Authors: Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
In this work, we study the concentration behavior of a stochastic approximation (SA) algorithm under a contractive operator with respect to an arbitrary norm. We consider two settings where the iterates are potentially unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian noise. We obtain maximal concentration inequalities on the convergence errors, and show that these errors have sub-Gaussian tails in the additive noise setting, and super-polynomial tails (faster than polynomial decay) in the multiplicative noise setting. In addition, we provide an impossibility result showing that it is in general not possible to achieve sub-exponential tails for SA with multiplicative noise. To establish these results, we develop a novel bootstrapping argument that involves bounding the moment generating function of the generalized Moreau envelope of the error and the construction of an exponential supermartingale to enable using Ville's maximal inequality. To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning. To the best of our knowledge, super-polynomial concentration bounds for off-policy TD-learning have not been established in the literature due to the challenge of handling the combination of unbounded iterates and multiplicative noise.
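A toy instance of the iteration the paper analyzes makes the two noise settings concrete. The sketch assumes a scalar operator F(x) = 0.5x + 1 (a 0.5-contraction whose fixed point is 2), a hypothetical step size alpha_k = 2/(k + 20), Gaussian noise for the additive case, and uniform noise scaled by (1 + |x|) for the bounded multiplicative case. It only demonstrates convergence; the paper's contribution is tail bounds on the error, which a single run cannot show:

```python
import numpy as np

def contractive_sa(F, x0, n_iters, rng, noise="additive"):
    """Run x_{k+1} = x_k + alpha_k * (F(x_k) + w_k - x_k) on a scalar toy problem."""
    x = float(x0)
    for k in range(n_iters):
        alpha = 2.0 / (k + 20)                          # diminishing step size
        if noise == "additive":
            w = rng.normal()                            # sub-Gaussian (here Gaussian) noise
        else:
            # Bounded multiplicative noise: its magnitude grows with the iterate.
            w = rng.uniform(-1.0, 1.0) * (1.0 + abs(x))
        x += alpha * (F(x) + w - x)
    return x

def F(x):
    """A 0.5-contraction with unique fixed point x* = 2."""
    return 0.5 * x + 1.0
```

Repeating such runs over many seeds and plotting the empirical distribution of |x_n - 2| is one way to look at the tail behavior the bounds describe.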
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
Authors: Xinyi Ye, Weiyue Zhao, Hao Lu, Zhiguo Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Correspondence pruning aims to find consistent correspondences (inliers) in a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers. It is even more challenging to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention, which focuses on feature-consistent context, second-order attention attends to the attention weights themselves and provides an additional source for encoding consistent context from the attention map. For efficiency, we derive two approximate formulations of the naive implementation of second-order attention that reduce its cubic complexity to linear complexity, so that second-order attention can be used with negligible computational overhead. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining competitive accuracy.
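The authors' two approximate formulations are not given in the abstract. The generic reason that reordering a product can turn quadratic cost in the number of correspondences N into linear cost is the associativity of matrix multiplication, sketched below for attention without a softmax. This is an assumption-laden illustration, not the paper's method: with a softmax the reordering is no longer exact, which is why approximate formulations are needed.

```python
import numpy as np

def attention_quadratic(q, k, v):
    """(Q K^T) V: materializes an N x N attention map, O(N^2 d) time and O(N^2) memory."""
    return (q @ k.T) @ v

def attention_linear(q, k, v):
    """Q (K^T V): builds a d x d summary first, O(N d^2) time, linear in N."""
    return q @ (k.T @ v)
```

For N = 2000 correspondences with d = 128 features, the first form builds a 2000 x 2000 map while the second only builds a 128 x 128 one, and both return the same N x d result.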
A Generalized Ray Formulation For Wave-Optics Rendering
Authors: Shlomi Steinberg, Ravi Ramamoorthi, Benedikt Bitterli, Eugene d'Eon, Ling-Qi Yan, Matt Pharr
Abstract
Under ray-optical light transport, the classical ray serves as a local and linear "point query" of light's behaviour. Such point queries are useful, and sophisticated path tracing and sampling techniques enable efficiently computing solutions to light transport problems in complex, real-world settings and environments. However, such formulations are firmly confined to the realm of ray optics, while many applications of interest, in computer graphics and computational optics, demand a more precise understanding of light. We rigorously formulate the generalized ray, which enables local and linear point queries of the wave-optical phase space. Furthermore, we present sample-solve: a simple method that serves as a novel link between path tracing and computational optics. We will show that this link enables the application of modern path tracing techniques for wave-optical rendering, improving upon the state-of-the-art in terms of the generality and accuracy of the formalism, ease of application, as well as performance. Sampling using generalized rays enables interactive rendering under rigorous wave optics, with orders-of-magnitude faster performance compared to existing techniques.
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Authors: Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh under the supervision of a CLIP loss. However, such a text-independent architecture lacks textual guidance when predicting attributes, leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by utilizing text-relevant spatial and channel-wise attention during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation, often relying on subjective and non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, namely MIT-30, and two automated metrics, which will enable future research to achieve fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
Authors: Xiao Yang, Chang Liu, Longlong Xu, Yikai Wang, Yinpeng Dong, Ning Chen, Hang Su, Jun Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before deployment. However, most existing physical attacks are either readily detectable or ineffective against commercial recognition systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. This requires that the technique can simultaneously deceive black-box recognition models and evade defensive mechanisms. To fulfill this, we design adversarial textured 3D meshes (AT3D) with an elaborate topology on a human face, which can be 3D-printed and pasted on the attacker's face to evade the defenses. However, the mesh-based optimization regime calculates gradients in a high-dimensional mesh space and can be trapped in local optima with unsatisfactory transferability. To move away from the mesh-based space, we propose to perturb the low-dimensional coefficient space based on the 3D Morphable Model, which significantly improves black-box transferability while also offering faster search and better visual quality. Extensive experiments in digital and physical scenarios show that our method effectively explores the security vulnerabilities of multiple popular commercial services, including three recognition APIs, four anti-spoofing APIs, two prevailing mobile phones and two automated access control systems.
Clustered Federated Learning Architecture for Network Anomaly Detection in Large Scale Heterogeneous IoT Networks
Authors: Xabier Sáez-de-Cámara, Jose Luis Flores, Cristóbal Arellano, Aitor Urbieta, Urko Zurutuza
Abstract
There is a growing trend of cyberattacks against Internet of Things (IoT) devices; moreover, the sophistication and motivation of those attacks are increasing. The vast scale of IoT, its diverse hardware and software, and the fact that devices are typically placed in uncontrolled environments make traditional IT security mechanisms, such as signature-based intrusion detection and prevention systems, challenging to integrate. These mechanisms also struggle to cope with the rapidly evolving IoT threat landscape due to long delays between the analysis and publication of detection rules. Machine learning methods have shown faster response to emerging threats; however, model training architectures like cloud or edge computing face multiple drawbacks in IoT settings, including network overhead and data isolation arising from the large scale and heterogeneity that characterize these networks. This work presents an architecture for training unsupervised models for network intrusion detection in large, distributed IoT and Industrial IoT (IIoT) deployments. We leverage Federated Learning (FL) to train collaboratively between peers and reduce isolation and network overhead problems. We build upon it to include an unsupervised device clustering algorithm fully integrated into the FL pipeline to address the heterogeneity issues that arise in FL settings. The architecture is implemented and evaluated using a testbed that includes various emulated IoT/IIoT devices and attackers interacting in a complex network topology comprising 100 emulated devices, 30 switches and 10 routers. The anomaly detection models are evaluated on real attacks performed by the testbed's threat actors, including the entire Mirai malware lifecycle, an additional botnet based on the Merlin command and control server, and other red-teaming tools performing scanning activities and multiple attacks targeting the emulated devices.
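The abstract integrates device clustering into the FL pipeline so that heterogeneous devices are not forced into one global model. A minimal sketch of the aggregation step, assuming cluster labels have already been assigned (the paper's unsupervised clustering algorithm is not described in the abstract) and that each cluster is aggregated with FedAvg-style sample-weighted averaging; `clustered_fedavg` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Params = List[float]

def clustered_fedavg(updates: List[Tuple[int, Params, int]]) -> Dict[int, Params]:
    """Aggregate client models separately per cluster.

    Each update is (cluster_id, flattened parameters, number of local samples).
    Returns one averaged model per cluster, weighting clients by sample count.
    """
    grouped: Dict[int, List[Tuple[Params, int]]] = defaultdict(list)
    for cluster_id, params, n_samples in updates:
        grouped[cluster_id].append((params, n_samples))

    models: Dict[int, Params] = {}
    for cluster_id, members in grouped.items():
        total = sum(n for _, n in members)
        dim = len(members[0][0])
        # Weighted mean of each parameter over the clients of this cluster only.
        models[cluster_id] = [
            sum(params[i] * n for params, n in members) / total for i in range(dim)
        ]
    return models
```

Clients in one cluster never average with clients in another, so a cluster of, say, cameras and a cluster of PLCs each keep a model fitted to their own traffic.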
Faster Deterministic Distributed MIS and Approximate Matching
Authors: Mohsen Ghaffari, Christoph Grunau
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
We present an $\widetilde{O}(\log^2 n)$ round deterministic distributed algorithm for the maximal independent set problem. By known reductions, this round complexity also extends to maximal matching, $\Delta+1$ vertex coloring, and $2\Delta-1$ edge coloring. These four problems are among the most central problems in distributed graph algorithms and have been studied extensively for the past four decades. This improved round complexity comes closer to the $\widetilde{\Omega}(\log n)$ lower bound of maximal independent set and maximal matching [Balliu et al. FOCS '19]. The previous best known deterministic complexity for all of these problems was $\Theta(\log^3 n)$. Via the shattering technique, the improvement also carries over to the corresponding randomized complexities, e.g., the new randomized complexity of $\Delta+1$ vertex coloring is now $\widetilde{O}(\log^2\log n)$ rounds. Our approach is a novel combination of the two previously known methods for developing deterministic algorithms for these problems, namely global derandomization via network decomposition (see e.g., [Rozhon, Ghaffari STOC'20; Ghaffari, Grunau, Rozhon SODA'21; Ghaffari et al. SODA'23]) and local rounding of fractional solutions (see e.g., [Fischer DISC'17; Harris FOCS'19; Fischer, Ghaffari, Kuhn FOCS'17; Ghaffari, Kuhn FOCS'21; Faour et al. SODA'23]). We consider a relaxation of the classic network decomposition concept, where instead of requiring the clusters in the same block to be non-adjacent, we allow each node to have a small number of neighboring clusters. We also show a deterministic algorithm that computes this relaxed decomposition faster than standard decompositions. We then use this relaxed decomposition to significantly improve the integrality of certain fractional solutions, before handing them to the local rounding procedure, which now has to perform fewer rounding steps.
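The deterministic algorithm in this abstract relies on relaxed network decompositions and is too involved for a short sketch. For orientation only, here is the classic randomized baseline it competes with: a Luby-style MIS simulated in synchronous rounds, where every live node draws a random priority, local minima join the MIS, and they and their neighbours drop out. This variant is known to finish in $O(\log n)$ rounds with high probability; it is not the authors' method, and `luby_mis` is a hypothetical name.

```python
import random
from typing import Dict, Set

def luby_mis(adj: Dict[int, Set[int]], seed: int = 0) -> Set[int]:
    """Simulate a Luby-style randomized MIS in synchronous rounds.

    adj maps each node to its set of neighbours (undirected graph).
    """
    rng = random.Random(seed)
    alive = set(adj)
    mis: Set[int] = set()
    while alive:
        # Each live node draws a random priority and exchanges it with neighbours.
        r = {v: rng.random() for v in alive}
        # A node joins if its value is strictly below every live neighbour's value,
        # so two adjacent nodes can never join in the same round.
        winners = {v for v in alive
                   if all(r[v] < r[u] for u in adj[v] if u in alive)}
        mis |= winners
        # Winners and their neighbours leave the graph.
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        alive -= removed
    return mis
```

Each iteration of the `while` loop corresponds to a constant number of communication rounds, which is the quantity the $\widetilde{O}(\log^2 n)$ bound counts.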
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Abstract
Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain. Although VideoMAE has trained a robust ViT from limited data, its low-level reconstruction poses convergence difficulties and conflicts with high-level cross-modal alignment. This paper proposes a training-efficient method for temporally sensitive VFMs that integrates the benefits of existing methods. To increase data efficiency, we mask out most of the low-semantics video tokens, but selectively align the unmasked tokens with an IFM, which serves as the UnMasked Teacher (UMT). By providing semantic guidance, our method enables faster convergence and multimodal friendliness. With a progressive pre-training framework, our model can handle various tasks including scene-related, temporal-related, and complex video-language understanding. Using only public sources for pre-training in 6 days on 32 A100 GPUs, our scratch-built ViT-L/16 achieves state-of-the-art performance on various video tasks. The code and models will be released at https://github.com/OpenGVLab/unmasked_teacher.
Neural Collapse Inspired Federated Learning with Non-iid Data
Authors: Chenxi Huang, Liang Xie, Yibo Yang, Wenxiao Wang, Binbin Lin, Deng Cai
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
One of the challenges in federated learning is the non-independent and identically distributed (non-iid) characteristics between heterogeneous devices, which cause significant differences in local updates and affect the performance of the central server. Although many studies have been proposed to address this challenge, they only focus on local training and aggregation processes to smooth the changes and fail to achieve high performance with deep learning models. Inspired by the phenomenon of neural collapse, we force each client to be optimized toward an optimal global structure for classification. Specifically, we initialize this structure as a random simplex Equiangular Tight Frame (ETF) and fix it as the unit optimization target of all clients during local updating. Having ensured that all clients learn to converge to the global optimum, we propose to add a global memory vector for each category to remedy the parameter fluctuation caused by the bias of the intra-class conditional distribution among clients. Our experimental results show that our method can improve performance with faster convergence on datasets of different sizes.
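A simplex ETF with $K$ class vectors in $d \ge K$ dimensions is commonly constructed as $M = \sqrt{K/(K-1)}\,U\,(I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$, where $U$ has orthonormal columns; every class vector then has unit norm and any two have inner product $-1/(K-1)$. Assuming the paper uses this standard neural-collapse construction (the abstract only says "random simplex ETF"), a pure-Python sketch that draws a random $U$ by Gram-Schmidt on Gaussian vectors; `simplex_etf` is a hypothetical name.

```python
import math
import random
from typing import List

def gram_schmidt(vectors: List[List[float]]) -> List[List[float]]:
    """Orthonormalise linearly independent vectors (modified Gram-Schmidt)."""
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Return num_classes fixed prototype vectors of length dim forming a simplex ETF."""
    if num_classes < 2 or dim < num_classes:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    rng = random.Random(seed)
    # Random orthonormal directions u_1..u_K (the columns of U, stored as rows here).
    u = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(num_classes)])
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    mean = [sum(u[j][i] for j in range(k)) / k for i in range(dim)]
    # Column j of sqrt(K/(K-1)) * U (I - 11^T/K) is sqrt(K/(K-1)) * (u_j - mean(u)).
    return [[scale * (u[j][i] - mean[i]) for i in range(dim)] for j in range(k)]
```

Every client would receive the same fixed prototypes (same seed), so all local classifiers are pulled toward one shared geometry regardless of their local label distribution.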
Lazy learning: a biologically-inspired plasticity rule for fast and energy efficient synaptic plasticity
Authors: Aaron Pache, Mark CW van Rossum
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Abstract
When training neural networks for classification tasks with backpropagation, parameters are updated on every trial, even if the sample is classified correctly. In contrast, humans concentrate their learning effort on errors. Inspired by human learning, we introduce lazy learning, which only learns on incorrect samples. Lazy learning can be implemented in a few lines of code and requires no hyperparameter tuning. Lazy learning achieves state-of-the-art performance and is particularly suited to large datasets. For instance, it reaches 99.2% test accuracy on Extended MNIST using a single-layer MLP, and does so 7.6x faster than a matched backprop network.
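The rule "only learn on incorrect samples" really does fit in a few lines. A minimal sketch, assuming a binary logistic-regression classifier trained with plain SGD: each sample is first classified, and the gradient step is skipped when the prediction is already correct. The model, learning rate and toy data are illustrative assumptions; the paper's experiments use MLPs on Extended MNIST.

```python
import math
from typing import List, Sequence, Tuple

def lazy_sgd(data: Sequence[Tuple[List[float], int]], epochs: int = 10,
             lr: float = 0.5) -> Tuple[List[float], int]:
    """Logistic regression trained with SGD that skips correctly classified samples.

    Returns the learned weights and the number of updates actually performed.
    """
    w = [0.0] * len(data[0][0])
    updates = 0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            predicted = 1 if score > 0 else 0
            if predicted == y:
                continue  # lazy: no gradient step on a correct prediction
            p = 1.0 / (1.0 + math.exp(-score))
            # Gradient of the log-loss w.r.t. w is (p - y) * x; step against it.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
            updates += 1
    return w, updates
```

On the toy set `[([-2, 1], 0), ([-1, 1], 0), ([1, 1], 1), ([2, 1], 1)]` (second feature is a bias), a single update suffices and the remaining 39 sample visits cost only a forward pass, which is where the claimed speed-up on large datasets comes from.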
DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets
Abstract
Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp candidates for grasp planning typically requires accurate but slow 3D finite element method (FEM) simulation. Sampling-based grasp planning is often impractical, as it requires evaluation of a large number of grasp candidates. Gradient-based grasp planning can be more efficient, but requires a differentiable model to synthesize optimal grasps from initial candidates. Differentiable FEM simulators may fill this role, but are typically no faster than standard FEM. In this work, we propose learning a predictive graph neural network (GNN), DefGraspNets, to act as our differentiable model. We train DefGraspNets to predict 3D stress and deformation fields based on FEM-based grasp simulations. DefGraspNets not only runs up to 1500 times faster than the FEM simulator, but also enables fast gradient-based grasp optimization over 3D stress and deformation metrics. We design DefGraspNets to align with real-world grasp planning practices and demonstrate generalization across multiple test sets, including real-world experiments.
Dias: Dynamic Rewriting of Pandas Code
Authors: Stefanos Baziotis, Daniel Kang, Charith Mendis
Abstract
In recent years, dataframe libraries such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify program rewriting as a lightweight technique which can offer substantial speedups while also avoiding slowdowns. We implemented our techniques in Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including dynamic checking of preconditions under which rewrites are correct and just-in-time rewrites for notebook environments. We show that Dias can rewrite individual cells to be 57$\times$ faster compared to pandas and 1909$\times$ faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6$\times$ compared to pandas and 26.4$\times$ compared to modin.
Keyword: mobile
Beyond Accuracy: A Critical Review of Fairness in Machine Learning for Mobile and Wearable Computing
Abstract
The field of mobile, wearable, and ubiquitous computing (UbiComp) is undergoing a revolutionary integration of machine learning. Devices can now diagnose diseases, predict heart irregularities, and unlock the full potential of human cognition. However, the underlying algorithms are not immune to biases with respect to sensitive attributes (e.g., gender, race), leading to discriminatory outcomes. The research communities of HCI and AI-Ethics have recently started to explore ways of reporting information about datasets to surface and, eventually, counter those biases. The goal of this work is to explore the extent to which the UbiComp community has adopted such ways of reporting and highlight potential shortcomings. Through a systematic review of papers published in the Proceedings of the ACM Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) journal over the past 5 years (2018-2022), we found that progress on algorithmic fairness within the UbiComp community lags behind. Our findings show that only a small portion (5%) of published papers adheres to modern fairness reporting, while the overwhelming majority thereof focuses on accuracy or error metrics. In light of these findings, our work provides practical guidelines for the design and development of ubiquitous technologies that not only strive for accuracy but also for fairness.
Overcoming Probabilistic Faults in Disoriented Linear Search
Abstract
We consider search by mobile agents for a hidden, idle target, placed on the infinite line. Feasible solutions are agent trajectories in which all agents reach the target sooner or later. A special feature of our problem is that the agents are $p$-faulty, meaning that every attempt to change direction is an independent Bernoulli trial with known probability $p$, where $p$ is the probability that a turn fails. We are looking for agent trajectories that minimize the worst-case expected termination time, relative to competitive analysis. First, we study linear search with one deterministic $p$-faulty agent, i.e., with no access to random oracles, $p\in (0,1/2)$. For this problem, we provide trajectories that leverage the probabilistic faults into an algorithmic advantage. Our strongest result pertains to a search algorithm (deterministic, aside from the adversarial probabilistic faults) which, as $p\to 0$, has optimal performance $4.59112+\epsilon$, up to the additive term $\epsilon$ that can be arbitrarily small. Additionally, it has performance less than $9$ for $p\leq 0.390388$. When $p\to 1/2$, our algorithm has performance $\Theta(1/(1-2p))$, which we also show is optimal up to a constant factor. Second, we consider linear search with two $p$-faulty agents, $p\in (0,1/2)$, for which we provide three algorithms of different advantages, all with a bounded competitive ratio even as $p\rightarrow 1/2$. Indeed, for this problem, we show how the agents can simulate the trajectory of any $0$-faulty agent (deterministic or randomized), independently of the underlying communication model. As a result, searching with two agents allows for a solution with a competitive ratio of $9+\epsilon$, or a competitive ratio of $4.59112+\epsilon$. Our final contribution is a novel algorithm for searching with two $p$-faulty agents that achieves a competitive ratio $3+4\sqrt{p(1-p)}$.
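The competitive ratio of $9$ quoted for the two-agent case is the classic bound for a single fault-free ($0$-faulty) agent using the doubling zig-zag that turns at $1, -2, 4, -8, \dots$. The sketch below covers only that classic strategy (no faults, no randomization): it computes the search time to a target placed just beyond a turning point, where the ratio of search time to target distance works out to $1 + (8 - 2^{1-k})/(1+\epsilon)$ and therefore approaches $9$ from below. It illustrates what "competitive ratio" measures and is not one of the paper's $p$-faulty algorithms; function names are hypothetical.

```python
from typing import List

def search_time(target: float, base: float = 2.0) -> float:
    """Time for the zig-zag strategy turning at 1, -base, base**2, ... to reach target."""
    pos, time, i = 0.0, 0.0, 0
    while True:
        turn = (-base) ** i
        # The target is found if it lies on the segment walked in this sweep.
        if min(pos, turn) <= target <= max(pos, turn):
            return time + abs(target - pos)
        time += abs(turn - pos)
        pos, i = turn, i + 1

def near_worst_case_ratios(max_k: int = 24, eps: float = 1e-9) -> List[float]:
    """Ratios time / |target| for targets just past each turning point (-2)**k."""
    return [search_time((-2.0) ** k * (1 + eps)) / (2.0 ** k * (1 + eps))
            for k in range(1, max_k + 1)]
```

For a target just past $-2$ the agent walks $1 + 3 + 6$ units before sweeping back through it, about $16$ in total, giving a ratio close to $8$; later turning points push the ratio toward, but never to, $9$.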
A Novel Design for Advanced 5G Deployment Environments with Virtualized Resources at Vehicular and MEC Nodes
Authors: Angelo Feraudo, Alessando Calvio, Armir Bujari, Paolo Bellavista
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
IoT and edge computing are profoundly changing the information era, bringing a hyper-connected and context-aware computing environment to reality. Connected vehicles are a critical outcome of this synergy, allowing for the seamless interconnection of autonomous mobile/fixed objects, giving rise to a decentralized vehicle-to-everything (V2X) paradigm. On this front, the European Telecommunications Standards Institute (ETSI) proposed the Multi-Access Edge Computing (MEC) standard, addressing the execution of cloud-like services at the very edge of the infrastructure, thus facilitating the support of low-latency services at the far-edge. In this article, we go a step further and propose a novel ETSI MEC-compliant architecture that fully exploits the synergies between the edge and far-edge, extending the pool of virtualized resources available at MEC nodes with vehicular ones found in the vicinity. In particular, our approach allows vehicle entities to access and partake in a negotiation process embodying a rewarding scheme, while addressing resource volatility as vehicles join and leave the resource pool. To demonstrate the viability and flexibility of our proposed approach, we have built an ETSI MEC-compliant simulation model, which can be tailored to distribute application requests based on the availability of both local and remote resources, managing their transparent migration and execution. In addition, the paper reports on the experimental validation of our proposal in a 5G network setting, contrasting different service delivery modes and highlighting the potential of the dynamic exploitation of far-edge vehicular resources.
4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free Images
Authors: Zhuoran Zheng, Xiuyi Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Currently, mobile and IoT devices are in dire need of methods to enhance 4K images with limited resource expenditure. The absence of large-scale 4K benchmark datasets hampers progress in this area, especially for dehazing. The challenges in building ultra-high-definition (UHD) dehazing datasets are the absence of estimation methods for UHD depth maps, of high-quality 4K depth estimation datasets, and of migration strategies for UHD haze images from synthetic to real domains. To address these problems, we develop a novel synthetic method to simulate 4K hazy images (including nighttime and daytime scenes) from clear images, which first estimates the scene depth, simulates the light rays and object reflectance, then migrates the synthetic images to real domains by using a GAN, and finally yields the hazy effects on 4K resolution images. We wrap these synthesized images into a benchmark called the 4K-HAZE dataset. Specifically, we design the CS-Mixer (an MLP-based model that integrates the Channel (C) domain and the Spatial (S) domain) to estimate the depth map of 4K clear images, and the GU-Net to migrate a 4K synthetic image to the real hazy domain. The most appealing aspect of our approach (depth estimation and domain migration) is its capability to process a 4K image on a single GPU with 24 GB of memory in real time (33 fps). Additionally, this work presents an objective assessment of several state-of-the-art single-image dehazing methods that are evaluated using the 4K-HAZE dataset. At the end of the paper, we discuss the limitations of the 4K-HAZE dataset and its social implications.
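The abstract does not state the formula used to "yield the hazy effects", but depth-based haze synthesis is commonly built on the atmospheric scattering model $I(x) = J(x)\,t(x) + A\,(1 - t(x))$ with transmission $t(x) = e^{-\beta d(x)}$, where $J$ is the clear image, $d$ the estimated depth, $A$ the global atmospheric light and $\beta$ the scattering coefficient. A per-pixel grayscale sketch under that assumption; `add_haze`, and the default values of $A$ and $\beta$, are hypothetical, and the paper's GAN-based domain migration is not modeled.

```python
import math
from typing import List

def add_haze(clear: List[List[float]], depth: List[List[float]],
             airlight: float = 0.9, beta: float = 1.0) -> List[List[float]]:
    """Apply I = J * t + A * (1 - t) with t = exp(-beta * d), per pixel (values in [0, 1])."""
    hazy = []
    for clear_row, depth_row in zip(clear, depth):
        row = []
        for j, d in zip(clear_row, depth_row):
            t = math.exp(-beta * d)  # transmission falls off with depth
            row.append(j * t + airlight * (1.0 - t))
        hazy.append(row)
    return hazy
```

At zero depth the pixel is unchanged, and as depth grows it fades toward the airlight value, which is why an accurate 4K depth map (the CS-Mixer's job in the paper) matters for realistic haze.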
Around-Body Interaction: Leveraging Limb Movements for Interacting in a Digitally Augmented Physical World
Abstract
Recent technological advances have made head-mounted displays (HMDs) smaller and untethered, fostering the vision of ubiquitous interaction with information in a digitally augmented physical world. For interacting with such devices, three main types of input (besides finger gestures, which are not very intuitive) have emerged so far: 1) touch input on the frame of the device, 2) touch input on accessories (controllers), and 3) voice input. While these techniques have both advantages and disadvantages depending on the current situation of the user, they largely ignore the skills and dexterity that we show when interacting with the real world: Throughout our lives, we have trained extensively to use our limbs to interact with and manipulate the physical world around us. This thesis explores how the skills and dexterity of our upper and lower limbs, acquired and trained in interacting with the real world, can be transferred to the interaction with HMDs. Thus, this thesis develops the vision of around-body interaction, in which we use the space around our body, defined by the reach of our limbs, for fast, accurate, and enjoyable interaction with such devices. This work contributes four interaction techniques, two for the upper limbs and two for the lower limbs: The first contribution shows how the proximity between our head and hand can be used to interact with HMDs. The second contribution extends the interaction with the upper limbs to multiple users and illustrates how the registration of augmented information in the real world can support cooperative use cases. The third contribution shifts the focus to the lower limbs and discusses how foot taps can be leveraged as an input modality for HMDs. The fourth contribution presents how lateral shifts of the walking path can be exploited for mobile and hands-free interaction with HMDs while walking.
Ranking mobility and impact inequality in early academic careers
Authors: Ye Sun, Fabio Caccioli, Giacomo Livan
Subjects: Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
Abstract
How difficult is it for an early career academic to climb the ranks of their discipline? We tackle this question with a comprehensive bibliometric analysis of 57 disciplines, examining the publications of more than 5 million authors whose careers started between 1986 and 2008. We calibrate a simple random walk model over historical data of ranking mobility, which we use to (1) identify which strata of academic impact rankings are the most/least mobile and (2) study the temporal evolution of mobility. By focusing our analysis on cohorts of authors starting their careers in the same year, we find that ranking mobility is remarkably low for the top and bottom-ranked authors, and that this excess of stability persists throughout the entire period of our analysis. We further observe that the mobility of impact rankings has increased over time, and that this rise has been accompanied by a decline in impact inequality, which is consistent with the negative correlation that we observe between the two quantities. These findings provide clarity on the opportunities of new scholars entering the academic community, with implications for academic policymaking.
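The abstract's random walk model is not specified in detail. A simpler, related way to quantify "which strata are the most/least mobile" is an empirical transition matrix between ranking strata (for example deciles) from one year to the next: the probability of staying in the same stratum, $P_{ii}$, is high for "sticky" strata and low for mobile ones. A sketch under that assumption, with hypothetical function names and toy trajectories; it is not the authors' calibrated model.

```python
from typing import List, Sequence

def transition_matrix(trajectories: Sequence[Sequence[int]],
                      n_strata: int) -> List[List[float]]:
    """Row-normalised counts of stratum-to-stratum moves between consecutive years."""
    counts = [[0] * n_strata for _ in range(n_strata)]
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def stay_probabilities(matrix: List[List[float]]) -> List[float]:
    """Probability of remaining in the same stratum; low values mean high mobility."""
    return [matrix[i][i] for i in range(len(matrix))]
```

With three strata and yearly trajectories `[[0, 0, 0], [1, 2, 1], [2, 2, 2]]`, the stay probabilities are 1, 0 and 2/3: stratum 0 is perfectly stable in this toy data, stratum 1 fully mobile.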
Inside-out Infrared Marker Tracking via Head Mounted Displays for Smart Robot Programming
Authors: David Puljiz, Alexandru-George Vasilache, Michael Mende, Björn Hein
Abstract
Intuitive robot programming through use of tracked smart input devices relies on fixed, external tracking systems, most often employing infra-red markers. Such an approach is frequently combined with projector-based augmented reality for better visualisation and interface. The combined system, although providing an intuitive programming platform with short cycle times even for inexperienced users, is immobile, expensive and requires extensive calibration. When faced with a changing environment and a large number of robots, it becomes sorely impractical. Here we present our work on infra-red marker tracking using the Microsoft HoloLens head-mounted display. The HoloLens can map the environment, register the robot on-line, and track smart devices equipped with infra-red markers in the robot coordinate system. We envision our work providing the basis for transferring many of the paradigms developed over the years for systems requiring a projector and a tracked input device into a highly portable system that does not require any calibration or special set-up. We test the quality of the marker tracking in an industrial robot cell and compare our tracking with a ground truth obtained via an ART-3 tracking system.
Evolutionary Design of the Memory Subsystem
Authors: Josefa Díaz Álvarez, José L. Risco-Martín, J. Manuel Colmenar
Abstract
The memory hierarchy has a high impact on the performance and power consumption of the system. Moreover, current embedded systems, included in mobile devices, are specifically designed to run multimedia applications, which are memory intensive. This increases the pressure on the memory subsystem and affects performance and energy consumption. Thermal problems, performance degradation and high energy consumption can cause irreversible damage to the devices. We address the optimization of the whole memory subsystem with three approaches integrated into a single methodology. First, the thermal impact of the register file is analyzed and optimized. Second, the cache memory is addressed by optimizing the cache configuration according to the running applications, improving both performance and power consumption. Finally, we simplify the design and evaluation process of general-purpose and customized dynamic memory managers in main memory. To this end, we apply different evolutionary algorithms in combination with memory simulators and profiling tools. This way, we are able to evaluate the quality of each candidate solution and take advantage of the exploration of solutions provided by the optimization algorithm. We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.
Abstract
Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.
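The abstract does not give adapter-ALBERT's adapter architecture. Adapters are commonly small bottleneck layers with a residual connection, $h + W_{\text{up}}\,\sigma(W_{\text{down}} h)$, inserted into a frozen, shared backbone so that only the small per-task matrices differ between tasks; that is what allows the backbone weights to be reused across tasks. A sketch assuming that standard form with a ReLU nonlinearity; dimensions and names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def adapter(h: List[float], w_down: Matrix, w_up: Matrix) -> List[float]:
    """Bottleneck adapter: h + W_up @ relu(W_down @ h).

    w_down is (r x d) and w_up is (d x r) with r much smaller than d; only these
    two small matrices are task-specific, while the backbone weights stay shared.
    """
    z = [max(0.0, x) for x in matvec(w_down, h)]
    return [a + b for a, b in zip(h, matvec(w_up, z))]
```

Switching tasks then means swapping two $r \times d$ matrices rather than a full set of model parameters; with `w_up` initialised to zeros the adapter starts as the identity, so the shared backbone's behavior is preserved at the start of per-task training.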
Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services
Authors: Minrui Xu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han, Abbas Jamalipour, Dong In Kim, Xuemin (Sherman) Shen, Victor C. M. Leung, H. Vincent Poor
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Artificial Intelligence-Generated Content (AIGC) is an automated method for generating, manipulating, and modifying valuable and diverse data using AI algorithms creatively. This survey paper focuses on the deployment of AIGC applications, e.g., ChatGPT and Dall-E, at mobile edge networks, namely mobile AIGC networks, that provide personalized and customized AIGC services in real time while maintaining user privacy. We begin by introducing the background and fundamentals of generative models and the lifecycle of AIGC services at mobile AIGC networks, which includes data collection, training, finetuning, inference, and product management. We then discuss the collaborative cloud-edge-mobile infrastructure and technologies required to support AIGC services and enable users to access AIGC at mobile edge networks. Furthermore, we explore AIGC-driven creative applications and use cases for mobile AIGC networks. Additionally, we discuss the implementation, security, and privacy challenges of deploying mobile AIGC networks. Finally, we highlight some future research directions and open issues for the full realization of mobile AIGC networks.
Keyword: pruning
Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis
Authors: Eirik Fladmark, Muhammad Hamza Sajjad, Laura Brinkholm Justesen
Abstract
In this paper, we explore the performance of different pruning methods in the context of the lottery ticket hypothesis. We compare the performance of L1 unstructured pruning, Fisher pruning, and random pruning on different network architectures and pruning scenarios. The experiments include an evaluation of one-shot and iterative pruning, an examination of weight movement in the network during pruning, a comparison of the pruning methods on networks of varying widths, and an analysis of the performance of the methods when the network becomes very sparse. Additionally, we propose and evaluate a new method for efficient computation of Fisher pruning, known as batched Fisher pruning.
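Two of the compared criteria are simple to state: L1 unstructured pruning removes the weights with the smallest absolute values, and random pruning removes a random subset of the same size. A sketch that produces keep-masks for a flat list of weights; it does not cover Fisher pruning (which needs gradient information) or the paper's batched Fisher variant, and the function names are hypothetical.

```python
import random
from typing import List

def l1_mask(weights: List[float], sparsity: float) -> List[int]:
    """Keep-mask (1 = keep) that zeroes the `sparsity` fraction of smallest-|w| weights."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def random_mask(n: int, sparsity: float, seed: int = 0) -> List[int]:
    """Baseline: prune the same number of weights, chosen uniformly at random."""
    n_prune = int(round(sparsity * n))
    pruned = set(random.Random(seed).sample(range(n), n_prune))
    return [0 if i in pruned else 1 for i in range(n)]
```

One-shot pruning applies a mask like this once at the target sparsity; iterative pruning in the lottery-ticket setting repeats train, prune a fraction of the remaining weights, and rewind, until the target sparsity is reached.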
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
Authors: Xinyi Ye, Weiyue Zhao, Hao Lu, Zhiguo Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Correspondence pruning aims to search for consistent correspondences (inliers) in a set of putative correspondences. It is challenging because of the disorganized spatial distribution of numerous outliers, especially when putative correspondences are largely dominated by outliers. It is even more challenging to ensure effectiveness while maintaining efficiency. In this paper, we propose an effective and efficient method for correspondence pruning. Inspired by the success of attentive context in correspondence problems, we first extend the attentive context to the first-order attentive context and then introduce the idea of attention in attention (ANA) to model second-order attentive context for correspondence pruning. Compared with first-order attention, which focuses on feature-consistent context, second-order attention focuses on the attention weights themselves and provides an additional source for encoding consistent context from the attention map. For efficiency, we derive two approximate formulations for the naive implementation of second-order attention that reduce its cubic complexity to linear complexity, so that second-order attention can be used with negligible computational overhead. We further implement our formulations in a second-order context layer and then incorporate the layer in an ANA block. Extensive experiments demonstrate that our method is effective and efficient in pruning outliers, especially in high-outlier-ratio cases. Compared with the state-of-the-art correspondence pruning approach LMCNet, our method runs 14 times faster while maintaining competitive accuracy.
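The abstract's two approximate formulations are not given, but a widely used trick for removing the $N \times N$ attention map is reassociating matrix products: without a softmax, $(QK^\top)V = Q(K^\top V)$, yet the left form costs $O(N^2 d)$ for $N$ correspondences while the right form costs $O(N d^2)$. A pure-Python sketch checking the identity; this is a generic illustration of the associativity idea, not the paper's derivation.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def quadratic_form(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """(Q K^T) V: materialises the N x N affinity map first, O(N^2 d)."""
    return matmul(matmul(q, transpose(k)), v)

def linear_form(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Q (K^T V): builds a d x d summary first, O(N d^2), no N x N map."""
    return matmul(q, matmul(transpose(k), v))
```

With thousands of putative correspondences and a feature width in the low hundreds, the right-hand form avoids ever storing an $N \times N$ matrix; a softmax between the two products breaks this exact equality, which is why approximate formulations are needed in practice.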
Randomly Initialized Subnetworks with Iterative Weight Recycling
Abstract
The Multi-Prize Lottery Ticket Hypothesis posits that randomly initialized neural networks contain several subnetworks that achieve accuracy comparable to fully trained models of the same architecture. However, current methods require that the network be sufficiently overparameterized. In this work, we propose a modification to two state-of-the-art algorithms (Edge-Popup and Biprop) that finds high-accuracy subnetworks with no additional storage cost or scaling. The algorithm, Iterative Weight Recycling, identifies subsets of important weights within a randomly initialized network for intra-layer reuse. Empirically, we show improvements on smaller network architectures and at higher prune rates, finding that model sparsity can be increased through the "recycling" of existing weights. In addition to Iterative Weight Recycling, we complement the Multi-Prize Lottery Ticket Hypothesis with a reciprocal finding: high-accuracy, randomly initialized subnetworks produce diverse masks, despite being generated with the same hyperparameters and pruning strategy. We explore the landscapes of these masks, which show high variability.
Large-scale Training Data Search for Object Re-identification
Authors: Yue Yao, Huan Lei, Tom Gedeon, Liang Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We consider a scenario where we have access to the target domain but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities that exhibit distributions similar to the target domain. The second stage, subject to a budget, then selects identities and their images from the Stage I output to control the size of the resulting training set for efficient training. The two steps provide us with training sets 80% smaller than the source pool while achieving similar or even higher re-ID accuracy. These training sets are also shown to be superior to those produced by existing search methods, such as random sampling and greedy sampling, under the same training-set size budget. If the budget constraint is lifted, training sets resulting from the first stage alone allow even higher re-ID accuracy. We provide interesting discussions on the specificity of our method to the re-ID problem and particularly its role in bridging the re-ID domain gap. The code is available at https://github.com/yorkeyao/SnP.
Keyword: voxel
Multimodal and multicontrast image fusion via deep generative models
Authors: Giovanna Maria Dimitri, Simeon Spasov, Andrea Duggento, Luca Passamonti, Pietro Liò, Nicola Toschi
Abstract
Recently, it has become progressively more evident that classic diagnostic labels are unable to reliably describe the complexity and variability of several clinical phenotypes. This is particularly true for a broad range of neuropsychiatric illnesses (e.g., depression, anxiety disorders, behavioral phenotypes). Patient heterogeneity can be better described by grouping individuals into novel categories based on empirically derived sections of intersecting continua that span across and beyond traditional categorical borders. In this context, neuroimaging data carry a wealth of spatiotemporally resolved information about each patient's brain. However, they are usually heavily collapsed a priori through procedures that are not learned as part of model training, and are consequently not optimized for the downstream prediction task. This is because every individual participant usually comes with multiple whole-brain 3D imaging modalities, often accompanied by a deep genotypic and phenotypic characterization, hence posing formidable computational challenges. In this paper we design a deep learning architecture based on generative models rooted in a modular approach and separable convolutional blocks to a) fuse multiple 3D neuroimaging modalities on a voxel-wise level, b) convert them into informative latent embeddings through heavy dimensionality reduction, and c) maintain good generalizability and minimal information loss. As a proof of concept, we test our architecture on the well-characterized Human Connectome Project database, demonstrating that our latent embeddings can be clustered into easily separable subject strata which, in turn, map to different phenotypical information that was not included in the embedding creation process. This may aid in predicting disease evolution as well as drug response, hence supporting mechanistic disease understanding and empowering clinical trials.
LinK: Linear Kernel for LiDAR-based 3D Perception
Authors: Tao Lu, Xiang Ding, Haisong Liu, Gangshan Wu, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Extending the success of 2D Large Kernel to 3D perception is challenging due to: 1. the cubically-increasing overhead in processing 3D data; 2. the optimization difficulties from data scarcity and sparsity. Previous work has taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce the feature variations within a block, it only employs a modest block size and fails to achieve larger kernels such as 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs. The first is to replace the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second is to reuse the pre-computed aggregation results in the overlapped blocks to reduce computational complexity. The proposed method successfully enables each voxel to perceive context within a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the 3D detection benchmark of nuScenes (LiDAR track) by simply incorporating a LinK-based backbone into the basic detector, CenterPoint. We also boost the strong segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK.
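A linear kernel lets aggregation results be shared. When the weight is affine in the relative offset, the weighted sum sum_j (a(x_j - x_i) + b) f_j splits into a(S_xf - x_i S_f) + b S_f. Window aggregates can therefore be reused instead of recomputed per voxel. The 1D sketch below assumes exactly this affine weight, integer coordinates for the non-empty voxels, and prefix sums as the shared aggregates. It illustrates the idea only and is not LinK's block-based implementation. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def linear_kernel_naive(xs, fs, a, b, radius):
    """Direct O(N^2) aggregation with weight a*(x_j - x_i) + b inside the window."""
    return [sum((a * (xj - xi) + b) * fj for xj, fj in zip(xs, fs) if abs(xj - xi) <= radius)
            for xi in xs]


def linear_kernel_prefix(xs, fs, a, b, radius):
    """Same result from shared prefix sums; xs must be sorted, empty voxels are absent."""
    s0 = [0.0] + list(accumulate(fs))                               # prefix sums of f_j
    s1 = [0.0] + list(accumulate(x * f for x, f in zip(xs, fs)))    # prefix sums of x_j f_j
    out = []
    for xi in xs:
        lo, hi = bisect_left(xs, xi - radius), bisect_right(xs, xi + radius)
        sum_f, sum_xf = s0[hi] - s0[lo], s1[hi] - s1[lo]
        out.append(a * (sum_xf - xi * sum_f) + b * sum_f)
    return out


xs = [0, 1, 3, 4, 7, 8, 12]                   # coordinates of non-empty voxels (toy)
fs = [1.0, 2.0, -1.0, 0.5, 3.0, 1.5, 2.0]     # one feature channel (toy)
print(linear_kernel_prefix(xs, fs, a=0.3, b=1.0, radius=3))
```

The per-voxel cost of the prefix-sum version does not grow with the radius. This is the property that makes very large receptive fields affordable.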
Keyword: lidar
4D Panoptic Segmentation as Invariant and Equivariant Field Prediction
Authors: Minghan Zhu, Shizong Han, Hong Cai, Shubhankar Borse, Maani Ghaffari Jadidi, Fatih Porikli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation. 4D panoptic segmentation is a recently established benchmark task for autonomous driving, which requires recognizing semantic classes and object instances on the road based on LiDAR scans, as well as assigning temporally consistent IDs to instances across time. We observe that the driving scenario is symmetric under rotations in the ground plane. Therefore, rotation-equivariance could provide better generalization and more robust feature learning. Specifically, we review the object instance clustering strategies and restate the centerness-based approach and the offset-based approach as the prediction of invariant scalar fields and equivariant vector fields. Other sub-tasks are also unified from this perspective, and different invariant and equivariant layers are designed to facilitate their predictions. Through evaluation on the standard 4D panoptic segmentation benchmark of SemanticKITTI, we show that our equivariant models achieve higher accuracy with lower computational costs compared to their non-equivariant counterparts. Moreover, our method sets a new state of the art and achieves 1st place on the SemanticKITTI 4D Panoptic Segmentation leaderboard.
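The distinction between invariant and equivariant fields can be checked numerically. The sketch below assumes 2D ground-plane coordinates and uses distance-to-center as a stand-in for a centerness score; the paper's exact field definitions are not given here. Rotating the scene leaves the scalar field unchanged and rotates the offset vectors by the same angle.

```python
import math


def rotate(p, theta):
    """Rotate a 2D point about the origin (a rotation about the vertical axis)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def offset_field(points, center):
    # Vector field: offset from each point to its instance center (should be equivariant).
    return [(center[0] - x, center[1] - y) for x, y in points]


def centerness_field(points, center):
    # Scalar field: distance to the instance center (should be invariant).
    return [math.hypot(center[0] - x, center[1] - y) for x, y in points]


points = [(1.0, 2.0), (-0.5, 0.3), (2.2, -1.1)]   # toy LiDAR points of one instance
center = (0.4, 0.9)
theta = 0.7
rot_points = [rotate(p, theta) for p in points]
rot_center = rotate(center, theta)
```

Comparing `offset_field(rot_points, rot_center)` with the rotated original offsets confirms equivariance. Comparing the two `centerness_field` outputs confirms invariance.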
LinK: Linear Kernel for LiDAR-based 3D Perception
Authors: Tao Lu, Xiang Ding, Haisong Liu, Gangshan Wu, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Extending the success of 2D Large Kernel to 3D perception is challenging due to: 1. the cubically-increasing overhead in processing 3D data; 2. the optimization difficulties from data scarcity and sparsity. Previous work has taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce the feature variations within a block, it only employs a modest block size and fails to achieve larger kernels such as 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs. The first is to replace the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second is to reuse the pre-computed aggregation results in the overlapped blocks to reduce computational complexity. The proposed method successfully enables each voxel to perceive context within a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the 3D detection benchmark of nuScenes (LiDAR track) by simply incorporating a LinK-based backbone into the basic detector, CenterPoint. We also boost the strong segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK.
Keyword: diffusion
An efficient method for the anisotropic diffusion equation in magnetic fields
Authors: Dean Muir, Kenneth Duru, Matthew Hole, Stuart Hudson
Abstract
We solve the anisotropic diffusion equation in 2D, where the dominant direction of diffusion is defined by a vector field that does not conform to a Cartesian grid. Our method uses operator splitting to separate the diffusion perpendicular and parallel to the vector field. The slow time scale is solved using a provably stable finite difference formulation for the diffusion perpendicular to the vector field, and an integral operator for the diffusion parallel to it. Energy estimates are derived for the continuous and semi-discrete cases. Numerical experiments demonstrate the convergence of the method, and examples are given to illustrate its capabilities.
A Stochastic Method for Solving Time-Fractional Differential Equations
Authors: Nicolas L. Guidotti, Juan Acebrón, José Monteiro
Abstract
We present a stochastic method for efficiently computing the solution of time-fractional partial differential equations (fPDEs) that model anomalous diffusion problems of the subdiffusive type. After discretizing the fPDE in space, the ensuing system of fractional linear equations is solved by resorting to a Monte Carlo evaluation of the corresponding Mittag-Leffler matrix function. This is accomplished through the approximation of the expected value of a suitable multiplicative functional of a stochastic process, which consists of a Markov chain whose sojourn times in every state are Mittag-Leffler distributed. The resulting algorithm is able to calculate the solution at conveniently chosen points in the domain with high efficiency. In addition, we show how to generalize this algorithm in order to compute the complete solution. For several large-scale numerical problems, our method showed remarkable performance in both shared-memory and distributed-memory systems, achieving nearly perfect scalability up to 16,384 CPU cores.
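The Mittag-Leffler function is at the core of the method. For intuition, the scalar case can be evaluated directly from its power series, E_alpha(z) = sum_k z^k / Gamma(alpha k + 1). For example, u(t) = E_alpha(-lam t^alpha) solves the Caputo relaxation equation D^alpha u = -lam u with u(0) = 1. The truncated-series sketch below is a deliberately simple stand-in and not the paper's Monte Carlo estimator of the matrix function. It also loses accuracy through cancellation for large negative arguments.

```python
import math


def mittag_leffler(alpha, z, terms=200):
    """Truncated series E_alpha(z) = sum_{k>=0} z^k / Gamma(alpha*k + 1)."""
    if z == 0:
        return 1.0
    total = 0.0
    for k in range(terms):
        # |z|^k / Gamma(alpha*k + 1) via lgamma, so Gamma itself never overflows.
        magnitude = math.exp(k * math.log(abs(z)) - math.lgamma(alpha * k + 1))
        sign = -1.0 if (z < 0 and k % 2 == 1) else 1.0
        total += sign * magnitude
    return total


# Fractional relaxation u(t) = E_alpha(-lam * t**alpha) for a toy choice of parameters.
alpha, lam = 0.7, 1.0
print([mittag_leffler(alpha, -lam * t ** alpha) for t in (0.0, 0.5, 1.0)])
```

Two special cases check the series: alpha = 1 gives E_1(z) = e^z, and alpha = 2 gives E_2(-z^2) = cos z.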
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
Authors: Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
A significant research effort is focused on exploiting the remarkable capabilities of pretrained diffusion models for image editing. Existing methods either finetune the model or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) unsatisfying results in selected regions and unexpected changes in nonselected regions; (2) they require careful text prompt editing, where the prompt must include all visual objects in the input image. To address this, we propose two improvements: (1) optimizing only the input of the value linear network in the cross-attention layers is sufficiently powerful to reconstruct a real image; (2) we propose attention regularization to preserve the object-like attention maps after editing, enabling accurate style editing without invoking significant structural changes. We further improve the editing technique used for the unconditional branch of classifier-free guidance, as well as for the conditional one as used by P2P. Extensive prompt-editing experiments on a variety of images demonstrate qualitatively and quantitatively that our method has superior editing capabilities compared to existing and concurrent works.
Ecosystem Graphs: The Social Footprint of Foundation Models
Authors: Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata (e.g. the license or training emissions). We document the ecosystem extensively at https://crfm.stanford.edu/ecosystem-graphs/. As of March 16, 2023, we annotate 262 assets (64 datasets, 128 models, 70 applications) from 63 organizations linked by 356 dependencies. We show Ecosystem Graphs functions as a powerful abstraction and interface for achieving the minimum transparency required to address myriad use cases. Therefore, we envision Ecosystem Graphs will be a community-maintained resource that provides value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors and policymakers.
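The asset-and-dependency structure is essentially a directed graph with typed nodes and per-node metadata. The minimal Python sketch below assumes a simple in-memory representation; the class and method names are hypothetical and not the project's actual schema. It shows how a transitive "what does this asset rely on" query falls out of that structure. The Bing to GPT-4 edge comes from the abstract; the dataset node and its edge are invented for illustration.

```python
from collections import defaultdict


class EcosystemGraph:
    """Assets (datasets, models, applications) linked by dependency edges."""

    def __init__(self):
        self.kind = {}                 # asset name -> "dataset" | "model" | "application"
        self.deps = defaultdict(set)   # asset -> assets it directly relies on
        self.meta = {}                 # asset name -> free-form metadata (e.g. license)

    def add_asset(self, name, kind, **metadata):
        self.kind[name] = kind
        self.meta[name] = metadata

    def add_dependency(self, asset, upstream):
        self.deps[asset].add(upstream)

    def upstream(self, asset):
        """Every asset reachable by following dependency edges (transitive closure)."""
        seen, stack = set(), [asset]
        while stack:
            for dep in self.deps[stack.pop()]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen


g = EcosystemGraph()
g.add_asset("Bing", "application")
g.add_asset("GPT-4", "model")
g.add_asset("example-web-corpus", "dataset", license="hypothetical-license")
g.add_dependency("Bing", "GPT-4")
g.add_dependency("GPT-4", "example-web-corpus")   # hypothetical edge for illustration
print(sorted(g.upstream("Bing")))   # ['GPT-4', 'example-web-corpus']
```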
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion
Abstract
We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In addition, our proposed method explicitly inputs the source 3D scene as a condition, which enhances 3D consistency and controllability of how much of the source 3D scene structure is reflected. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. We performed quantitative and qualitative evaluations and showed that our proposed method achieves higher quality 3D-to-3D conversions than baseline methods.
Structure Preserving Finite Volume Approximation of Cross-Diffusion Systems Coupled by a Free Interface
Authors: Clément Cancès, Jean Cauvin-Vila, Claire Chainais-Hillairet, Virginie Ehrlacher
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
We propose a two-point flux approximation finite-volume scheme for the approximation of two cross-diffusion systems coupled by a free interface to account for vapor deposition. The moving interface is addressed with a cut-cell approach, where the mesh is locally deformed around the interface. The scheme preserves the structure of the continuous system, namely: mass conservation, nonnegativity, volume-filling constraints and decay of the free energy. Numerical results illustrate the properties of the scheme.
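The sketch below is a much-reduced illustration of two-point flux approximation and two of the listed structural properties. It advances a single scalar diffusion equation in 1D with an explicit step and no-flux boundaries, assuming a uniform mesh and no interface or cross-diffusion coupling, so it is not the paper's scheme. It shows why face-based fluxes conserve mass exactly. It also shows why the time-step bound dt*D/h^2 <= 1/2 keeps this explicit update nonnegative.

```python
def tpfa_step(u, diff, h, dt):
    """One explicit step of a 1D two-point flux finite-volume scheme for u_t = (diff u_x)_x.

    No flux crosses the domain boundary, so total mass sum(u) * h is conserved.
    """
    if dt * diff / h ** 2 > 0.5:
        raise ValueError("time step too large: nonnegativity is not guaranteed")
    new = list(u)
    for i in range(len(u) - 1):
        # Two-point flux through the face between cells i and i+1 (uses old values only).
        flux = -diff * (u[i + 1] - u[i]) / h
        new[i] -= dt / h * flux
        new[i + 1] += dt / h * flux
    return new


u = [0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0]   # toy initial cell averages
for _ in range(100):
    u = tpfa_step(u, diff=1.0, h=0.1, dt=0.004)
```

Whatever leaves a cell through a face enters its neighbor, so the sum of cell values stays at its initial value of 3.0 up to rounding.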
Accelerating exponential integrators to efficiently solve advection-diffusion-reaction equations
Authors: Marco Caliari, Fabio Cassini, Lukas Einkemmer, Alexander Ostermann
Abstract
In this paper we consider an approach to improve the performance of exponential integrators/Lawson schemes in cases where the solution of a related, but usually much simpler, problem can be computed efficiently. While for implicit methods such an approach is common (e.g. by using preconditioners), for exponential integrators this has proven more challenging. Here we propose to extract a constant coefficient differential operator from advection-diffusion-reaction equations for which we are then able to compute the required matrix functions efficiently. Both a linear stability analysis and numerical experiments show that the resulting schemes can be unconditionally stable. In fact, we find that exponential integrators and Lawson schemes can have better stability properties than similarly constructed implicit-explicit schemes. We also propose new Lawson type integrators that further improve on these stability properties. The effectiveness of the approach is highlighted by a number of numerical examples in two and three space dimensions.
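The basic building blocks can be sketched for the scalar model problem u' = lam*u + g(u). Here lam stands in for the extracted constant-coefficient operator. This is a toy assumption: the paper works with matrix functions of discretized differential operators. Exponential Euler uses phi_1(z) = (e^z - 1)/z, and Lawson Euler applies the integrating factor e^{h lam}.

```python
import math


def phi1(z):
    """phi_1(z) = (e^z - 1) / z, with its removable singularity at z = 0."""
    return math.expm1(z) / z if z != 0 else 1.0


def exponential_euler(lam, g, u0, h, steps):
    """u' = lam*u + g(u):  u_{n+1} = e^{h lam} u_n + h phi1(h lam) g(u_n)."""
    u = u0
    for _ in range(steps):
        u = math.exp(h * lam) * u + h * phi1(h * lam) * g(u)
    return u


def lawson_euler(lam, g, u0, h, steps):
    """Lawson (integrating-factor) Euler:  u_{n+1} = e^{h lam} (u_n + h g(u_n))."""
    u = u0
    for _ in range(steps):
        u = math.exp(h * lam) * (u + h * g(u))
    return u


lam, c = -1000.0, 5.0              # stiff linear part and constant forcing (toy choice)
exact = math.exp(lam) * 1.0 + math.expm1(lam) / lam * c    # exact u(1) with u(0) = 1
approx = exponential_euler(lam, lambda u: c, 1.0, h=0.1, steps=10)
```

With h*lam = -100, explicit Euler would multiply the error by 1 + h*lam = -99 per step, while both exponential schemes stay bounded. Exponential Euler is exact here because the forcing is constant. Lawson Euler, at this large step, damps the forcing and misses the steady state c/|lam| = 0.005.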
Visual Chain-of-Thought Diffusion Models
Authors: William Harvey, Frank Wood
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25-50% compared to standard unconditional generation.
Your Diffusion Model is Secretly a Zero-Shot Classifier
Authors: Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
Abstract
The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. We also find that our diffusion-based approach has stronger multimodal relational reasoning abilities than competing contrastive approaches. Finally, we evaluate diffusion models trained on ImageNet and find that they approach the performance of SOTA discriminative classifiers trained on the same dataset, even with weak augmentations and no regularization. Results and visualizations at https://diffusion-classifier.github.io/
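The classifier picks the class whose conditioning best explains the added noise, that is, the class c minimizing E_{t,eps} ||eps - eps_theta(x_t, t, c)||^2. The sketch below replaces the trained network eps_theta with a toy "ideal" predictor for a hypothetical dataset in which every sample of class c equals a mean mu_c. Under that assumption the method reduces to nearest-mean classification; a real setup would query a model such as Stable Diffusion. The sketch shares the same (t, eps) pairs across classes, a common variance-reduction choice.

```python
import math
import random


def ideal_eps_pred(x_t, alpha_bar, mu):
    # Exact noise predictor when every sample of the class equals mu (toy assumption).
    s, r = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - s * m) / r for xt, m in zip(x_t, mu)]


def diffusion_classify(x0, class_means, alpha_bars, n_noise=8, seed=0):
    """Return the class with the lowest accumulated noise-prediction error."""
    rng = random.Random(seed)
    errors = {c: 0.0 for c in class_means}
    for ab in alpha_bars:
        for _ in range(n_noise):
            eps = [rng.gauss(0.0, 1.0) for _ in x0]
            # Forward-noise the input once and reuse (t, eps) for every class.
            x_t = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
            for c, mu in class_means.items():
                pred = ideal_eps_pred(x_t, ab, mu)
                errors[c] += sum((e - p) ** 2 for e, p in zip(eps, pred))
    return min(errors, key=errors.get)


means = {"cat": [1.0, 1.0], "dog": [-1.0, -1.0]}   # hypothetical class means
print(diffusion_classify([0.8, 1.2], means, alpha_bars=[0.9, 0.5, 0.1]))
```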
Keyword: dynamic
A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics
Authors: Zhuoying Zhao, Ziling Tan, Pinghui Mo, Xiaonan Wang, Dan Zhao, Xin Zhang, Ming Tao, Jie Liu
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
Abstract
This paper proposes a special-purpose system to achieve high-accuracy and high-efficiency machine learning (ML) molecular dynamics (MD) calculations. The system consists of a field programmable gate array (FPGA) and an application specific integrated circuit (ASIC) working in heterogeneous parallel. Specifically, a multiplication-less neural network (NN) is deployed on the non-von Neumann (NvN)-based ASIC (SilTerra 180 nm process) to evaluate atomic forces, which is the most computationally expensive part of MD. All other MD calculations are done on the FPGA (Xilinx XC7Z100). It is shown that, at a similar level of accuracy, the proposed NvN-based system built on low-end fabrication technology (180 nm) is 1.6x faster and 10^2-10^3x more energy efficient than state-of-the-art vN-based MLMD using graphics processing units (GPUs) built on much more advanced technology (12 nm), indicating the superiority of the proposed NvN-based heterogeneous parallel architecture.
Sequential training of GANs against GAN-classifiers reveals correlated "knowledge gaps" present among independently trained GAN instances
Authors: Arkanath Pathak, Nicholas Dufour
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Modern Generative Adversarial Networks (GANs) generate realistic images remarkably well. Previous work has demonstrated the feasibility of "GAN-classifiers" that are distinct from the co-trained discriminator and operate on images generated from a frozen GAN. That such classifiers work at all affirms the existence of "knowledge gaps" (out-of-distribution artifacts across samples) present in GAN training. We iteratively train GAN-classifiers and train GANs that "fool" the classifiers (in an attempt to fill the knowledge gaps), and examine the effect on GAN training dynamics, output quality, and GAN-classifier generalization. We investigate two settings: a small DCGAN architecture trained on low-dimensional images (MNIST), and StyleGAN2, a SOTA GAN architecture trained on high-dimensional images (FFHQ). We find that the DCGAN is unable to effectively fool a held-out GAN-classifier without compromising the output quality. However, StyleGAN2 can fool held-out classifiers with no change in output quality, and this effect persists over multiple rounds of GAN/classifier training, which appears to reveal an ordering over optima in the generator parameter space. Finally, we study different classifier architectures and show that the architecture of the GAN-classifier has a strong influence on the set of its learned artifacts.
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
Abstract
We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under fully disentangled control over camera poses, facial expressions, head shapes, and articulated neck and jaw poses. To achieve such a high level of disentangled control, we first explicitly define a novel semantic signed distance function (SDF) around a head geometry (FLAME) conditioned on the control parameters. This semantic SDF allows us to build a differentiable volumetric correspondence map from the observation space to a canonical space disentangled from all the control parameters. We then leverage the 3D-aware GAN framework (EG3D) to synthesize the detailed shape and appearance of full 3D heads in the canonical space, followed by a volume rendering step, guided by the volumetric correspondence map, that renders into the observation space. To ensure control accuracy on the synthesized head shapes and expressions, we introduce a geometry prior loss to conform to the head SDF and a control loss to conform to the expression code. Further, we enhance temporal realism with dynamic details conditioned on varying expressions and joint poses. Compared with state-of-the-art methods, our model synthesizes preferable identity-preserved 3D heads with compelling dynamic details, both qualitatively and quantitatively. We also provide an ablation study to justify many of our system design choices.
Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator
Abstract
This study proposes a novel framework for learning the underlying physics of phenomena with moving boundaries. The proposed approach combines Ensemble SINDy and the Peridynamic Differential Operator (PDDO) and imposes an inductive bias assuming that the moving boundary physics evolves in its own corotational coordinate system. The robustness of the approach is demonstrated by considering various levels of noise in the measured data using the 2D Fisher-Stefan model. The confidence intervals of the recovered coefficients are listed, and the uncertainties of the moving boundary positions are depicted by obtaining the solutions with the recovered coefficients. Although the main focus of this study is the Fisher-Stefan model, the proposed approach is applicable to any type of moving boundary problem with a smooth moving boundary front without a mushy region. The code and data for this framework are available at: https://github.com/alicanbekar/MB_PDDO-SINDy.
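Two pieces can be sketched in a few lines of NumPy: the sparse-regression core of SINDy, sequentially thresholded least squares (STLSQ), and the bootstrap aggregation behind Ensemble SINDy. The sketch makes three assumptions. Derivatives are given exactly, since the paper estimates them with PDDO, which is not shown. The ODE dx/dt = -2x + 0.5x^2 is a hypothetical test case. Aggregation uses the median plus an inclusion frequency per library term.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: the sparse regression used in SINDy."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                # Refit only the surviving library terms for this state variable.
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi


def ensemble_sindy(theta, dxdt, n_models=20, threshold=0.1, seed=0):
    """Bagging: fit STLSQ on bootstrap resamples of the rows, aggregate by median."""
    rng = np.random.default_rng(seed)
    n = theta.shape[0]
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        coefs.append(stlsq(theta[idx], dxdt[idx], threshold))
    coefs = np.array(coefs)
    inclusion = (coefs != 0).mean(axis=0)   # fraction of models keeping each term
    return np.median(coefs, axis=0), inclusion


x = np.linspace(-2.0, 2.0, 200)
theta = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])   # library [1, x, x^2, x^3]
dxdt = (-2.0 * x + 0.5 * x ** 2)[:, None]                      # hypothetical exact derivative
coef, inclusion = ensemble_sindy(theta, dxdt)
```

With noisy data, the inclusion frequencies and the spread of the bootstrap coefficients are what provide the kind of confidence information the abstract mentions.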
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model
Authors: Rashmi Ranjan Bhuyan, Adel Javanmard, Sungchul Kim, Gourab Mukherjee, Ryan A. Rossi, Tong Yu, Handong Zhao
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Abstract
We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic probit model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as a function of time, of the temporal variability in the model parameters, and of the strength of the auto-correlation network structure spanning the various customer segments. Our regret analysis not only demonstrates the asymptotic optimality of the proposed policy but also shows that, for policy planning, it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up.
GNN-based physics solver for time-independent PDEs
Authors: Rini Jasmine Gladstone, Helia Rahmani, Vishvas Suryakumar, Hadi Meidani, Marta D'Elia, Ahmad Zareei
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Abstract
Physics-based deep learning frameworks have been shown to be effective in accurately modeling the dynamics of complex physical systems with generalization capability across problem inputs. However, time-independent problems pose the challenge of requiring long-range exchange of information across the computational domain for obtaining accurate predictions. In the context of graph neural networks (GNNs), this calls for deeper networks, which, in turn, may compromise or slow down the training process. In this work, we present two GNN architectures to overcome this challenge: the Edge Augmented GNN and the Multi-GNN. We show that both of these networks perform significantly better (by a factor of 1.5 to 2) than baseline methods when applied to time-independent solid mechanics problems. Furthermore, the proposed architectures generalize well to unseen domains, boundary conditions, and materials. Here, the treatment of variable domains is facilitated by a novel coordinate transformation that enables rotation and translation invariance. By broadening the range of problems that neural operators based on graph neural networks can tackle, this paper provides the groundwork for their application to complex scientific and industrial settings.
Switched Moving Boundary Modeling of Phase Change Thermal Energy Storage Systems
Abstract
Thermal Energy Storage (TES) devices, which leverage the constant-temperature thermal capacity of the latent heat of a Phase Change Material (PCM), provide benefits to a variety of thermal management systems by decoupling the absorption and rejection of thermal energy. Because a TES plays a role similar to that of a battery in an electrical system, it is critical to know when to charge (freeze) and discharge (melt) it to maximize the capabilities and efficiency of the overall system. Therefore, control-oriented models of TES are needed to predict its behavior and make informed control decisions. While existing modeling approaches divide the TES into multiple sections using a Fixed Grid (FG) approach, this paper proposes a switched Moving Boundary (MB) model that captures the key dynamics of the TES with significantly fewer dynamic states. Specifically, a graph-based modeling approach is used to model the heat flow through the TES, and an MB approach is used to model the time-varying liquid and solid regions of the TES. Additionally, a Finite State Machine (FSM) is used to switch between four different modes of operation based on the State-of-Charge (SOC) of the TES. Numerical simulations comparing the proposed approach with a more traditional FG approach show that the MB model accurately reproduces the behavior of the FG model while using far fewer states, leading to simulations that run five times faster.
Minimization of Sensor Activation in Discrete-Event Systems with Control Delays and Observation Delays
Abstract
In discrete-event systems, to save sensor resources, the agent continuously adjusts sensor activation decisions according to a sensor activation policy based on the changing observations. However, new challenges arise for sensor activation in networked discrete-event systems, where observation delays and control delays exist between the sensor systems and the agent. In this paper, a new framework for activating sensors in networked discrete-event systems is established. In this framework, we construct a communication automaton that explicitly expresses the interaction process between the agent and the sensor systems over the observation channel and the control channel. Based on the communication automaton, we can define dynamic observations of a communicated string. To guarantee that a sensor activation policy is physically implementable and insensitive to random control delays and observation delays, we further introduce the definition of delay feasibility. We show that a delay-feasible sensor activation policy can be used to dynamically activate sensors even if control delays and observation delays exist. A set of algorithms is developed to minimize sensor activations in a transition-based domain while ensuring that a given specification condition is satisfied. A practical example is provided to show the application of the developed sensor activation methods. Finally, we briefly discuss how to extend the proposed framework to a decentralized sensing architecture.
Cesno: Possibility of Creating a New Programming Language
Authors: Ozelot Vanilla, Jingxiang Yu, Hemn Barzan Abdalla, Haozhe Cui
Abstract
Programming languages are incredibly versatile, enabling developers to create applications and programs that suit their individual requirements. This article introduces a new language called Cesno, designed from the ground up to offer an advanced, user-friendly, and easy-to-use programming environment. Cesno's syntax is similar to other popular languages, making it simple to learn and work with. It incorporates features from other languages, such as syntactic sugar, a built-in library, support for functional programming, object-oriented programming, dynamic typing, a type system, and a variety of function parameters and restrictions. This article will explore the design of Cesno's grammar, provide a brief overview of how Cesno processes and compiles code, and provide examples of what Cesno's code looks like and how it can aid in development.
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Authors: Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Text-driven 3D stylization is a complex and crucial task in computer vision (CV) and computer graphics (CG) that aims to transform a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh under the supervision of a CLIP loss. However, such a text-independent architecture lacks textual guidance when predicting attributes, which leads to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically integrates the guidance of the target text by using text-relevant spatial and channel-wise attention during vertex feature extraction, resulting in more accurate attribute prediction and faster convergence. Furthermore, existing works lack standard benchmarks and automated metrics for evaluation and often rely on subjective, non-reproducible user studies to assess the quality of stylized 3D assets. To overcome this limitation, we introduce a new standard text-mesh benchmark, MIT-30, and two automated metrics, which will enable future research to make fair and objective comparisons. Our extensive qualitative and quantitative experiments demonstrate that X-Mesh outperforms previous state-of-the-art methods.
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion
Abstract
We propose a high-quality 3D-to-3D conversion method, Instruct 3D-to-3D. Our method is designed for a novel task, which is to convert a given 3D scene to another scene according to text instructions. Instruct 3D-to-3D applies pretrained Image-to-Image diffusion models for 3D-to-3D conversion. This enables the likelihood maximization of each viewpoint image and high-quality 3D generation. In addition, our proposed method explicitly inputs the source 3D scene as a condition, which enhances 3D consistency and controllability of how much of the source 3D scene structure is reflected. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. We performed quantitative and qualitative evaluations and showed that our proposed method achieves higher quality 3D-to-3D conversions than baseline methods.
A Novel Design for Advanced 5G Deployment Environments with Virtualized Resources at Vehicular and MEC Nodes
Authors: Angelo Feraudo, Alessandro Calvio, Armir Bujari, Paolo Bellavista
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
IoT and edge computing are profoundly changing the information era, bringing a hyper-connected and context-aware computing environment to reality. Connected vehicles are a critical outcome of this synergy, allowing the seamless interconnection of autonomous mobile and fixed objects and giving rise to a decentralized vehicle-to-everything (V2X) paradigm. On this front, the European Telecommunications Standards Institute (ETSI) proposed the Multi-Access Edge Computing (MEC) standard, which addresses the execution of cloud-like services at the very edge of the infrastructure and thus facilitates low-latency services at the far edge. In this article, we go a step further and propose a novel ETSI MEC-compliant architecture that fully exploits the synergies between the edge and the far edge, extending the pool of virtualized resources available at MEC nodes with vehicular resources found in the vicinity. In particular, our approach allows vehicle entities to access and take part in a negotiation process that embodies a rewarding scheme, while addressing resource volatility as vehicles join and leave the resource pool. To demonstrate the viability and flexibility of the proposed approach, we have built an ETSI MEC-compliant simulation model that can be tailored to distribute application requests based on the availability of both local and remote resources, managing their transparent migration and execution. In addition, the paper reports on the experimental validation of our proposal in a 5G network setting, contrasting different service delivery modes and highlighting the potential of dynamically exploiting far-edge vehicular resources.
Obstacle Avoidance in Dynamic Environments via Tunnel-following MPC with Adaptive Guiding Vector Fields
Abstract
This paper proposes a motion control scheme for robots operating in a dynamic environment with concave obstacles. A Model Predictive Controller (MPC) is constructed to drive the robot towards a goal position while ensuring collision avoidance without directly using obstacle information in the optimization problem. This is achieved by guaranteeing tracking performance along an appropriately designed receding-horizon path. The path is computed using a guiding vector field defined in a subspace of the free workspace in which every point satisfies a minimum-distance criterion with respect to all obstacles. The effectiveness of the control scheme is illustrated by means of simulation.
Control Barrier Functions in Dynamic UAVs for Kinematic Obstacle Avoidance: A Collision Cone Approach
Abstract
Unmanned aerial vehicles (UAVs), specifically quadrotors, have revolutionized various industries with their maneuverability and versatility, but their safe operation in dynamic environments relies heavily on effective collision avoidance techniques. This paper introduces a novel technique for safely navigating a quadrotor along a desired route while avoiding kinematic obstacles. The proposed approach employs control barrier functions and uses collision cones to ensure that the quadrotor's velocity and the obstacle's velocity always point away from each other. In particular, we propose a new constraint formulation that ensures that the relative velocity between the quadrotor and the obstacle always avoids a cone of vectors that may lead to a collision. By showing that the proposed constraint is a valid control barrier function (CBF) for quadrotors, we can leverage its real-time implementation via quadratic programs (QPs), called CBF-QPs. We validate the effectiveness of the proposed CBF-QPs by demonstrating collision avoidance with moving obstacles in multiple scenarios in the PyBullet simulator. Furthermore, we compare the proposed approach with CBF-QPs from the literature, especially the well-known higher-order CBF-QPs (HO-CBF-QPs), and show that the latter are more conservative than the proposed approach. This comparison is also presented in detail in simulation.
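To make the collision-cone idea concrete, the sketch below evaluates a candidate barrier function $h = \langle p_{rel}, v_{rel}\rangle + \|p_{rel}\|\,\|v_{rel}\|\cos\phi$ with $\cos\phi = \sqrt{\|p_{rel}\|^2 - r^2}/\|p_{rel}\|$, where $p_{rel}$ and $v_{rel}$ are the obstacle's position and velocity relative to the quadrotor and $r$ is a combined safety radius. This form is an assumption about how such a constraint can be written, not necessarily the paper's exact formulation; the quadrotor dynamics and the QP itself are omitted. A value $h \ge 0$ means the relative velocity lies outside the cone of colliding directions.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def collision_cone_h(p_rel, v_rel, r):
    """Candidate collision-cone barrier h = <p, v> + |p| |v| cos(phi).

    p_rel: obstacle position minus quadrotor position (assumed convention).
    v_rel: obstacle velocity minus quadrotor velocity (assumed convention).
    r:     combined safety radius, e.g. obstacle radius + vehicle radius (assumed).
    Returns h; h >= 0 means v_rel points outside the collision cone.
    """
    dist = norm(p_rel)
    if dist <= r:
        raise ValueError("already inside the safety radius")
    # phi is the half-angle of the cone subtended by the safety sphere
    cos_phi = math.sqrt(dist ** 2 - r ** 2) / dist
    return dot(p_rel, v_rel) + dist * norm(v_rel) * cos_phi
```

For an obstacle 5 m ahead closing head-on at 1 m/s with r = 1 m, h is about -0.10, so that state violates the constraint; in a CBF-QP the nominal input would instead be modified minimally, well before this point, so that a condition of the form $\dot h \ge -\alpha(h)$ keeps h non-negative.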
Satellite Dynamics Toolbox Library: a tool to model multi-body space systems for robust control synthesis and analysis
Authors: Francesco Sanfedino, Daniel Alazard, Ervan Kassarian, Franca Somers
Abstract
The level of maturity that robust control techniques have reached today considerably shortens the development time of an end-to-end control design for a spacecraft system. The advantage offered by this framework is twofold: all system uncertainties can be included from the very beginning of the design process, and the validation and verification (V&V) process is improved by the fast detection of worst-case configurations that could escape a classical sample-based Monte Carlo simulation campaign. Before proceeding to control synthesis and analysis, a proper uncertain plant model must be available in order to push these techniques to their performance limits. In this spirit, the Satellite Dynamics Toolbox Library (SDTlib) offers many features for modeling a spacecraft system in a multi-body fashion in SIMULINK. Parametric models can easily be built in Linear Fractional Transformation (LFT) form by including uncertainties and varying parameters with a minimal number of repetitions. Uncertain Linear Time-Invariant (LTI) and uncertain Linear Parameter-Varying (LPV) controllers can then be synthesized and analyzed in a straightforward way. In this article, the authors present a tutorial, which can be downloaded at https://nextcloud.isae.fr/index.php/s/XDfRfHntejHTmmp, that shows how to carry out an end-to-end robust design of a spacecraft mission and provides researchers with a benchmark for testing their own algorithms.
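The LFT form mentioned in this abstract can be illustrated with the standard upper LFT. The definition and scalar example below are textbook material rather than SDTlib notation, with $M$ a partitioned interconnection matrix and $\Delta$ the block of normalized uncertainties.

```latex
\mathcal{F}_u(M,\Delta) = M_{22} + M_{21}\,\Delta\,(I - M_{11}\Delta)^{-1} M_{12},
\qquad
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}

% Example: an uncertain parameter p = p_0 + w\,\delta with |\delta| \le 1
M = \begin{bmatrix} 0 & 1 \\ w & p_0 \end{bmatrix}
\;\Longrightarrow\;
\mathcal{F}_u(M,\delta) = p_0 + w\,\delta\,(1 - 0\cdot\delta)^{-1}\cdot 1 = p_0 + w\,\delta
```

Here $\delta$ appears exactly once in $\Delta$. For parameters that enter a model nonlinearly, such as an uncertain inertia appearing through its inverse, building the LFT with few repetitions of each $\delta$ takes more care.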
STMixer: A One-Stage Sparse Action Detector
Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Traditional video action detectors typically adopt a two-stage pipeline, in which a person detector first generates actor boxes and 3D RoIAlign is then used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference and cannot capture context information outside the bounding box. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding and therefore suffer from inferior performance or slower convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which gives STMixer the flexibility to mine a set of discriminative features from the entire spatiotemporal domain. Second, we devise a dual-branch feature mixing module, which allows STMixer to dynamically attend to and mix video features along the spatial and temporal dimensions, respectively, for better feature decoding. Coupling these two designs with a video backbone yields an efficient end-to-end action detector. Without bells and whistles, STMixer obtains state-of-the-art results on the AVA, UCF101-24, and JHMDB datasets.
On Optimal Synchronization of Diffusively Coupled Heterogeneous Van der Pol Oscillators
Authors: Tabea Trummel, Zonglin Liu, Olaf Stursberg
Subjects: Systems and Control (eess.SY); Chaotic Dynamics (nlin.CD)
Abstract
This paper proposes a novel method to achieve and preserve synchronization for a set of connected heterogeneous Van der Pol oscillators. Unlike state-of-the-art synchronization methods, in which a large coupling gain is applied to couple any pair of connected oscillators, the proposed method first splits the whole synchronization process into two phases. The first phase covers the period from the beginning to the first instant of synchronization, while the second phase covers the subsequent time, during which synchronization must be preserved. A large coupling gain is adopted in the first phase, and it is shown that the averaged coupling gain needed to preserve synchronization in the second phase can be reduced significantly by using an offline-optimized coupling law. The efficiency and performance of this method are confirmed by a set of numerical tests with different graphs and system dynamics.
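The two-phase idea can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the authors' optimized coupling law: two heterogeneous Van der Pol oscillators with hypothetical damping parameters 1.0 and 1.5 are diffusively coupled on both state components, a large gain `k_high` is used until the synchronization error first drops below a tolerance `eps`, and a smaller constant gain `k_low` is used afterwards. The gains, tolerance, initial states, and forward-Euler integration are all assumptions.

```python
import math


def simulate_two_phase(mu=(1.0, 1.5), k_high=50.0, k_low=5.0, eps=0.05,
                       dt=1e-3, t_end=20.0):
    """Two diffusively coupled Van der Pol oscillators with a two-phase gain.

    Oscillator i:  x_i' = y_i,
                   y_i' = mu_i (1 - x_i^2) y_i - x_i + u_i,
    with diffusive coupling u_i = k (x_j - x_i + y_j - y_i).
    Phase 1 uses k_high until the error first falls below eps; phase 2 uses k_low.
    Returns (sync_time, average_gain, final_error).
    """
    x = [2.0, -1.0]          # hypothetical initial positions
    y = [0.0, 0.5]           # hypothetical initial velocities
    t, sync_time, gain_integral = 0.0, None, 0.0
    for _ in range(int(round(t_end / dt))):
        err = math.hypot(x[0] - x[1], y[0] - y[1])
        if sync_time is None and err < eps:
            sync_time = t    # first instant of synchronization: switch to phase 2
        k = k_high if sync_time is None else k_low
        gain_integral += k * dt
        u = [k * (x[1] - x[0] + y[1] - y[0]), k * (x[0] - x[1] + y[0] - y[1])]
        # Forward-Euler step for both oscillators
        dx = [y[i] for i in range(2)]
        dy = [mu[i] * (1 - x[i] ** 2) * y[i] - x[i] + u[i] for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        y = [y[i] + dt * dy[i] for i in range(2)]
        t += dt
    final_err = math.hypot(x[0] - x[1], y[0] - y[1])
    return sync_time, gain_integral / t_end, final_err
```

With a constant `k_low`, the error is not guaranteed to stay below `eps` in the second phase; choosing a time-varying coupling that keeps synchronization at a low average gain is the part the paper solves offline, and it is not reproduced here.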
ARMP: Autoregressive Motion Planning for Quadruped Locomotion and Navigation in Complex Indoor Environments
Abstract
Generating natural and physically feasible motions for legged robots has been a challenging problem due to their complex dynamics. In this work, we introduce a novel learning-based autoregressive motion planner (ARMP) for quadruped locomotion and navigation. Unlike most offline trajectory optimization algorithms, which produce trajectories of a fixed length, our method can generate motion plans of arbitrary length in an autoregressive fashion. To this end, we first construct a motion library by solving a dense set of trajectory optimization problems for diverse scenarios and parameter settings. We then learn the motion manifold from this dataset in a supervised fashion. We show that the proposed ARMP can generate physically plausible motions for various tasks and situations. We also show that our method can be successfully integrated with recent robot navigation frameworks as a low-level controller, unleashing the full capability of legged robots for complex indoor navigation.
In Sync: Exploring Synchronization to Increase Trust Between Humans and Non-humanoid Robots
Authors: Wieslaw Bartkowski (University of Warsaw), Andrzej Nowak (University of Warsaw), Filip Ignacy Czajkowski (University of Warsaw), Albrecht Schmidt (LMU Munich), Florian Müller (LMU Munich)
Abstract
When we go for a walk with friends, we can observe an interesting effect: from step lengths to arm movements, our movements unconsciously align; they synchronize. Prior research found that this synchronization is a crucial aspect of human relations that strengthens social cohesion and trust. Generalizing from these findings in synchronization theory, we propose a dynamical approach that can be applied in the design of non-humanoid robots to increase trust. We contribute the results of a controlled experiment with 51 participants that explores our concept in a between-subjects design. For this purpose, we built a prototype of a simple non-humanoid robot that can bend to follow human movements and vary its movement synchronization patterns. We found that synchronized movements led to significantly higher ratings on an established questionnaire on trust between people and automation but did not influence the willingness to spend money in a trust game.
Unbiasing Hamiltonian Monte Carlo algorithms for a general Hamiltonian function
Authors: Tony Lelièvre, Régis Santet, Gabriel Stoltz
Abstract
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method for sampling high-dimensional probability measures. It relies on integrating the Hamiltonian dynamics to propose a move, which is then accepted or rejected by a Metropolis procedure. Unbiased sampling is guaranteed when the numerical integrator preserves two key properties of the Hamiltonian dynamics: volume preservation and reversibility up to momentum reversal. For separable Hamiltonian functions, some standard explicit numerical schemes, such as the Störmer-Verlet integrator, satisfy these properties. However, for numerical or physical reasons, one may consider a nonseparable Hamiltonian function, in which case the standard numerical schemes that preserve volume and satisfy reversibility up to momentum reversal are implicit. When implemented in practice, such implicit schemes may admit many solutions or none, especially when the timestep is too large. We show here how to enforce numerical reversibility, and thus unbiasedness, of HMC schemes in this context. Numerical results on simple problems illustrate the relevance of this correction.
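For the separable case, $H(q, p) = U(q) + p^2/2$, the Störmer-Verlet (leapfrog) step is explicit. The sketch below is a minimal one-dimensional HMC step for a hypothetical standard Gaussian target, $U(q) = q^2/2$, and its test checks numerically the reversibility up to momentum reversal on which unbiasedness relies. It does not implement the paper's correction for nonseparable Hamiltonians; the step size and number of steps are illustrative choices.

```python
import math
import random


def U(q):
    """Hypothetical potential: U(q) = q^2 / 2, i.e. a standard Gaussian target."""
    return 0.5 * q * q


def grad_U(q):
    return q


def leapfrog(q, p, h, n_steps):
    """Stormer-Verlet integration of the separable H(q, p) = U(q) + p^2 / 2."""
    for _ in range(n_steps):
        p = p - 0.5 * h * grad_U(q)  # half kick
        q = q + h * p                # drift
        p = p - 0.5 * h * grad_U(q)  # half kick
    return q, p


def hmc_step(q, h=0.15, n_steps=10, rng=random):
    """One HMC step: resample momentum, integrate, then Metropolis accept/reject."""
    p = rng.gauss(0.0, 1.0)
    q_new, p_new = leapfrog(q, p, h, n_steps)
    dH = (U(q_new) + 0.5 * p_new ** 2) - (U(q) + 0.5 * p ** 2)
    if rng.random() < math.exp(min(0.0, -dH)):
        return q_new
    return q
```

Integrating forward, flipping the momentum, and integrating again returns to the starting point with the momentum flipped, which is exactly the property that fails to hold automatically when an implicit scheme has several solutions or none. Passing `rng=random.Random(seed)` makes a chain reproducible.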
A source separation approach to temporal graph modelling for computer networks
Authors: Corentin Larroche
Subjects: Cryptography and Security (cs.CR); Applications (stat.AP); Machine Learning (stat.ML)
Abstract
Detecting malicious activity within an enterprise computer network can be framed as a temporal link prediction task: given a sequence of graphs representing communications between hosts over time, the goal is to predict which edges should (or should not) occur in the future. However, standard temporal link prediction algorithms are ill-suited for computer network monitoring because they do not account for the peculiar short-term dynamics of computer network activity, which exhibits sharp seasonal variations. To build a better model, we propose a source separation-inspired description of computer network activity: at each time step, the observed graph is a mixture of subgraphs representing various sources of activity, and short-term dynamics result from changes in the mixing coefficients. Both qualitative and quantitative experiments demonstrate the validity of our approach.
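To make the mixture description concrete, the sketch below treats each source as a fixed pattern of edge weights and recovers the mixing coefficients of a single time step by ordinary least squares. This is a hypothetical, simplified reading of the abstract (two known sources, unconstrained coefficients, graphs stored as dictionaries of edges), not the paper's estimation procedure.

```python
def fit_mixing_coefficients(observed, sources):
    """Least-squares weights (w1, w2) minimizing sum_e (Y_e - w1*S1_e - w2*S2_e)^2.

    observed: dict mapping an edge (u, v) to its observed weight at one time step.
    sources:  pair of dicts with the same convention (hypothetical activity sources).
    """
    s1, s2 = sources
    edges = set(observed) | set(s1) | set(s2)
    a11 = sum(s1.get(e, 0.0) ** 2 for e in edges)
    a12 = sum(s1.get(e, 0.0) * s2.get(e, 0.0) for e in edges)
    a22 = sum(s2.get(e, 0.0) ** 2 for e in edges)
    b1 = sum(s1.get(e, 0.0) * observed.get(e, 0.0) for e in edges)
    b2 = sum(s2.get(e, 0.0) * observed.get(e, 0.0) for e in edges)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("sources are linearly dependent")
    # Solve the 2x2 normal equations by Cramer's rule
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def predicted_graph(weights, sources):
    """Expected edge weights of the mixture sum_k w_k * S_k."""
    edges = set().union(*sources)
    return {e: sum(w * s.get(e, 0.0) for w, s in zip(weights, sources)) for e in edges}
```

Tracking how the fitted weights change from one time step to the next is one way to represent the short-term dynamics that the abstract attributes to the mixing coefficients; an edge whose observed weight departs strongly from the predicted one would then be a candidate anomaly.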
TraffNet: Learning Causality of Traffic Generation for Road Network Digital Twins
Authors: Ming Xu, Yunyi Ma, Ruimin Li, Geqi Qi, Xiangfu Meng, Haibo Jin
Abstract
Road network digital twins (RNDTs) play a critical role in the development of next-generation intelligent transportation systems, enabling more precise traffic planning and control. To support just-in-time (JIT) decision making, RNDTs require a model that dynamically learns traffic patterns from online sensor data and generates high-fidelity simulation results. Although current traffic prediction techniques based on graph neural networks have achieved state-of-the-art performance, these techniques only predict future traffic by mining correlations in historical traffic data, disregarding the causes of traffic generation, such as traffic demands and route selection. Therefore, their performance is unreliable for JIT decision making. To fill this gap, we introduce a novel deep learning framework called TraffNet that learns the causality of traffic volume from vehicle trajectory data. First, we use a heterogeneous graph to represent the road network, allowing the model to incorporate causal features of traffic volumes. Next, motivated by traffic domain knowledge, we propose a traffic causality learning method to learn an embedding vector that encodes travel demands and path-level dependencies for each road segment. Then, we model temporal dependencies to match the underlying process of traffic generation. Finally, experiments verify the utility of TraffNet. The TraffNet code is available at https://github.com/mayunyi-1999/TraffNet_code.git.
Evolutionary Design of the Memory Subsystem
Authors: Josefa Díaz Álvarez, José L. Risco-Martín, J. Manuel Colmenar
Abstract
The memory hierarchy has a high impact on system performance and power consumption. Moreover, current embedded systems in mobile devices are specifically designed to run multimedia applications, which are memory intensive. This increases the pressure on the memory subsystem and affects performance and energy consumption. Thermal problems, performance degradation, and high energy consumption can cause irreversible damage to the devices. We address the optimization of the whole memory subsystem with three approaches integrated into a single methodology. First, the thermal impact of the register file is analyzed and optimized. Second, the cache memory is addressed by optimizing the cache configuration for the running applications, improving both performance and power consumption. Finally, we simplify the design and evaluation of general-purpose and customized dynamic memory managers in main memory. To this end, we apply different evolutionary algorithms in combination with memory simulators and profiling tools. In this way, we can evaluate the quality of each candidate solution and take advantage of the solution-space exploration performed by the optimization algorithm. We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.
Invariant preservation in machine learned PDE solvers via error correction
Abstract
Machine-learned partial differential equation (PDE) solvers trade the reliability of standard numerical methods for potential gains in accuracy and/or speed. The only way for a solver to guarantee that it outputs the exact solution is to use a convergent method in the limit that the grid spacing $\Delta x$ and timestep $\Delta t$ approach zero. Machine-learned solvers, which learn to update the solution at large $\Delta x$ and/or $\Delta t$, can never guarantee perfect accuracy. Some amount of error is inevitable, so the question becomes: how do we constrain machine-learned solvers to produce the sorts of errors that we are willing to tolerate? In this paper, we design more reliable machine-learned PDE solvers by preserving discrete analogues of the continuous invariants of the underlying PDE. Examples of such invariants include conservation of mass, conservation of energy, the second law of thermodynamics, and non-negative density. Our key insight is simple: to preserve invariants, apply an error-correcting algorithm to the update rule at each timestep. Although this strategy differs from how standard solvers preserve invariants, it is necessary to retain the flexibility that allows machine-learned solvers to be accurate at large $\Delta x$ and/or $\Delta t$. This strategy can be applied to any autoregressive solver for any time-dependent PDE in arbitrary geometries with arbitrary boundary conditions. Although this strategy is very general, the specific error-correcting algorithms need to be tailored to the invariants of the underlying equations as well as to the solution representation and time-stepping scheme of the solver. The error-correcting algorithms we introduce have two key properties. First, by preserving the right invariants, they guarantee numerical stability. Second, in closed or periodic systems, they do so without degrading the accuracy of an already-accurate solver.
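A simple instance of the error-correcting idea, for the mass and non-negativity invariants: after each learned update, clip negative densities to zero and rescale so that the total mass matches the previous step. This particular correction is an illustrative assumption, not necessarily one of the algorithms introduced in the paper, and `learned_update` stands in for a hypothetical machine-learned solver on a closed or periodic 1D grid with uniform cells.

```python
from typing import Callable, List


def correct_update(prev: List[float], proposed: List[float]) -> List[float]:
    """Enforce non-negativity and conservation of total mass on a proposed update.

    Assumes a closed or periodic 1D grid with uniform cell size, so mass = sum of cells.
    """
    mass = sum(prev)
    clipped = [max(u, 0.0) for u in proposed]   # non-negative density
    total = sum(clipped)
    if total == 0.0:
        # Degenerate proposal: fall back to a uniform state with the right mass
        return [mass / len(prev)] * len(prev)
    scale = mass / total                         # restore the previous total mass
    return [u * scale for u in clipped]


def step(prev: List[float],
         learned_update: Callable[[List[float]], List[float]]) -> List[float]:
    """One autoregressive solver step followed by error correction."""
    return correct_update(prev, learned_update(prev))
```

If a proposed update already conserves mass and is non-negative, the scale factor is exactly 1 and the correction returns it unchanged, which mirrors, in this toy setting, the property that correction need not degrade an already-accurate solver.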
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Authors: Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Modeling the relations between actors and scene context advances video action detection, where the correlation among multiple actors makes recognizing their actions challenging. Existing studies model the relation between each actor and the scene to improve action recognition. However, scene variations and background interference limit the effectiveness of this relation modeling. In this paper, we propose to select actor-related scene context, rather than directly leveraging the raw video scene, to improve relation modeling. We develop a Cycle Actor-Context Relation network (CycleACR), which uses a symmetric graph to model actor and context relations in a bidirectional form. CycleACR consists of an Actor-to-Context Reorganization (A2C-R) component, which collects actor features to reorganize context features, and a Context-to-Actor Enhancement (C2A-E) component, which dynamically uses the reorganized context features to enhance actor features. Compared to existing designs that focus on C2A-E, CycleACR introduces A2C-R for more effective relation modeling. This modeling enables CycleACR to achieve state-of-the-art performance on two popular action detection datasets (i.e., AVA and UCF101-24). We also provide ablation studies and visualizations to show how our cycle actor-context relation modeling improves video action detection. Code is available at https://github.com/MCG-NJU/CycleACR.
Dias: Dynamic Rewriting of Pandas Code
Authors: Stefanos Baziotis, Daniel Kang, Charith Mendis
Abstract
In recent years, dataframe libraries such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse and include custom functions that can span libraries or be written in pure Python. Most systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify program rewriting as a lightweight technique that can offer substantial speedups while also avoiding slowdowns. We implemented our techniques in Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including dynamic checking of the preconditions under which rewrites are correct and just-in-time rewrites for notebook environments. We show that Dias can rewrite individual cells to be 57$\times$ faster compared to pandas and 1909$\times$ faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6$\times$ compared to pandas and 26.4$\times$ compared to modin.
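Dias targets pandas, but the general pattern of matching a source pattern, checking that the rewrite is valid, and otherwise falling back to the original code can be shown in dependency-free Python. In the hypothetical sketch below, `sorted(X)[:N]` in a notebook-style cell is rewritten to use `heapq.nsmallest`, which the Python documentation describes as equivalent to `sorted(iterable, key=key)[:n]`. The rewrite is skipped when `sorted` has been rebound, and a runtime guard falls back to the original expression when `N` is not a non-negative integer. None of these names (`run_cell`, `_sorted_prefix`, `SortedPrefixRewriter`) come from Dias.

```python
import ast
import builtins
import heapq


def _sorted_prefix(xs, n):
    """Rewritten form of `sorted(xs)[:n]`, guarded by a runtime precondition."""
    # heapq.nsmallest(n, xs) matches sorted(xs)[:n] only for non-negative ints;
    # anything else (negative n, None, ...) falls back to the original expression.
    if isinstance(n, int) and n >= 0:
        return heapq.nsmallest(n, xs)
    return sorted(xs)[:n]


class SortedPrefixRewriter(ast.NodeTransformer):
    """Rewrite the expression pattern `sorted(X)[:N]` into `_sorted_prefix(X, N)`."""

    def visit_Subscript(self, node):
        self.generic_visit(node)  # rewrite nested occurrences first
        call, sl = node.value, node.slice
        if (isinstance(node.ctx, ast.Load)
                and isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name) and call.func.id == "sorted"
                and len(call.args) == 1 and not call.keywords
                and not isinstance(call.args[0], ast.Starred)
                and isinstance(sl, ast.Slice)
                and sl.lower is None and sl.upper is not None and sl.step is None):
            new = ast.Call(func=ast.Name(id="_sorted_prefix", ctx=ast.Load()),
                           args=[call.args[0], sl.upper], keywords=[])
            return ast.copy_location(new, node)
        return node


def _rebinds(tree, name):
    """Best-effort static check for whether the cell itself rebinds `name`."""
    for n in ast.walk(tree):
        if isinstance(n, ast.Name) and n.id == name and not isinstance(n.ctx, ast.Load):
            return True
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and n.name == name:
            return True
        if isinstance(n, ast.arg) and n.arg == name:
            return True
        if isinstance(n, ast.alias) and (n.asname or n.name) == name:
            return True
    return False


def run_cell(source, namespace):
    """Execute a notebook-style cell, rewriting it only if `sorted` is still the builtin."""
    tree = ast.parse(source)
    if (namespace.get("sorted", builtins.sorted) is builtins.sorted
            and not _rebinds(tree, "sorted")):
        tree = ast.fix_missing_locations(SortedPrefixRewriter().visit(tree))
        namespace["_sorted_prefix"] = _sorted_prefix
    exec(compile(tree, "<cell>", "exec"), namespace)
    return namespace
```

Here `run_cell` plays the role of a just-in-time hook that sees each cell before it runs, with the namespace standing in for the notebook's global state; the precondition checks are split between rewrite time (is `sorted` the builtin?) and run time (is `N` a non-negative integer?).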
Reactive Gait Composition with Stability: Dynamic Walking amidst Static and Moving Obstacles
Abstract
This paper presents a modular approach to motion planning with provable stability guarantees for robots that move through changing environments via periodic locomotion behaviors. We focus on dynamic walkers as a paradigm for such systems, although the tools developed in this paper can be used to support general compositional approaches to robot motion planning with Dynamic Movement Primitives (DMPs). Our approach ensures a priori that the suggested plan can be stably executed. This is achieved by formulating the planning process as a Switching System with Multiple Equilibria (SSME) and proving that the system's evolution remains within explicitly characterized trapping regions in the state space under suitable constraints on the frequency of switching among the DMPs. These conditions effectively encapsulate the low-level stability limitations in a form that can be easily communicated to the planner to guarantee that the suggested plan is compatible with the robot's dynamics. Furthermore, we show how the available primitives can be safely composed online in a receding-horizon manner to enable the robot to react to moving obstacles. The proposed framework is applied to 3D bipedal walking models under common modeling assumptions and offers a modular approach towards stably integrating readily available low-level locomotion control and high-level planning methods.
Control Barrier Function-based Predictive Control for Close Proximity operation of UAVs inside a Tunnel
Abstract
This paper introduces a method for effectively controlling the movement of an Unmanned Aerial Vehicle (UAV) within a tunnel. The primary challenge lies in the UAV's exposure to nonlinear, distance-dependent torques and forces generated by the tunnel walls, along with the need to operate safely within a defined region while in close proximity to these walls. To address this problem, the paper proposes a Model Predictive Control (MPC) framework with constraints based on Control Barrier Functions (CBFs). The paper approaches the issue in two distinct ways: first, by maintaining a safe distance from the tunnel walls to avoid the effects of both the walls and the ceiling, and second, by minimizing the distance from the walls to effectively manage the nonlinear forces associated with close-proximity tasks. Finally, the paper demonstrates the effectiveness of its approach in simulation on various close-proximity trajectories, using a realistic model of the aerodynamic disturbances caused by the proximity of the ceiling and boundary walls.
When to be critical? Performance and evolvability in different regimes of neural Ising agents
Authors: Sina Khajehabdollahi, Jan Prosi, Georg Martius, Anna Levina
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems and for their evolution. We put this hypothesis to the test in a system of evolving foraging agents controlled by neural networks whose dynamical regime can adapt throughout evolution. Surprisingly, we find that all populations that discover solutions evolve to be subcritical. Through a resilience analysis, we find that starting evolution in the critical regime still has benefits. Namely, initially critical agents maintain their fitness level under environmental changes (for example, in the lifespan) and degrade gracefully when their genome is perturbed. At the same time, initially subcritical agents, even when evolved to the same fitness, are often unable to withstand changes in the lifespan and degrade catastrophically under genetic perturbations. Furthermore, we find that the optimal distance to criticality depends on task complexity. To test this, we introduce a hard task and a simple task: for the hard task, agents evolve closer to criticality, whereas more subcritical solutions are found for the simple task. We verify that our results are independent of the selected evolutionary mechanism by testing them on two fundamentally different approaches: a genetic algorithm and an evolution strategy. In summary, our study suggests that although optimal behaviour in the simple task is obtained in a subcritical regime, initializing near criticality is important for efficiently finding optimal solutions to new tasks of unknown complexity.
Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction
Authors: Vitus Benson, Christian Requena-Mesa, Claire Robin, Lazaro Alonso, José Cortés, Zhihan Gao, Nora Linscheid, Mélanie Weynants, Markus Reichstein
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
We present a novel approach for modeling vegetation response to weather in Europe as measured by the Sentinel-2 satellite. Existing satellite imagery forecasting approaches focus on the photorealistic quality of multispectral images, while derived vegetation dynamics have not yet received as much attention. We leverage both spatial and temporal context by extending state-of-the-art video prediction methods with weather guidance. We extend the EarthNet2021 dataset to make it suitable for vegetation modeling by introducing a learned cloud mask and an appropriate evaluation scheme. Qualitative and quantitative experiments demonstrate superior performance of our approach over a wide variety of baseline methods, including leading approaches to satellite imagery forecasting. Additionally, we show how our modeled vegetation dynamics can be leveraged in a downstream task: inferring gross primary productivity for carbon monitoring. To the best of our knowledge, this work presents the first models for continental-scale vegetation modeling at fine resolution that are able to capture anomalies beyond the seasonal cycle, thereby paving the way for predictive assessments of vegetation status.
Keyword: efficient
Analytical Study and Efficient Evaluation of the Josephus Function
A Stochastic Method for Solving Time-Fractional Differential Equations
Uniform in time convergence of numerical schemes for stochastic differential equations via Strong Exponential stability: Euler methods, Split-Step and Tamed Schemes
Embedding Contextual Information through Reward Shaping in Multi-Agent Learning: A Case Study from Google Football
Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
A Novel Neural Network Approach for Predicting the Arrival Time of Buses for Smart On-Demand Public Transit
Learning Harmonic Molecular Representations on Riemannian Manifold
A New Index based on Power Splitting Indices for Predicting Proper Time of Controlled Islanding
Randomized rounding algorithms for large scale unsplittable flow problems
Privacy-preserving machine learning for healthcare: open challenges and future perspectives
Core-Periphery Principle Guided Redesign of Self-Attention in Transformers
Learning Expressive Prompting With Residuals for Vision Transformers
Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Predicting Thermoelectric Power Factor of Bismuth Telluride During Laser Powder Bed Fusion Additive Manufacturing
DisWOT: Student Architecture Search for Distillation WithOut Training
Efficient Deep Learning of Robust, Adaptive Policies using Tube MPC-Guided Data Augmentation
Distributed Graph Embedding with Information-Oriented Random Walks
Design Space Exploration for PCM-based Photonic Memory
HISSbot: Sidewinding with a Soft Snake Robot
Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
A Generalized Ray Formulation For Wave-Optics Rendering
Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference
TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization
Automated wildlife image classification: An active learning tool for ecological applications
Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes
GAS: A Gaussian Mixture Distribution-Based Adaptive Sampling Method for PINNs
The Wyner Variational Autoencoder for Unsupervised Multi-Layer Wireless Fingerprinting
Accelerating exponential integrators to efficiently solve advection-diffusion-reaction equations
Efficient Alternating Minimization Solvers for Wyner Multi-View Unsupervised Learning
STMixer: A One-Stage Sparse Action Detector
Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation
Efficient Quality Diversity Optimization of 3D Buildings through 2D Pre-optimization
Mask-Free Video Instance Segmentation
When Brain-inspired AI Meets AGI
A source separation approach to temporal graph modelling for computer networks
Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks
A Survey on Malware Detection with Graph Representation Learning
Understanding and Exploring the Whole Set of Good Sparse Generalized Additive Models
Simulation-based Inference for Model Parameterization on Analog Neuromorphic Hardware
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Efficient solutions to the relative pose of three calibrated cameras from four points using virtual correspondences
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Multimodal Manoeuvre and Trajectory Prediction for Autonomous Vehicles Using Transformer Networks
DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets
Dias: Dynamic Rewriting of Pandas Code
What Writing Assistants Can Learn from Programming IDEs
Learning Federated Visual Prompt in Null Space for MRI Reconstruction
VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis
Large-scale Training Data Search for Object Re-identification
Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection
When to be critical? Performance and evolvability in different regimes of neural Ising agents
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Keyword: faster
A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics
Switched Moving Boundary Modeling of Phase Change Thermal Energy Storage Systems
Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
A Generalized Ray Formulation For Wave-Optics Rendering
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
Clustered Federated Learning Architecture for Network Anomaly Detection in Large Scale Heterogeneous IoT Networks
Faster Deterministic Distributed MIS and Approximate Matching
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Neural Collapse Inspired Federated Learning with Non-iid Data
Lazy learning: a biologically-inspired plasticity rule for fast and energy efficient synaptic plasticity
DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets
Dias: Dynamic Rewriting of Pandas Code
Keyword: mobile
Beyond Accuracy: A Critical Review of Fairness in Machine Learning for Mobile and Wearable Computing
Overcoming Probabilistic Faults in Disoriented Linear Search
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
A Novel Design for Advanced 5G Deployment Environments with Virtualized Resources at Vehicular and MEC Nodes
4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free Images
Around-Body Interaction: Leveraging Limb Movements for Interacting in a Digitally Augmented Physical World
Ranking mobility and impact inequality in early academic careers
Inside-out Infrared Marker Tracking via Head Mounted Displays for Smart Robot Programming
Evolutionary Design of the Memory Subsystem
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures
Unleashing the Power of Edge-Cloud Generative AI in Mobile Networks: A Survey of AIGC Services
Keyword: pruning
Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
Randomly Initialized Subnetworks with Iterative Weight Recycling
Large-scale Training Data Search for Object Re-identification
Keyword: voxel
Multimodal and multicontrast image fusion via deep generative models
LinK: Linear Kernel for LiDAR-based 3D Perception
Keyword: lidar
4D Panoptic Segmentation as Invariant and Equivariant Field Prediction
LinK: Linear Kernel for LiDAR-based 3D Perception
Keyword: diffusion
An efficient method for the anisotropic diffusion equation in magnetic fields
A Stochastic Method for Solving Time-Fractional Differential Equations
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
Ecosystem Graphs: The Social Footprint of Foundation Models
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion
Structure Preserving Finite Volume Approximation of Cross-Diffusion Systems Coupled by a Free Interface
Accelerating exponential integrators to efficiently solve advection-diffusion-reaction equations
Visual Chain-of-Thought Diffusion Models
Your Diffusion Model is Secretly a Zero-Shot Classifier
Keyword: dynamic
A Heterogeneous Parallel Non-von Neumann Architecture System for Accurate and Efficient Machine Learning Molecular Dynamics
Sequential training of GANs against GAN-classifiers reveals correlated "knowledge gaps" present among independently trained GAN instances
OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
Multiphysics discovery with moving boundaries using Ensemble SINDy and Peridynamic Differential Operator
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model
GNN-based physics solver for time-independent PDEs
Switched Moving Boundary Modeling of Phase Change Thermal Energy Storage Systems
Minimization of Sensor Activation in Discrete-Event Systems with Control Delays and Observation Delays
Cesno: Possibility of Creating a New Programming Language
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion
A Novel Design for Advanced 5G Deployment Environments with Virtualized Resources at Vehicular and MEC Nodes
Obstacle Avoidance in Dynamic Environments via Tunnel-following MPC with Adaptive Guiding Vector Fields
Control Barrier Functions in Dynamic UAVs for Kinematic Obstacle Avoidance: A Collision Cone Approach
Satellite Dynamics Toolbox Library: a tool to model multi-body space systems for robust control synthesis and analysis
STMixer: A One-Stage Sparse Action Detector
On Optimal Synchronization of Diffusively Coupled Heterogeneous Van der Pol Oscillators
ARMP: Autoregressive Motion Planning for Quadruped Locomotion and Navigation in Complex Indoor Environments
In Sync: Exploring Synchronization to Increase Trust Between Humans and Non-humanoid Robots
Unbiasing Hamiltonian Monte Carlo algorithms for a general Hamiltonian function
A source separation approach to temporal graph modelling for computer networks
TraffNet: Learning Causality of Traffic Generation for Road Network Digital Twins
Evolutionary Design of the Memory Subsystem
Invariant preservation in machine learned PDE solvers via error correction
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
Dias: Dynamic Rewriting of Pandas Code
Reactive Gait Composition with Stability: Dynamic Walking amidst Static and Moving Obstacles
Control Barrier Function-based Predictive Control for Close Proximity operation of UAVs inside a Tunnel
When to be critical? Performance and evolvability in different regimes of neural Ising agents
Forecasting localized weather impacts on vegetation as seen from space with meteo-guided video prediction