Abstract
Neural networks (NNs) playing the role of controllers have demonstrated impressive empirical performance on challenging control problems. However, the potential adoption of NN controllers in real-life applications also gives rise to growing concern over the safety of these neural-network controlled systems (NNCSs), especially when used in safety-critical applications. In this work, we present POLAR-Express, an efficient and precise formal reachability analysis tool for verifying the safety of NNCSs. POLAR-Express uses Taylor model arithmetic to propagate Taylor models (TMs) across a neural network layer by layer to compute an overapproximation of the neural-network function. It can be applied to analyze any feed-forward neural network with continuous activation functions. We also present a novel approach to propagate TMs more efficiently and precisely across ReLU activation functions. In addition, POLAR-Express provides parallel computation support for the layer-by-layer propagation of TMs, thus significantly improving the efficiency and scalability over its earlier prototype POLAR. In a comparison with six other state-of-the-art tools on a diverse set of benchmarks, POLAR-Express achieves the best verification efficiency and tightness in the reachable set analysis.
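Taylor models track a polynomial plus a remainder interval; a full implementation is involved, but the layer-by-layer propagation idea can be illustrated with plain interval arithmetic, which is the degenerate (degree-zero) case. A minimal NumPy sketch, not POLAR-Express's actual TM arithmetic:

```python
import numpy as np

def propagate_interval(lo, hi, W, b):
    """Propagate an axis-aligned box through one affine layer.

    For y = W x + b, the positive part of W carries the upper bound
    and the negative part carries the lower bound (and vice versa).
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def relu_interval(lo, hi):
    # ReLU is monotone, so the box maps through elementwise.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Two-layer toy network with a box input set x in [-1, 1]^2.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
lo, hi = relu_interval(*propagate_interval(lo, hi, W1, b1))
lo, hi = propagate_interval(lo, hi, W2, b2)
print("output enclosure:", lo, hi)
```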
Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2) for KNN models
Authors: Mohamed Karim Belaid, Dorra El Mekki, Maximilian Rabus, Eyke Hüllermeier
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Abstract
With the rapid growth of data availability and usage, quantifying the added value of each training data point has become a crucial process in the field of artificial intelligence. Shapley values have been recognized as an effective method for data valuation, enabling efficient training set summarization, acquisition, and outlier removal. In this paper, we introduce "STI-KNN", an innovative algorithm that calculates the exact pair-interaction Shapley values for KNN models in O(t n^2) time, which is a significant improvement over the O(2^n) time complexity of baseline methods. By using STI-KNN, we can efficiently and accurately evaluate the value of individual data points, leading to improved training outcomes and ultimately enhancing the effectiveness of artificial intelligence applications.
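STI-KNN's pair-interaction formula is specific to the paper, but the flavor of closed-form KNN data valuation can be seen in the earlier single-point KNN-Shapley recursion of Jia et al. (2019), which also avoids O(2^n) enumeration. A sketch for one test point (this is the prior recursion, not STI-KNN itself):

```python
import numpy as np

def knn_shapley_single(X_train, y_train, x_test, y_test, K):
    """Exact data Shapley values for an unweighted K-NN classifier,
    for a single test point, via the recursion of Jia et al. (2019)."""
    n = len(X_train)
    order = np.argsort(np.linalg.norm(X_train - x_test, axis=1))
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(n)
    s[n - 1] = match[n - 1] / n  # farthest point
    for i in range(n - 2, -1, -1):  # walk from far to near
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)

    values = np.zeros(n)
    values[order] = s  # map back to original indices
    return values
```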
A greedy approach for increased vehicle utilization in ridesharing networks
Authors: Aqsa Ashraf Makhdomi, Iqra Altaf Gillani
Subjects: Data Structures and Algorithms (cs.DS); Computers and Society (cs.CY); Information Retrieval (cs.IR); Optimization and Control (math.OC)
Abstract
In recent years, ridesharing platforms have become a prominent mode of transportation for the residents of urban areas. As a fundamental problem, route recommendation for these platforms is vital for their sustenance. Existing work in this direction has recommended routes with higher passenger demand. Despite these efforts, statistics suggest that ridesharing services cause higher greenhouse gas emissions than private vehicles, as drivers roam around in search of riders. This analysis provides finer details regarding the functionality of ridesharing systems, and it reveals that, despite their boom, they have not utilized vehicle capacity efficiently. We propose to overcome the above limitations and recommend routes that will fetch multiple passengers simultaneously, which will increase vehicle utilization and thereby decrease the effect of these systems on the environment. As route recommendation is NP-hard, we propose a k-hop-based sliding window approximation algorithm that reduces the search space from the entire road network to a window. We further demonstrate that the expected demand is submodular and that greedy algorithms can be used to optimize our objective function within a window. We evaluate our proposed model on real-world datasets, and experimental results demonstrate its superior performance.
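The greedy step relies on the classic (1 - 1/e) guarantee for maximizing a monotone submodular function under a cardinality constraint. A generic sketch of that greedy loop, with the window's candidate road segments and the expected-demand oracle left abstract (both names are placeholders, not the paper's API):

```python
def greedy_route(candidates, expected_demand, budget):
    """Greedily pick up to `budget` road segments from `candidates`,
    maximizing a monotone submodular `expected_demand(set)` oracle."""
    chosen = set()
    for _ in range(budget):
        base = expected_demand(chosen)
        # Pick the segment with the largest marginal gain.
        gains = {c: expected_demand(chosen | {c}) - base
                 for c in candidates - chosen}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        chosen.add(best)
    return chosen

# Toy oracle: coverage of rider demand points (coverage is submodular).
demand = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
f = lambda S: len(set().union(*(demand[s] for s in S))) if S else 0
print(greedy_route(set(demand), f, budget=2))
```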
SEENN: Towards Temporal Spiking Early-Exit Neural Networks
Authors: Yuhang Li, Tamar Geller, Youngeun Kim, Priyadarshini Panda
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
Spiking Neural Networks (SNNs) have recently become more popular as a biologically plausible substitute for traditional Artificial Neural Networks (ANNs). SNNs are cost-efficient and deployment-friendly because they process input in both spatial and temporal manners using binary spikes. However, we observe that the information capacity in SNNs is affected by the number of timesteps, leading to an accuracy-efficiency tradeoff. In this work, we study a fine-grained adjustment of the number of timesteps in SNNs. Specifically, we treat the number of timesteps as a variable conditioned on different input samples to reduce redundant timesteps for certain data. We call our method Spiking Early-Exit Neural Networks (SEENNs). To determine the appropriate number of timesteps, we propose SEENN-I, which uses confidence score thresholding to filter out uncertain predictions, and SEENN-II, which determines the number of timesteps by reinforcement learning. Moreover, we demonstrate that SEENN is compatible with both directly trained SNNs and ANN-to-SNN conversion. By dynamically adjusting the number of timesteps, our SEENN achieves a remarkable reduction in the average number of timesteps during inference. For example, our SEENN-II ResNet-19 can achieve 96.1% accuracy with an average of 1.08 timesteps on the CIFAR-10 test dataset.
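The SEENN-I idea, stopping the temporal loop as soon as the running prediction is confident enough, can be sketched generically. A minimal NumPy illustration in which `snn_step` is a placeholder for one timestep of any SNN and the threshold value is illustrative:

```python
import numpy as np

def early_exit_inference(snn_step, x, max_timesteps=8, threshold=0.9):
    """Run an SNN timestep-by-timestep and exit once the softmax
    confidence of the accumulated logits crosses `threshold`."""
    logits = None
    for t in range(1, max_timesteps + 1):
        out = snn_step(x, t)                      # logits at timestep t
        logits = out if logits is None else logits + out
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        if probs.max() >= threshold:              # confident: exit early
            return probs.argmax(), t
    return probs.argmax(), max_timesteps
```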
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs
Authors: Giacomo Pedretti, John Moon, Pedro Bruel, Sergey Serebryakov, Ron M. Roth, Luca Buonanno, Tobias Ziegler, Cong Xu, Martin Foltin, Jim Ignowski, Catherine E. Graves
Abstract
Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased-precision analog CAM and a programmable network-on-chip allowing the inference of state-of-the-art tree-based ML models, such as XGBoost and CatBoost. Results evaluated on a single chip in 16 nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19 W peak power consumption.
Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes
Abstract
We study the computational scalability of a Gaussian process (GP) framework for solving general nonlinear partial differential equations (PDEs). This framework transforms solving PDEs into solving a quadratic optimization problem with nonlinear constraints. Its complexity bottleneck lies in computing with dense kernel matrices obtained from pointwise evaluations of the covariance kernel of the GP and its partial derivatives at collocation points. We present a sparse Cholesky factorization algorithm for such kernel matrices based on the near-sparsity of the Cholesky factor under a new ordering of Diracs and derivative measurements. We rigorously identify the sparsity pattern and quantify the exponentially convergent accuracy of the corresponding Vecchia approximation of the GP, which is optimal in the Kullback-Leibler divergence. This enables us to compute $\epsilon$-approximate inverse Cholesky factors of the kernel matrices with complexity $O(N\log^d(N/\epsilon))$ in space and $O(N\log^{2d}(N/\epsilon))$ in time. With the sparse factors, gradient-based optimization methods become scalable. Furthermore, we can use the oftentimes more efficient Gauss-Newton method, for which we apply the conjugate gradient algorithm with the sparse factor of a reduced kernel matrix as a preconditioner to solve the linear system. We numerically illustrate our algorithm's near-linear space/time complexity for a broad class of nonlinear PDEs such as the nonlinear elliptic, Burgers, and Monge-Ampère equations. In summary, we provide a fast, scalable, and accurate method for solving general PDEs with GPs.
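The role of the sparse factor as a preconditioner can be sketched with SciPy's conjugate gradient: if $L$ approximates the inverse Cholesky factor of the kernel matrix $\Theta$ (so $L^\top L \approx \Theta^{-1}$), applying $L^\top L$ gives a cheap preconditioner. A schematic, with a dense toy kernel standing in for the paper's structured matrices:

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n = 200
pts = np.sort(rng.random(n))
Theta = np.exp(-np.abs(pts[:, None] - pts[None, :]))  # toy SPD kernel matrix
b = rng.standard_normal(n)

# Stand-in for the epsilon-approximate inverse Cholesky factor: here the
# exact one; the paper computes a sparse L with near-linearly many nonzeros.
L = np.linalg.inv(np.linalg.cholesky(Theta))

# Preconditioner M ~ Theta^{-1}, applied as v -> L^T (L v).
M = LinearOperator((n, n), matvec=lambda v: L.T @ (L @ v))

x, info = cg(Theta, b, M=M)
print("converged:", info == 0, "residual:", np.linalg.norm(Theta @ x - b))
```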
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning
Abstract
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but the focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD, a parallel and large-scale multilingual conversation dataset that we created by translating the English-only Schema-Guided Dialogue (SGD) dataset (Rastogi et al., 2020) into 105 other languages. XSGD contains approximately 330k utterances per language. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts. We also investigate two different classifiers: NLI-based and vanilla classifiers, and test the cross-lingual capability enabled by the aligned prompts. We evaluate our model's cross-lingual generalization capabilities on two conversation tasks: slot-filling and intent classification. Our results demonstrate the strong and efficient modeling ability of NLI-based classifiers and the large cross-lingual transfer improvements achieved by our aligned prompts, particularly in few-shot settings.
Towards Deterministic Communications in 6G Networks: State of the Art, Open Challenges and the Way Forward
Authors: Gourav Prateek Sharma, Dhruvin Patel, Joachim Sachs, Marilet De Andrade, Janos Farkas, Janos Harmatos, Balazs Varga, Hans-Peter Bernhard, Raheeb Muzaffar, Mahin K. Atiq, Frank Duerr, Dietmar Bruckner, Edgardo Montesdeoca, Drissa Houatra, Hongwei Zhang, James Gross
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Over the last decade, society and industries have been undergoing rapid digitization that is expected to lead to the evolution of the cyber-physical continuum. End-to-end deterministic communications infrastructure is the essential glue that will bridge the digital and physical worlds of the continuum. We describe the state of the art and open challenges with respect to contemporary deterministic communications and compute technologies: 3GPP 5G, IEEE Time-Sensitive Networking, IETF DetNet, OPC UA, as well as edge computing. While these technologies represent significant technological advancements towards networking Cyber-Physical Systems (CPS), we argue in this paper that they rather represent a first generation of systems which are still limited in different dimensions. In contrast, realizing future deterministic communication systems requires, firstly, seamless convergence between these technologies and, secondly, scalability to support heterogeneous (time-varying) requirements arising from diverse CPS applications. In addition, future deterministic communication networks will have to provide such characteristics end-to-end, which for CPS refers to the entire communication and computation loop, from sensors to actuators. In this paper, we discuss the state of the art regarding the main challenges towards these goals: predictability, end-to-end technology integration, end-to-end security, and scalable vertical application interfacing. We then present our vision regarding viable approaches and technological enablers to overcome these four central challenges. Key approaches to leverage in this regard are 6G system evolutions, wireless-friendly integration of 6G into TSN and DetNet, novel end-to-end security approaches, efficient edge-cloud integration, data-driven approaches for stochastic characterization and prediction, as well as leveraging digital twins towards system awareness.
Integrated Access and Backhaul via Satellites
Authors: Zaid Abdullah, Steven Kisseleff, Eva Lagunas, Vu Nguyen Ha, Frank Zeppenfeldt, Symeon Chatzinotas
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
To allow flexible and cost-efficient network densification and deployment, integrated access and backhaul (IAB) was recently standardized by the third generation partnership project (3GPP) as part of the fifth-generation new radio (5G-NR) networks. However, the current standardization only defines IAB for the terrestrial domain, while non-terrestrial networks (NTNs) are yet to be considered for such standardization efforts. In this work, we motivate the use of IAB in NTNs, and we discuss the compatibility issues between the 3GPP specifications on IAB in 5G-NR and the satellite radio regulations. In addition, we identify the required adaptations from the 3GPP and/or satellite operators for realizing an NTN-enabled IAB operation. A case study is provided for a low earth orbit (LEO) satellite-enabled in-band IAB operation with orthogonal and non-orthogonal bandwidth allocation between access and backhauling, under both time- and frequency-division duplex (TDD/FDD) transmission modes. Numerical results demonstrate the feasibility of IAB through satellites and illustrate the superiority of FDD over TDD transmission. It is also shown that, in the absence of precoding, non-orthogonal bandwidth allocation between the access and the backhaul can largely degrade the network throughput.
PyFlyt -- UAV Simulation Environments for Reinforcement Learning Research
Authors: Jun Jet Tai, Jim Wong, Mauro Innocente, Nadjim Horri, James Brusey, Swee King Phang
Abstract
Unmanned aerial vehicles (UAVs) have numerous applications, but their efficient and optimal flight can be a challenge. Reinforcement Learning (RL) has emerged as a promising approach to address this challenge, yet there is no standardized library for testing and benchmarking RL algorithms on UAVs. In this paper, we introduce PyFlyt, a platform built on the Bullet physics engine with native Gymnasium API support. PyFlyt provides modular implementations of simple components, such as motors and lifting surfaces, allowing for the implementation of UAVs of arbitrary configurations. Additionally, PyFlyt includes various task definitions and multiple reward function settings for each vehicle type. We demonstrate the effectiveness of PyFlyt by training various RL agents for two UAV models: quadrotor and fixed-wing. Our findings highlight the effectiveness of RL in UAV control and planning, and further show that it is possible to train agents in sparse reward settings for UAVs. PyFlyt fills a gap in the existing literature by providing a flexible and standardized platform for testing RL algorithms on UAVs. We believe that this will inspire more standardized research in this direction.
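Because PyFlyt exposes its tasks through the standard Gymnasium interface, agents interact with it like any other Gymnasium environment. A minimal random-agent loop; the environment ID below is an assumption for illustration, so check PyFlyt's documentation for the registered names:

```python
import gymnasium as gym
import PyFlyt.gym_envs  # registers the PyFlyt environments

# Environment ID assumed for illustration; see PyFlyt's docs.
env = gym.make("PyFlyt/QuadX-Hover-v0")

obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # random policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```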
Universal Framework for Parametric Constrained Coding
Abstract
Constrained coding is a fundamental field in coding theory that tackles efficient communication through constrained channels. While channels with fixed constraints have a general optimal solution, there is increasing demand for parametric constraints that are dependent on the message length. Several works have tackled such parametric constraints through iterative algorithms, yet they require complex constructions specific to each constraint to guarantee convergence through monotonic progression. In this paper, we propose a universal framework for tackling any parametric constrained-channel problem through a novel simple iterative algorithm. By reducing an execution of this iterative algorithm to an acyclic graph traversal, we prove a surprising result that guarantees convergence with efficient average time complexity even without requiring any monotonic progression. We demonstrate the effectiveness of this universal framework by applying it to a variety of both local and global channel constraints. We begin by exploring the local constraints involving illegal substrings of variable length, where the universal construction essentially iteratively replaces forbidden windows. We apply this local algorithm to the minimal periodicity, minimal Hamming weight, local almost-balanced Hamming weight and the previously-unsolved minimal palindrome constraints. We then continue by exploring global constraints, and demonstrate the effectiveness of the proposed construction on the repeat-free encoding, reverse-complement encoding, and the open problem of global almost-balanced encoding. For reverse-complement, we also tackle a previously-unsolved version of the constraint that addresses overlapping windows. Overall, the proposed framework generates state-of-the-art constructions with significant ease while also enabling the simultaneous integration of multiple constraints for the first time.
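The paper's framework iteratively replaces forbidden windows with constraint-specific rules. As a much simpler classical point of comparison for a local substring constraint, here is bit stuffing, which forbids runs of k consecutive ones by inserting a 0 after every k-1 ones (a standard textbook technique, not the paper's construction):

```python
def stuff(bits, k):
    """Encode so that no run of k consecutive ones appears:
    insert a 0 after every k-1 consecutive ones."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == k - 1:
            out.append(0)
            run = 0
    return out

def unstuff(bits, k):
    """Invert stuff(): drop the 0 that follows k-1 consecutive ones."""
    out, run, skip = [], 0, False
    for b in bits:
        if skip:  # this bit is the stuffed 0
            skip, run = False, 0
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == k - 1:
            skip, run = True, 0
    return out

msg = [1, 1, 1, 1, 0, 1, 1, 1]
enc = stuff(msg, k=3)          # no "111" substring appears in enc
assert unstuff(enc, k=3) == msg
```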
Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks
Authors: Andrew Halterman, Philip A. Schrodt, Andreas Beger, Benjamin E. Bagozzi, Grace I. Scarborough
Abstract
Event data, or structured records of "who did what to whom" that are automatically extracted from text, is an important source of data for scholars of international politics. The high cost of developing new event datasets, especially using automated systems that rely on hand-built dictionaries, means that most researchers draw on large, pre-existing datasets such as ICEWS rather than developing tailor-made event datasets optimized for their specific research question. This paper describes a "bag of tricks" for efficient, custom event data production, drawing on recent advances in natural language processing (NLP) that allow researchers to rapidly produce customized event datasets. The paper introduces techniques for training an event category classifier with active learning, identifying actors and the recipients of actions in text using large language models, standard machine learning classifiers, and pretrained "question-answering" models from NLP, and resolving mentions of actors to their Wikipedia article to categorize them. We describe how these techniques produced the new POLECAT global event dataset that is intended to replace ICEWS, along with examples of how scholars can quickly produce smaller, custom event datasets. We publish example code and models to implement our new techniques.
A Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos
Authors: Yang Liu, Luiz Gustavo Hafemann
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Training data is a critical requirement for machine learning tasks, and labeled training data can be expensive to acquire, often requiring manual or semi-automated data collection pipelines. For tracking applications, data collection involves drawing bounding boxes around the classes of interest on each frame and associating detections of the same "instance" over frames. In a semi-automated data collection pipeline, this can be achieved by running a baseline detection and tracking algorithm, and relying on manual correction to add/remove/change bounding boxes on each frame, as well as resolving errors in the associations over frames (track switches). In this paper, we propose a data correction pipeline to generate ground-truth data more efficiently in this semi-automated scenario. Our method simplifies the trajectories from the tracking systems and lets the annotator verify and correct the objects in the sampled keyframes. Once the objects in the keyframes are corrected, the bounding boxes in the other frames are obtained by interpolation. Our method achieves a substantial reduction in the number of frames requiring manual correction. On the MOT dataset, it reduces the number of frames by 30x while maintaining a HOTA score of 89.61%. Moreover, it reduces the number of frames by a factor of 10x while achieving a HOTA score of 79.24% on the SoccerNet dataset and 85.79% on the DanceTrack dataset. The project code and data are publicly released at https://github.com/foreverYoungGitHub/trajectory-simplify-benchmark.
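The interpolation step is straightforward: once an object's box is verified at sparse keyframes, each coordinate is interpolated independently across the in-between frames. A minimal NumPy sketch using linear interpolation; the paper's simplification method decides where the keyframes go:

```python
import numpy as np

def interpolate_boxes(key_frames, key_boxes, all_frames):
    """Linearly interpolate (x1, y1, x2, y2) boxes between keyframes.

    key_frames: sorted frame indices verified by the annotator
    key_boxes:  array of shape (len(key_frames), 4)
    all_frames: frame indices to fill in
    """
    key_boxes = np.asarray(key_boxes, dtype=float)
    return np.stack(
        [np.interp(all_frames, key_frames, key_boxes[:, c]) for c in range(4)],
        axis=1,
    )

# Keyframes at 0 and 10; recover the box at every frame in between.
boxes = interpolate_boxes([0, 10], [[0, 0, 10, 10], [10, 10, 30, 30]],
                          np.arange(11))
print(boxes[5])  # -> [ 5.  5. 20. 20.]
```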
Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL)
Authors: Aniket Pramanik, Mathews Jacob
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include guarantees similar to those of compressive sensing algorithms, including uniqueness, convergence, and stability, while being significantly more memory efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI for static and dynamic settings.
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching
Authors: Pedro Castro, Tae-Kyun Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have relied heavily on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate model-free one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object-to-image matching based on a three-view system: a query with a positive and a negative template. This simple yet effective approach emulates test-time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self- and cross-attention between the inputs. Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally we redesign commonly used pose refinement strategies, zoom and 2D offset refinements, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets, as well as achieving results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models
Authors: Rongqi Pan, Taher A. Ghaleb, Lionel Briand
Abstract
Test suite minimization (TSM) is typically used to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Though many TSM approaches exist, most of them rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. Though ATM achieves a better trade-off between effectiveness and efficiency than FAST-R, it suffers from scalability issues for large software systems, as its execution time increases rapidly with test suite size. To address scalability, we propose LTM, a scalable and black-box similarity-based TSM approach based on language models. To support similarity measurement, we investigated three different pre-trained language models: CodeBERT, GraphCodeBERT, and UniXcoder, to extract embeddings of test code (Java test methods), on which we computed two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used for minimizing test suites, thus reducing minimization time. Experimental results showed that the best configuration of LTM (using UniXcoder with Cosine Similarity) outperformed the best two configurations of ATM by achieving significantly higher fault detection rates (0.84 versus 0.81, on average) and, more importantly, running much faster (26.73 minutes versus 72.75 minutes, on average) than ATM, in terms of both preparation time (up to two orders of magnitude faster) and minimization time (one order of magnitude faster).
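The similarity measurement LTM builds on can be sketched with the Hugging Face transformers library: embed each test method with a pre-trained code model and compare embeddings by cosine similarity. A sketch assuming mean pooling over the last hidden states (the paper may pool differently):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
model = AutoModel.from_pretrained("microsoft/unixcoder-base")

def embed(code: str) -> torch.Tensor:
    """Mean-pooled embedding of a Java test method."""
    inputs = tok(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq, dim)
    return hidden.mean(dim=1).squeeze(0)

t1 = "public void testAdd() { assertEquals(4, calc.add(2, 2)); }"
t2 = "public void testSum() { assertEquals(4, calc.add(2, 2)); }"
sim = torch.cosine_similarity(embed(t1), embed(t2), dim=0)
print(f"cosine similarity: {sim.item():.3f}")
```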
Adaptive Defective Area Identification in Material Surface Using Active Transfer Learning-based Level Set Estimation
Abstract
In material characterization, identifying defective areas on a material surface is fundamental. The conventional approach involves measuring the relevant physical properties point-by-point at predetermined mesh grid points on the surface and determining the areas in which the property does not reach the desired level. To identify defective areas more efficiently, we propose adaptive mapping methods in which measurement resources are used preferentially to detect the boundaries of defective areas. We interpret this problem as active learning (AL) of the level set estimation (LSE) problem. The goal of AL-based LSE is to determine the level set of the physical property function defined on the surface with as few measurements as possible. Furthermore, to handle situations in which materials with similar specifications are repeatedly produced, we introduce a transfer learning approach so that the information from previously produced materials can be effectively utilized. As a proof of concept, we applied the proposed methods to the red-zone estimation problem of silicon wafers and demonstrated that we could identify the defective areas with significantly lower measurement costs than those of conventional methods.
An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees
Authors: Ling Zhang, Daniel Tabas, Baosen Zhang
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
In this paper, we consider the scenario-based two-stage stochastic DC optimal power flow (OPF) problem for optimal and reliable dispatch when the load is facing uncertainty. Although this problem is a linear program, it remains computationally challenging to solve due to the large number of scenarios needed to accurately represent the uncertainties. To mitigate the computational issues, many techniques have been proposed to approximate the second-stage decisions so they can be dealt with more efficiently. The challenge of finding good policies to approximate the second-stage decisions is that these solutions need to be feasible, which has been difficult to achieve with existing policies. To address these challenges, this paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way. A technique called the gauge map is incorporated into the learning architecture design to guarantee the learned solutions' feasibility with respect to the network constraints. Namely, we can design policies that are feedforward functions and only output feasible solutions. Simulation results on standard IEEE systems show that, compared to iterative solvers and the widely used affine policy, our proposed method not only learns solutions of good quality but also accelerates the computation by orders of magnitude.
Thematic context vector association based on event uncertainty for Twitter
Abstract
Keyword extraction is a crucial process in text mining. The extraction of keywords with their respective contextual events in Twitter data is a big challenge. The challenging issues arise mainly from the informality of the language used: misspelled words, acronyms, and ambiguous terms. Keyword extraction under informal language in current systems is pattern based or event based. In this paper, contextual keywords are extracted using thematic events with the help of data association. The thematic context for events is identified using the uncertainty principle in the proposed system. The thematic contexts are weighed with the help of vectors, called thematic context vectors, which signify the event as certain or uncertain. The system is tested on the Twitter COVID-19 dataset and proves to be effective. The system extracts event-specific thematic context vectors from the test dataset and ranks them. The extracted thematic context vectors are used for the clustering of contextual thematic vectors, which improves the silhouette coefficient by 0.5% over state-of-the-art methods, namely TF and TF-IDF. The thematic context vector can be used in other applications such as cyberbullying detection, sarcasm detection, and figurative language detection.
Optimizing Irrigation Efficiency using Deep Reinforcement Learning in the Field
Authors: Xianzhong Ding, Wan Du
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Agricultural irrigation is a significant contributor to freshwater consumption. However, the current irrigation systems used in the field are not efficient. They rely mainly on soil moisture sensors and the experience of growers, but do not account for future soil moisture loss. Predicting soil moisture loss is challenging because it is influenced by numerous factors, including soil texture, weather conditions, and plant characteristics. This paper proposes a solution to improve irrigation efficiency, called DRLIC. DRLIC is a sophisticated irrigation system that uses deep reinforcement learning (DRL) to optimize its performance. The system employs a neural network, known as the DRL control agent, which learns an optimal control policy that considers both the current soil moisture measurement and the future soil moisture loss. We introduce an irrigation reward function that enables our control agent to learn from previous experiences. However, there may be instances where the output of our DRL control agent is unsafe, such as applying too much or too little water. To avoid damaging the health of the plants, we implement a safety mechanism that employs a soil moisture predictor to estimate the performance of each action. If the predicted outcome is deemed unsafe, we perform a relatively conservative action instead. To demonstrate the real-world application of our approach, we developed an irrigation system that comprises sprinklers, sensing and control nodes, and a wireless network. We evaluate the performance of DRLIC by deploying it in a testbed consisting of six almond trees. During a 15-day in-field experiment, we compared the water consumption of DRLIC with that of a widely-used irrigation scheme. Our results indicate that DRLIC outperformed the traditional irrigation method by achieving water savings of up to 9.52%.
On the coordination efficiency of strategic multi-agent robotic teams
Authors: Marcos M. Vasconcelos, Behrouz Touri
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT); Multiagent Systems (cs.MA)
Abstract
We study the problem of achieving decentralized coordination by a group of strategic decision makers choosing to engage or not in a task in a stochastic setting. First, we define a class of symmetric utility games that encompasses a broad class of coordination games, including the popular framework known as \textit{global games}. With the goal of studying the extent to which agents engaging in a stochastic coordination game indeed coordinate, we propose a new probabilistic measure of coordination efficiency. Then, we provide a universal information-theoretic upper bound on the coordination efficiency as a function of the amount of noise in the observation channels. Finally, we revisit a large class of global games, and we illustrate that their Nash equilibrium policies may be less coordination-efficient than certainty-equivalent policies, despite providing better expected utility. This counterintuitive result establishes the existence of a nontrivial trade-off between coordination efficiency and expected utility in coordination games.
Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning
Authors: Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm where agents anticipate the learning steps of other agents to improve cooperation among themselves. As MARL uses gradient-based optimization, learning anticipation requires using Higher-Order Gradients (HOG), with so-called HOG methods. Existing HOG methods are based on policy parameter anticipation, i.e., agents anticipate the changes in policy parameters of other agents. Currently, however, these existing HOG methods have only been applied to differentiable games or games with small state spaces. In this work, we demonstrate that in the case of non-differentiable games with large state spaces, existing HOG methods do not perform well and are inefficient due to their inherent limitations related to policy parameter anticipation and multiple sampling stages. To overcome these problems, we propose Off-Policy Action Anticipation (OffPA2), a novel framework that approaches learning anticipation through action anticipation, i.e., agents anticipate the changes in actions of other agents, via off-policy sampling. We theoretically analyze our proposed OffPA2 and employ it to develop multiple HOG methods that are applicable to non-differentiable games with large state spaces. We conduct a large set of experiments and illustrate that our proposed HOG methods outperform the existing ones regarding efficiency and performance.
Signal Temporal Logic Meets Convex-Concave Programming: A Structure-Exploiting SQP Algorithm for STL Specifications
Abstract
This study considers the control problem with signal temporal logic (STL) specifications. Prior works have adopted smoothing techniques to address this problem within a feasible time frame and solve the problem by applying sequential quadratic programming (SQP) methods naively. However, one of the drawbacks of this approach is that solutions can easily become trapped in local minima that do not satisfy the specification. In this study, we propose a new optimization method, termed CCP-based SQP, based on the convex-concave procedure (CCP). Our framework includes a new robustness decomposition method that decomposes the robustness function into a set of constraints, resulting in a form of difference-of-convex (DC) program that can be solved efficiently. We solve this DC program sequentially as a quadratic program by only approximating the disjunctive parts of the specifications. Our experimental results demonstrate that our method has superior performance compared to the state-of-the-art SQP methods in terms of both robustness and computational time.
Blockwise Compression of Transformer-based Models without Retraining
Authors: Gaochen Dong, Wei Chen
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Transformer-based models, represented by GPT-3, ChatGPT, and GPT-4, have recently attracted increasing interest, research enthusiasm, and business demand. However, their massive computation resources and huge memory footprint are inevitable challenges. To tackle this issue, we propose BCT, a framework of blockwise compression for transformers without retraining, to lower deployment thresholds. BCT achieves more fine-grained compression of the whole transformer, including embedding, matrix multiplication, GELU, Softmax, layer normalization, and all the intermediate results. As a case study, we compress an efficient model with BCT and evaluate it on several General Language Understanding Evaluation (GLUE) datasets. The results show that BCT can achieve an accuracy drop of less than 0.90% in most tasks.
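Blockwise compression can be illustrated with simple per-block integer quantization: each block of a weight matrix gets its own scale, which keeps the quantization error local. A NumPy sketch of the general idea (illustrative only; BCT's actual scheme covers all transformer operators):

```python
import numpy as np

def quantize_blockwise(W, block=64, bits=8):
    """Per-block symmetric quantization of a matrix to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    H, L = W.shape  # assumed divisible by `block`
    scales = np.zeros((H // block, L // block))
    Q = np.zeros_like(W, dtype=np.int8)
    for i in range(0, H, block):
        for j in range(0, L, block):
            blk = W[i:i + block, j:j + block]
            s = np.abs(blk).max() / qmax or 1.0  # per-block scale
            scales[i // block, j // block] = s
            Q[i:i + block, j:j + block] = np.round(blk / s).astype(np.int8)
    return Q, scales

def dequantize_blockwise(Q, scales, block=64):
    W = Q.astype(np.float32).copy()
    for i in range(scales.shape[0]):
        for j in range(scales.shape[1]):
            W[i * block:(i + 1) * block, j * block:(j + 1) * block] *= scales[i, j]
    return W

W = np.random.randn(128, 128).astype(np.float32)
Q, s = quantize_blockwise(W)
print("max abs error:", np.abs(W - dequantize_blockwise(Q, s)).max())
```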
OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting
Authors: Xiao He, Ye Li, Jian Tan, Bin Wu, Feifei Li
Abstract
Seasonal-trend decomposition is one of the most fundamental concepts in time series analysis that supports various downstream tasks, including time series anomaly detection and forecasting. However, existing decomposition methods rely on batch processing with a time complexity of O(W), where W is the number of data points within a time window. Therefore, they cannot always efficiently support real-time analysis that demands low processing delay. To address this challenge, we propose OneShotSTL, an efficient and accurate algorithm that can decompose time series online with an update time complexity of O(1). OneShotSTL is more than 1,000 times faster than the batch methods, with accuracy comparable to the best counterparts. Extensive experiments on real-world benchmark datasets for downstream time series anomaly detection and forecasting tasks demonstrate that OneShotSTL is from 10 to over 1,000 times faster than the state-of-the-art methods, while still providing comparable or even better accuracy.
LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation
Authors: Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, Qixing Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors. However, little research has been done to investigate how to incorporate additional supervision on the BEV features to improve proposal generation in the detector head, while still balancing the number of powerful 3D layers and efficient 2D network operations. This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D, which serves as a dense supervision signal for better BEV feature learning. The key idea is to use auxiliary networks to predict a combination of explicit and implicit semantic probabilities by exploiting their complementary properties. Extensive experiments show that our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors and consistently improves upon baseline models.
FisHook -- An Optimized Approach to Marine Species Classification using MobileNetV2
Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their vast diversity and the complex underwater environment. With advancements in computer performance and GPU-based computing, deep-learning algorithms can now efficiently classify marine species, making it easier to monitor and manage marine ecosystems. In this paper, we propose an optimization to the MobileNetV2 model to achieve a 99.83% average validation accuracy by highlighting specific guidelines for creating a dataset and augmenting marine species images. This transfer learning algorithm can be deployed successfully on a mobile application for on-site classification at fisheries.
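Transfer learning on MobileNetV2 typically means freezing the pre-trained backbone and swapping the classifier head for the new label set. A torchvision sketch of that setup; the class count and layer choices are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 20  # illustrative class count

# Load ImageNet-pretrained MobileNetV2 and freeze the feature extractor.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
for p in model.features.parameters():
    p.requires_grad = False

# Replace the classifier head for the marine-species label set.
model.classifier[1] = nn.Linear(model.last_channel, NUM_SPECIES)

# Only the new head's parameters are trained.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
x = torch.randn(4, 3, 224, 224)  # a dummy batch of images
logits = model(x)
print(logits.shape)              # -> torch.Size([4, 20])
```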
How Regional Wind Characteristics Affect CNN-Based Wind Predictions: Insights from Spatiotemporal Correlation Analysis
Authors: Heesoo Shin, Mario Rüttgers, Sangseung Lee
Abstract
This study investigates the impact of spatiotemporal data dimensions on the precision of a wind forecasting model developed using an artificial neural network. Although previous studies have shown that incorporating spatial data can enhance the accuracy of wind forecasting models, few investigations have explored the extent of the improvement owing to different spatial scales in neural network-based predictive models. Additionally, there are limited studies on the optimal temporal length of the input data for these models. To address this gap, this study employs data with various spatiotemporal dimensions as inputs when forecasting wind using 3D-Convolutional Neural Networks (3D-CNN) and assesses their predictive performance. The results indicate that using spatial data of the surrounding area for 3D-CNN training can achieve better predictive performance than using only single-point information. Additionally, multi-time data had a more positive effect on the predictive performance than single-time data. To investigate the reasons for this, correlation analyses were used to determine the impact of the spatial and temporal sizes of the training data on the prediction performance. The study found that as the autocorrelation coefficient (ACC) decreased, meaning that there was less similarity over time, the prediction performance decreased. Furthermore, the spatial standard deviation of the ACC also affects the prediction performance. A Pearson correlation coefficient (PCC) analysis was conducted to examine the effect of space on the prediction performance. Through the PCC analysis, we show that local geometric and seasonal wind conditions can influence the forecast capability of a predictive model.
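The two statistics used in the analysis are standard and easy to compute. A NumPy sketch of a lag-k autocorrelation coefficient for a wind-speed series and the Pearson correlation between two stations (the synthetic data is purely illustrative):

```python
import numpy as np

def autocorr(x, lag):
    """Lag-`lag` autocorrelation coefficient of a 1-D series."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

rng = np.random.default_rng(0)
t = np.arange(1000)
site_a = np.sin(0.05 * t) + 0.3 * rng.standard_normal(1000)  # wind-speed proxy
site_b = np.sin(0.05 * t + 0.4) + 0.3 * rng.standard_normal(1000)

print("ACC(lag=1):", autocorr(site_a, 1))               # temporal similarity
print("PCC(a, b):", np.corrcoef(site_a, site_b)[0, 1])  # spatial similarity
```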
Meta-Learning with a Geometry-Adaptive Preconditioner
Abstract
Model-agnostic meta-learning (MAML) is one of the most successful meta-learning algorithms. It has a bi-level optimization structure where the outer-loop process learns a shared initialization and the inner-loop process optimizes task-specific weights. Although MAML relies on the standard gradient descent in the inner loop, recent studies have shown that controlling the inner loop's gradient descent with a meta-learned preconditioner can be beneficial. Existing preconditioners, however, cannot simultaneously adapt in a task-specific and path-dependent way. Additionally, they do not satisfy the Riemannian metric condition, which can enable steepest-descent learning with the preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP) that can overcome the limitations in MAML; GAP can efficiently meta-learn a preconditioner that is dependent on task-specific parameters, and its preconditioner can be shown to be a Riemannian metric. Thanks to these two properties, the geometry-adaptive preconditioner is effective for improving the inner-loop optimization. Experiment results show that GAP outperforms the state-of-the-art MAML family and preconditioned gradient descent-MAML (PGD-MAML) family in a variety of few-shot learning tasks. Code is available at: https://github.com/Suhyun777/CVPR23-GAP.
Information and Energy Transmission with Wavelet-Reconstructed Harvesting Functions
Abstract
In practical simultaneous information and energy transmission (SIET), the exact energy harvesting function is usually unavailable because an energy harvesting circuit is nonlinear and nonideal. In this work, we consider a SIET problem where the harvesting function is accessible only at experimentally-taken sample points and study how close we can design SIET to the optimal system with such sampled knowledge. Assuming that the harvesting function is of bounded variation that may have discontinuities, we separately consider two settings where samples are taken without and with additive noise. For these settings, we propose to design a SIET system as if a wavelet-reconstructed harvesting function is the true one and study its asymptotic performance loss of energy and information delivery from the true optimal one. Specifically, for noiseless samples, it is shown that designing SIET as if the wavelet-reconstructed harvesting function is the truth incurs asymptotically vanishing energy and information delivery loss with the number of samples. For noisy samples, we propose to reconstruct wavelet coefficients via soft-thresholding estimation. Then, we not only obtain similar asymptotic losses to the noiseless case but also show that the energy loss by wavelets is asymptotically optimal up to a logarithmic factor.
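The soft-thresholding reconstruction for noisy samples can be sketched with PyWavelets: decompose the sampled harvesting measurements, shrink the detail coefficients toward zero, and reconstruct. A schematic; the wavelet and threshold are illustrative choices, not the paper's:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256)
harvest = np.clip(2 * x - 0.5, 0, 1)                     # toy harvesting function
samples = harvest + 0.05 * rng.standard_normal(x.size)   # noisy samples

# Wavelet decomposition, soft-threshold the detail coefficients, reconstruct.
coeffs = pywt.wavedec(samples, "db4", level=4)
thr = 0.1  # illustrative; the paper derives thresholds from the noise level
coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")

print("RMSE noisy:   ", np.sqrt(np.mean((samples - harvest) ** 2)))
print("RMSE denoised:", np.sqrt(np.mean((denoised[:256] - harvest) ** 2)))
```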
HALO: Hazard-Aware Landing Optimization for Autonomous Systems
Authors: Christopher R. Hayner, Samuel C. Buckner, Daniel Broyles, Evelyn Madewell, Karen Leung, Behcet Acikmese
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
With autonomous aerial vehicles enacting safety-critical missions, such as the Mars Science Laboratory Curiosity rover's landing on Mars, the task of automatically identifying and reasoning about potentially hazardous landing sites is paramount. This paper presents a coupled perception-planning solution which addresses the hazard detection, optimal landing trajectory generation, and contingency planning challenges encountered when landing in uncertain environments. Specifically, we develop and combine two novel algorithms, Hazard-Aware Landing Site Selection (HALSS) and Adaptive Deferred-Decision Trajectory Optimization (Adaptive-DDTO), to address the perception and planning challenges, respectively. The HALSS framework processes point cloud information to identify feasible safe landing zones, while Adaptive-DDTO is a multi-target contingency planner that adaptively replans as new perception information is received. We demonstrate the efficacy of our approach using a simulated Martian environment and show that our coupled perception-planning method achieves greater landing success whilst being more fuel efficient compared to a nonadaptive DDTO approach.
MM-BSN: Self-Supervised Image Denoising for Real-World with Multi-Mask based on Blind-Spot Network
Authors: Dan Zhang, Fangfang Zhou, Yuwen Jiang, Zhengming Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent advances in deep learning have been pushing image denoising techniques to a new level. In self-supervised image denoising, the blind-spot network (BSN) is one of the most common methods. However, most of the existing BSN algorithms use a dot-based central mask, which is recognized as inefficient for images with large-scale spatially correlated noise. In this paper, we give a definition of large noise and propose a multi-mask strategy using multiple convolutional kernels masked in different shapes to further break the noise spatial correlation. Furthermore, we propose a novel self-supervised image denoising method that combines the multi-mask strategy with BSN (MM-BSN). We show that different masks can cause significant performance differences, and the proposed MM-BSN can efficiently fuse the features extracted by multi-masked layers, while recovering the texture structures destroyed by multi-masking and information transmission. Our MM-BSN can be used to address the problem of large-noise denoising, which cannot be efficiently handled by other BSN methods. Extensive experiments on public real-world datasets demonstrate that the proposed MM-BSN achieves state-of-the-art performance among self-supervised and even unpaired image denoising methods for sRGB image denoising, without any labelling effort or prior knowledge. Code can be found at https://github.com/dannie125/MM-BSN.
An interpretability framework for Similar case matching
Abstract
Similar Case Matching (SCM) is designed to determine whether two cases are similar. The task has an essential role in the legal system, helping legal professionals to find relevant cases quickly and thus deal with them more efficiently. Existing research has focused on improving the model's performance but not on its interpretability. Therefore, this paper proposes a pipeline framework for interpretable SCM, which consists of four modules: a judicial feature sentence identification module, a case matching module, a feature sentence alignment module, and a conflict disambiguation module. Unlike existing SCM methods, our framework will identify feature sentences in a case that contain essential information, perform similar case matching based on the extracted feature sentence results, and align the feature sentences in the two cases to provide evidence for the similarity of the cases. SCM results may conflict with feature sentence alignment results, and our framework further resolves this inconsistency. The experimental results show the effectiveness of our framework, and our work provides a new benchmark for interpretable SCM.
On a family of low-rank algorithms for large-scale algebraic Riccati equations
Abstract
In [3] it was shown that four seemingly different algorithms for computing low-rank approximate solutions $X_j$ to the solution $X$ of large-scale continuous-time algebraic Riccati equations (CAREs) $0 = \mathcal{R}(X) := A^HX + XA + C^HC - XBB^HX$ generate the same sequence $X_j$ when used with the same parameters. The Hermitian low-rank approximations $X_j$ are of the form $X_j = Z_jY_jZ_j^H,$ where $Z_j$ is a matrix with only few columns and $Y_j$ is a small square Hermitian matrix. Each $X_j$ generates a low-rank Riccati residual $\mathcal{R}(X_j)$ such that the norm of the residual can be evaluated easily, allowing for an efficient termination criterion. Here a new family of methods to generate such low-rank approximate solutions $X_j$ of CAREs is proposed. Each member of the proposed family of algorithms generates the same sequence of $X_j$ as the four previously known algorithms. The approach is based on a block rational Arnoldi decomposition and an associated block rational Krylov subspace spanned by $A^H$ and $C^H.$ Two specific versions of the general algorithm will be considered; one will turn out to be equivalent to the RADI algorithm, the other one allows for a slightly more efficient implementation compared to the RADI algorithm. Moreover, our approach allows for adding more than one shift at a time.
Equivariant Networks for Porous Crystalline Materials
Authors: Marko Petković, Pablo Romero-Marimon, Vlado Menkovski, Sofia Calero
Abstract
Efficiently predicting properties of porous crystalline materials has great potential to accelerate the high-throughput screening process for developing new materials, as simulations carried out using first-principles models are often computationally expensive. To effectively make use of Deep Learning methods to model these materials, we need to utilize the symmetries present in the crystals, which are defined by their space group. Existing methods for crystal property prediction either have symmetry constraints that are too restrictive or only incorporate symmetries between unit cells. In addition, these models do not explicitly model the porous structure of the crystal. In this paper, we develop a model which incorporates the symmetries of the unit cell of a crystal in its architecture and explicitly models the porous structure. We evaluate our model by predicting the heat of adsorption of CO$_2$ for different configurations of the mordenite zeolite. Our results confirm that our method performs better than existing methods for crystal property prediction and that the inclusion of pores results in a more efficient model.
Moving Obstacle Collision Avoidance via Chance-Constrained MPC with CBF
Authors: Ming Li, Zhiyong Sun, Zirui Liao, Siep Weiland
Abstract
Model predictive control (MPC) with control barrier functions (CBF) is a promising solution to address the moving obstacle collision avoidance (MOCA) problem. Unlike MPC with distance constraints (MPC-DC), this approach facilitates early obstacle avoidance without the need to increase prediction horizons. However, the existing MPC-CBF method is deterministic and fails to account for perception uncertainties. This paper proposes a generalized MPC-CBF approach for stochastic scenarios, which maintains the advantages of the deterministic method for addressing the MOCA problem. Specifically, the chance-constrained MPC-CBF (CC-MPC-CBF) technique is introduced to ensure that a user-defined collision avoidance probability is met by utilizing probabilistic CBFs. However, due to the potential empty intersection between the reachable set and the safe region confined by CBF constraints, the CC-MPC-CBF problem can pose challenges in achieving feasibility. To address this issue, we propose a sequential implementation approach that involves solving a standard MPC optimization problem followed by a predictive safety filter optimization, which leads to improved feasibility. Furthermore, we introduce an iterative convex optimization scheme to further expedite the resolution of the predictive safety filter, which results in an efficient approach to tackling the non-convex CC-MPC-CBF problem. We apply our proposed algorithm to a 2-D integrator system for MOCA, and we showcase its resilience to obstacle measurement uncertainties and favorable feasibility properties.
Adaptive Image Compression via Optimal Mesh Refinement
Abstract
The JPEG algorithm is a de facto standard for image compression. We investigate whether adaptive mesh refinement can be used to optimize the compression ratio and propose a new adaptive image compression algorithm. We prove that it produces a quasi-optimal subdivision grid for a given error norm with high probability. This subdivision can be stored with very little overhead and thus leads to an efficient compression algorithm. We demonstrate experimentally that the new algorithm can achieve better compression ratios than standard JPEG compression with no visible loss of quality on many images. The mathematical core of this work shows that Binev's optimal tree approximation algorithm is applicable to image compression with high probability, when we assume small additive Gaussian noise on the pixels of the image.
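Adaptive refinement for images is easiest to picture as a quadtree: keep splitting a block into four while its approximation error exceeds a tolerance. A minimal NumPy sketch that approximates each leaf by its mean value (the paper's algorithm uses a more refined error functional and Binev's optimal tree approximation):

```python
import numpy as np

def quadtree(img, x, y, size, tol, leaves):
    """Recursively subdivide until each block's L2 error is below tol."""
    block = img[y:y + size, x:x + size]
    err = np.sqrt(np.mean((block - block.mean()) ** 2))
    if err <= tol or size == 1:
        leaves.append((x, y, size, block.mean()))  # one value per leaf
    else:
        h = size // 2
        for dx, dy in [(0, 0), (h, 0), (0, h), (h, h)]:
            quadtree(img, x + dx, y + dy, h, tol, leaves)

# Smooth image with one sharp edge: refinement concentrates at the edge.
n = 64
yy, xx = np.mgrid[0:n, 0:n]
img = (xx > n // 2).astype(float)

leaves = []
quadtree(img, 0, 0, n, tol=0.05, leaves=leaves)
print(f"{len(leaves)} leaves instead of {n * n} pixels")
```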
Controller Synthesis for Local and Global Specifications in Multi-Agent Systems
Authors: David Smith Sundarsingh, Jay Bhagiya, Saharsh, Jeel Chatrola, Adnane Saoud, Pushpak Jagtap
Abstract
In this paper, we propose a computationally efficient symbolic controller synthesis technique for multi-agent systems. The paper focuses on synthesizing distributed controllers enforcing local temporal logic specifications along with global safety specifications for multi-agent systems. To solve the problem in a computationally efficient way, we leverage the concept of control barrier functions. In particular, we use a three-step bottom-up approach: first, the symbolic controllers for individual agents are synthesized to enforce local temporal logic specifications; then we use a notion of control barrier functions for symbolic models to compose controlled agent systems by removing unsafe transitions; and finally, we synthesize a controller for the reduced composed system to ensure the satisfaction of local temporal logic specifications while ensuring the global safety specification. The effectiveness of our approach is demonstrated on a multi-robot system by comparing it with the conventional monolithic symbolic control approach.
High-performance Time Series Anomaly Discovery on Graphics Processors
Authors: Mikhail Zymbler, Yana Kraeva
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Currently, discovering subsequence anomalies in time series remains one of the most topical research problems. A subsequence anomaly refers to successive points in time that are collectively abnormal, although each point is not necessarily an outlier. Among a large number of approaches to discovering subsequence anomalies, the discord concept is considered one of the best. A time series discord is intuitively defined as a subsequence of a given length that is maximally far away from its non-overlapping nearest neighbor. The recently introduced MERLIN algorithm discovers time series discords of every possible length in a specified range, thereby eliminating the need to set even that sole parameter to discover discords in a time series. However, MERLIN is serial, and its parallelization could increase the performance of discord discovery. In this article, we introduce a novel parallelization scheme for GPUs, called PALMAD (Parallel Arbitrary-Length MERLIN-based Anomaly Discovery). As opposed to its serial predecessor, PALMAD employs recurrent formulas we have derived to avoid redundant calculations, and advanced data structures for the efficient implementation of parallel processing. Experimental evaluation over real-world and synthetic time series shows that our algorithm outperforms parallel analogs. We also apply PALMAD to discover anomalies in a real-world time series, employing our proposed discord heatmap technique to illustrate the results.
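The discord definition above translates directly into a quadratic, serial brute-force baseline: for each subsequence, find the distance to its nearest non-overlapping neighbor, then return the subsequence where that distance is largest. A NumPy sketch of exactly that baseline, which MERLIN and PALMAD accelerate:

```python
import numpy as np

def brute_force_discord(ts, m):
    """Return (index, distance) of the length-m discord of series ts."""
    n = len(ts) - m + 1
    subs = np.stack([ts[i:i + m] for i in range(n)])
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        # Nearest neighbor among non-overlapping subsequences only.
        nn = np.inf
        for j in range(n):
            if abs(i - j) >= m:
                nn = min(nn, np.linalg.norm(subs[i] - subs[j]))
        if nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist

rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 40, 400)) + 0.05 * rng.standard_normal(400)
ts[200:210] += 1.5                    # inject an anomaly
print(brute_force_discord(ts, m=10))  # discord lands near index 200
```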
Reduced-Precision Floating-Point Arithmetic in Systolic Arrays with Skewed Pipelines
Abstract
The acceleration of deep-learning kernels in hardware relies on matrix multiplications that are executed efficiently on Systolic Arrays (SA). To effectively trade off deep-learning training/inference quality with hardware cost, SA accelerators employ reduced-precision Floating-Point (FP) arithmetic. In this work, we demonstrate the need for new pipeline organizations to reduce latency and improve the energy efficiency of reduced-precision FP operators for the chained multiply-add operation imposed by the structure of the SA. The proposed skewed pipeline design reorganizes the pipelined operation of the FP multiply-add units to enable new forwarding paths for the exponent logic, which allow for parallel execution of the pipeline stages of consecutive Processing Elements (PEs). As a result, the latency of the matrix multiplication operation within the SA is significantly reduced with minimal hardware cost, thereby yielding energy reductions of 8% and 11% for the examined state-of-the-art CNNs.
Comparison of Two Search Criteria for Lattice-based Kernel Approximation
Authors: Frances Y. Kuo, Weiwen Mo, Dirk Nuyens, Ian H. Sloan, Abirami Srikumar
Abstract
The kernel interpolant in a reproducing kernel Hilbert space is optimal in the worst-case sense among all approximations of a function using the same set of function values. In this paper, we compare two search criteria to construct lattice point sets for use in lattice-based kernel approximation. The first candidate, $\mathcal{P}_n^*$, is based on the power function that appears in the machine learning literature. The second, $\mathcal{S}_n^*$, is a search criterion used for generating lattices for approximation using truncated Fourier series. We find that the empirical difference in error between the lattices constructed using $\mathcal{P}_n^*$ and $\mathcal{S}_n^*$ is marginal. The criterion $\mathcal{S}_n^*$ is preferred as it is computationally more efficient and has a proven error bound.
Towards Open-Vocabulary Video Instance Segmentation
Authors: Haochen Wang, Shuai Wang, Cilin Yan, Xiaolong Jiang, XU Tang, Yao Hu, Weidi Xie, Efstratios Gavves
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS) that contains well-annotated objects from 1,212 diverse categories, significantly surpassing the category size of existing datasets by more than one order of magnitude. Third, we propose an efficient Memory-Induced Vision-Language Transformer, MindVLT, which is the first to achieve Open-Vocabulary VIS in an end-to-end manner with near real-time inference speed. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of MindVLT on novel categories. We will release the dataset and code to facilitate future endeavors.
Virtio-FPGA: a virtualization solution for SoC-attached FPGAs
Authors: Anna Panagopoulou, Michele Paolino, Daniel Raho
Abstract
Recently, FPGA accelerators have risen in popularity as they present a suitable way of satisfying the high-computation and low-power demands of real-time applications. Modern electric transportation systems (such as aircraft and road vehicles) can greatly profit from embedded FPGAs, which incorporate both high-performance and flexibility features into a single SoC. At the same time, the virtualization of FPGA resources aims to reinforce these systems with strong isolation, consolidation, and security. In this paper, we present a novel virtualization framework aimed at SoC-attached FPGA devices in a Linux and QEMU/KVM setup. We use Virtio as a means to enable the configuration of FPGA resources from guest systems in an efficient way. We also employ the Linux VFIO and Device Tree Overlays technologies to render the FPGA resources dynamically accessible to guest systems. The ability to dynamically configure and utilize the FPGA resources from a virtualization environment is described in detail. The evaluation procedure of the solution is presented, and the virtualization overhead is benchmarked as minimal (around 10%) when accessing the FPGA devices from guest systems.
Learning quantities of interest from parametric PDEs: An efficient neural-weighted Minimal Residual approach
Authors: Ignacio Brevis, Ignacio Muga, David Pardo, Oscar Rodríguez, Kristoffer G. van der Zee
Abstract
The efficient approximation of parametric PDEs is of tremendous importance in science and engineering. In this paper, we show how one can train Galerkin discretizations to efficiently learn quantities of interest of solutions to a parametric PDE. The central component in our approach is an efficient neural-network-weighted Minimal-Residual formulation, which, after training, provides Galerkin-based approximations in standard discrete spaces that have accurate quantities of interest, regardless of the coarseness of the discrete space.
Black Box Few-Shot Adaptation for Vision-Language models
Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation, aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the model weights and can be computationally infeasible for large models with billions of parameters. To address these shortcomings, in this work, we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features and hence works without access to the model's weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed from uni-modal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. LFA is initialized from a closed-form solution to a least-squares problem and then iteratively updated by minimizing a re-ranking loss. Despite its simplicity, our approach can even surpass soft-prompt learning methods, as shown by extensive experiments on 11 image and 2 video datasets.
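The closed-form initialization step lends itself to a short sketch. Below is our reading of it on stand-in feature matrices; the iterative re-ranking refinement is omitted:

```python
import numpy as np

# Align pre-computed image features X to the text features Y of the matching
# classes by solving min_W ||X W - Y||_F^2 in closed form.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 512))   # few-shot image features (stand-ins)
Y = rng.standard_normal((80, 512))   # text features of the matching prompts

W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # closed-form least-squares solution
aligned = X @ W                            # image features mapped into text space
```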
Abstract
One method to compute multiple precision integer quotients is to use a Newton iteration with multiple precision fixed point or floating point values. On one hand, this allows quotients to be calculated efficiently by employing an efficient multiplication method. On the other hand, this leads to a library structure where exact and approximate arithmetic are interdependent. This paper develops the concept of a shifted inverse and modified Newton iteration to compute quotients efficiently using whole numbers only. The method is equally applicable to computing polynomial quotients efficiently.
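A toy version of the shifted-inverse idea can be written with Python's built-in whole-number arithmetic; the iteration below is our illustrative variant, not necessarily the paper's exact formulation:

```python
# For 2**(k-1) <= d < 2**k, compute v = floor(2**(2k) / d) by an integer
# Newton iteration, then read off quotients as q ~= (a * v) >> 2k.
def shifted_inverse(d: int, k: int) -> int:
    B = 1 << (2 * k)
    x = 1 << k                       # 2**k <= B/d, so we start below the target
    while True:
        dx = x * (B - d * x) // B    # integer Newton step toward B/d
        if dx == 0:
            break
        x += dx
    while d * (x + 1) <= B:          # settle exactly on floor(B/d)
        x += 1
    return x

def quotient(a: int, d: int) -> int:
    k = d.bit_length()
    v = shifted_inverse(d, k)
    q = (a * v) >> (2 * k)           # candidate quotient, slightly low at worst
    while d * (q + 1) <= a:          # final correction (fast when a < 2**(2k))
        q += 1
    return q

assert all(quotient(a, d) == a // d for a in range(1, 400) for d in range(1, 50))
```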
Incorporating Unlabelled Data into Bayesian Neural Networks
Authors: Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin
Abstract
We develop a contrastive framework for learning better prior distributions for Bayesian Neural Networks (BNNs) using unlabelled data. With this framework, we propose a practical BNN algorithm that offers the label-efficiency of self-supervised learning and the principled uncertainty estimates of Bayesian methods. Finally, we demonstrate the advantages of our approach for data-efficient learning in semi-supervised and low-budget active learning problems.
Neural Field Convolutions by Repeated Differentiation
Abstract
Neural fields are evolving towards a general-purpose continuous representation for visual computing. Yet, despite their numerous appealing properties, they are hardly amenable to signal processing. As a remedy, we present a method to perform general continuous convolutions with general continuous signals such as neural fields. Observing that piecewise polynomial kernels reduce to a sparse set of Dirac deltas after repeated differentiation, we leverage convolution identities and train a repeated integral field to efficiently execute large-scale convolutions. We demonstrate our approach on a variety of data modalities and spatially-varying kernels.
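The key identity is easiest to see in 1D: a box kernel differentiates to two Dirac deltas, so convolving with it collapses to two taps on the signal's integral. A small discrete numpy illustration of this principle (ours; the paper works with continuous neural fields):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(256)
w = 8                                      # box kernel width
direct = np.convolve(f, np.ones(w) / w, mode="valid")  # reference convolution

# Repeated-differentiation route: the box kernel's derivative is two deltas
# (+1/w and -1/w), so the convolution is a difference of prefix sums of f.
F = np.concatenate([[0.0], np.cumsum(f)])  # discrete integral of f
sparse = (F[w:] - F[:-w]) / w              # two delta "taps" on the integral

assert np.allclose(direct, sparse)
```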
FAST: Fidelity-Adjustable Semantic Transmission over Heterogeneous Wireless Networks
Abstract
In this work, we investigate the challenging problem of on-demand semantic communication over heterogeneous wireless networks. We propose a fidelity-adjustable semantic transmission framework (FAST) that empowers wireless devices to send data efficiently under different application scenarios and resource conditions. To this end, we first design a dynamic sub-model training scheme to learn the flexible semantic model, which enables edge devices to customize the transmission fidelity with different widths of the semantic model. After that, we focus on the FAST optimization problem of minimizing the system energy consumption under latency and fidelity constraints. We then derive the optimal transmission strategies, including the scaling factor of the semantic model, the computing frequency, and the transmitting power of the devices. Experimental results indicate that, compared to the baseline transmission schemes, the proposed framework can reduce system energy consumption and data size by up to one order of magnitude while maintaining reasonable data fidelity.
Incremental Verification of Neural Networks
Authors: Shubham Ugare, Debangshu Banerjee, Sasa Misailovic, Gagandeep Singh
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
Abstract
Complete verification of deep neural networks (DNNs) can exactly determine whether the DNN satisfies a desired trustworthy property (e.g., robustness, fairness) on an infinite set of inputs or not. Despite the tremendous progress in improving the scalability of complete verifiers over the years on individual DNNs, they are inherently inefficient when a deployed DNN is updated to improve its inference speed or accuracy. The inefficiency arises because the expensive verifier needs to be run from scratch on the updated DNN. To improve efficiency, we propose a new, general framework for incremental and complete DNN verification based on the design of novel theory, data structures, and algorithms. Our contributions, implemented in a tool named IVAN, yield an overall geometric mean speedup of 2.4x for verifying challenging MNIST and CIFAR10 classifiers and a geometric mean speedup of 3.8x for the ACAS-XU classifiers over the state-of-the-art baselines.
Geometric Particle-In-Cell discretizations of a plasma hybrid model with kinetic ions and mass-less fluid electrons
Authors: Yingzhe Li, Martin Campos Pinto, Florian Holderied, Stefan Possanner, Eric Sonnendrücker
Abstract
We explore the possibilities of applying structure-preserving numerical methods to a plasma hybrid model with kinetic ions and mass-less fluid electrons satisfying the quasi-neutrality relation. The numerical schemes are derived by finite element methods in the framework of finite element exterior calculus (FEEC) for the field variables, particle-in-cell (PIC) methods for the Vlasov equation, and splitting methods in time based on a proposed anti-symmetric bracket. Conservation of energy, the quasi-neutrality relation, positivity of density, and the divergence-free property of the magnetic field hold irrespective of the resolution and metric used. Local quasi-interpolation is used for dealing with the current terms in order to make the proposed methods more efficient. The implementation has been done in the framework of the Python package STRUPHY [1], and has been verified by extensive numerical experiments.
Uncertainty Quantification for Recursive Estimation in Adaptive Safety-Critical Control
Authors: Max H. Cohen, Makai Mann, Kevin Leahy, Calin Belta
Abstract
In this paper, we present a framework for online parameter estimation and uncertainty quantification in the context of adaptive safety-critical control. First, we demonstrate how incorporating a history stack of data into the classic recursive least squares algorithm facilitates parameter convergence under relaxed excitation conditions. Our key observation is that the estimate generated by this algorithm at any point in time is an affine transformation of the initial estimate. This property allows for parameterizing the uncertainty associated with such estimates using objects that are closed under affine transformation, such as zonotopes and Gaussian distributions, and enables the efficient propagation of such uncertainty metrics along the trajectory of the parameter estimates. We illustrate how such an approach facilitates the synthesis of safety-critical controllers for systems with parametric uncertainty using control barrier functions. Finally, we demonstrate the advantages of online adaptation and uncertainty quantification via numerical examples.
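The affine-transformation observation is easy to verify numerically. The sketch below uses vanilla recursive least squares (the history-stack variant adds data reuse but shares the same linear update structure) and recovers the affine map from n+1 runs:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
theta_true = rng.standard_normal(n)
phis = rng.standard_normal((20, n))        # regressors
ys = phis @ theta_true                     # noiseless measurements

def rls(theta0: np.ndarray) -> np.ndarray:
    theta, P = theta0.astype(float), np.eye(n)
    for phi, y in zip(phis, ys):
        P = P - np.outer(P @ phi, P @ phi) / (1.0 + phi @ P @ phi)
        theta = theta + P @ phi * (y - phi @ theta)
    return theta

# The final estimate is theta_T = A @ theta_0 + b; recover A and b from runs
# started at the zero vector and at the n unit vectors.
b = rls(np.zeros(n))
A = np.column_stack([rls(e) - b for e in np.eye(n)])

theta0 = rng.standard_normal(n)
assert np.allclose(rls(theta0), A @ theta0 + b)   # affine in the initial estimate
```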
Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python
Abstract
$\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch. $\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and memory-efficiently. The paper demonstrates constructing a $\texttt{ChoiceDataset}$ from databases of various formats and the functionalities of $\texttt{ChoiceDataset}$. The package implements two widely used models, namely the multinomial logit and nested logit models, and supports regularization during model estimation. The package can take advantage of GPUs for estimation, allowing it to scale to massive datasets while remaining computationally efficient. Models can be initialized using either R-style formula strings or Python dictionaries. We conclude with a comparison of the computational efficiencies of $\texttt{torch-choice}$ and $\texttt{mlogit}$ in R as (1) the number of observations increases, (2) the number of covariates increases, and (3) the item set expands. Finally, we demonstrate the scalability of $\texttt{torch-choice}$ on large-scale datasets.
Inverting the SerDes Link Design Flow Process
Authors: Michael J. Degerstrom, Chad M. Smutzer, Patrick J. Zabinski, Barry K. Gilbert
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
The traditional SerDes link simulation process begins with the extraction of printed circuit board (PCB) physical stripline and via models, followed by channel modeling and link simulation. We invert this simulation flow by first creating link performance curves across an array of hypothetical channels defined with specially-developed, high level, equation-based models; limited physical extraction is later undertaken to relate PCB channel implementation to these performance curves. These curves allow us to determine the system-level SerDes channel requirements and to become better informed in choosing PCB technologies for lower cost and easier manufacturability. The inverted modeling process is very efficient, allowing for the rapid identification and avoidance of problematic channel topologies and the study of other potentially useful channel designs.
Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback
Authors: Omar Erak, Hatem Abou-Zeid
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
The recent advances in machine learning and deep neural networks have made them attractive candidates for wireless communications functions such as channel estimation, decoding, and downlink channel state information (CSI) compression. However, most of these neural networks are large and inefficient, making them a barrier to deployment in practical wireless systems that require low latency and low memory footprints for individual network functions. To mitigate these limitations, we propose accelerated and compressed efficient neural networks for massive MIMO CSI feedback. Specifically, we thoroughly investigate the adoption of network pruning, post-training dynamic range quantization, and weight clustering to optimize CSI feedback compression for massive MIMO systems. Furthermore, we deploy the proposed model compression techniques on commodity hardware and demonstrate that achieving inference gains requires specialized libraries that accelerate computations for sparse neural networks. Our findings indicate that there is remarkable value in applying these model compression techniques: the proposed joint pruning and quantization approach reduced model size by 86.5% and inference time by 76.2% with minimal impact on model accuracy. These compression methods are crucial to pave the way for the practical adoption and deployment of deep learning-based techniques in commercial wireless systems.
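Of the three techniques, post-training dynamic range quantization is the simplest to reproduce. A minimal TensorFlow Lite sketch on a stand-in model (the paper's CSI feedback network is not reproduced here):

```python
import tensorflow as tf

# Stand-in model; a real CSI feedback encoder would replace this.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu",
                           input_shape=(32, 32, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
tflite_model = converter.convert()                     # weights stored as int8
```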
Strong Baselines for Parameter Efficient Few-Shot Fine-tuning
Abstract
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase on a set of base classes. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute, and storage. This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters. While these methods have shown promise, inconsistencies in experimental conditions make it difficult to disentangle their advantage from other experimental factors, including the feature extractor architecture, pre-trained initialization, and fine-tuning algorithm, amongst others. In our paper, we conduct a large-scale, experimentally consistent, empirical analysis to study PEFTs for few-shot image classification. Through a battery of over 1.8k controlled experiments on large-scale few-shot benchmarks including Meta-Dataset (MD) and ORBIT, we uncover novel insights on PEFTs that shed light on their efficacy in fine-tuning ViTs for few-shot classification. Our controlled empirical study yields two main findings: (i) fine-tuning just the LayerNorm parameters (which we call LN-Tune) during few-shot adaptation is an extremely strong baseline across ViTs pre-trained with both self-supervised and supervised objectives; (ii) for self-supervised ViTs, simply learning a set of scaling parameters for each attention matrix (which we call AttnScale) along with a domain-residual adapter (DRA) module leads to state-of-the-art performance (while being $\sim\!9\times$ more parameter-efficient) on MD. Our extensive empirical findings set strong baselines and call for rethinking the current design of PEFT methods for FSC.
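The LN-Tune baseline in finding (i) amounts to a few lines of PyTorch. The sketch below reflects our understanding of it (freeze everything except LayerNorm parameters), shown on a generic stand-in module:

```python
import torch
from torch import nn

# Freeze a pre-trained model entirely, then re-enable gradients only for the
# parameters that live inside LayerNorm modules.
def ln_tune(model: nn.Module):
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]

# Stand-in for one pre-trained ViT block, just to show the plumbing.
block = nn.Sequential(nn.LayerNorm(384), nn.Linear(384, 384), nn.GELU())
optimizer = torch.optim.Adam(ln_tune(block), lr=1e-3)
```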
High-Throughput Vector Similarity Search in Knowledge Graphs
Authors: Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, Theodoros Rekatsinas
Abstract
There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors. For example, given past KG queries for a song entity, we want to construct new queries for new song entities whose vector representations are close to the vector representation of the entity in the past KG query. But entities in a KG also have non-vector attributes such as a song associated with an artist, a genre, and a release date. Therefore, suggested entities must also satisfy query predicates over non-vector attributes beyond a vector-based similarity predicate. While these tasks are central to KGs, our contributions are generally applicable to hybrid queries. In contrast to prior works that optimize online queries, we focus on enabling efficient batch processing of past hybrid query workloads. We present our system, HQI, for high-throughput batch processing of hybrid queries. We introduce a workload-aware vector data partitioning scheme to tailor the vector index layout to the given workload and describe a multi-query optimization technique to reduce the overhead of vector similarity computations. We evaluate our methods on industrial workloads and demonstrate that HQI yields a 31x improvement in throughput for finding related KG queries compared to existing hybrid query processing approaches.
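A single hybrid query is easy to state in code. The toy sketch below (ours, not the HQI system, whose contribution lies in batching and workload-aware partitioning) filters on a relational attribute and then ranks the survivors by cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.standard_normal((10_000, 64))   # entity embeddings (stand-ins)
genre = rng.integers(0, 20, size=10_000)   # a non-vector attribute per entity

def hybrid_query(q: np.ndarray, want_genre: int, k: int = 10) -> np.ndarray:
    idx = np.flatnonzero(genre == want_genre)        # relational predicate first
    cand = vecs[idx]
    sims = cand @ q / (np.linalg.norm(cand, axis=1) * np.linalg.norm(q))
    return idx[np.argsort(-sims)[:k]]                # top-k by cosine similarity

top = hybrid_query(vecs[0], want_genre=int(genre[0]))
```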
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
Authors: Zhiqiang Hu, Yihuai Lan, Lei Wang, Wanyu Xu, Ee-Peng Lim, Roy Ka-Wei Lee, Lidong Bing, Soujanya Poria
Abstract
The success of large language models (LLMs), like GPT-3 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by fine-tuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLM while achieving comparable or even better performance. To enable further research on PEFT methods for LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, OPT, and GPT-J, as well as widely used adapters such as the Series adapter, the Parallel adapter, and LoRA. The framework is designed to be research-friendly, efficient, modular, and extendable, allowing the integration of new adapters and their evaluation with new and larger-scale LLMs. Furthermore, to evaluate the effectiveness of adapters in LLM-Adapters, we conduct experiments on six math reasoning datasets. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to that of powerful LLMs (175B) in zero-shot inference on simple math reasoning datasets. Overall, we provide a promising framework for fine-tuning large LLMs on downstream tasks. We believe the proposed LLM-Adapters will advance adapter-based PEFT research, facilitate the deployment of research pipelines, and enable practical applications to real-world systems.
Scenario-Game ADMM: A Parallelized Scenario-Based Solver for Stochastic Noncooperative Games
Authors: Jingqi Li, Chih-Yuan Chiu, Lasse Peters, Fernando Palafox, Mustafa Karabag, Javier Alonso-Mora, Somayeh Sojoudi, Claire Tomlin, David Fridovich-Keil
Abstract
Decision making in multi-agent games can be extremely challenging, particularly under uncertainty. In this work, we propose a new sample-based approximation to a class of stochastic, general-sum, pure Nash games, where each player has an expected-value objective and a set of chance constraints. This new approximation scheme inherits the accuracy of objective approximation from the established sample average approximation (SAA) method and enjoys a feasibility guarantee derived from the scenario optimization literature. We characterize the sample complexity of this new game-theoretic approximation scheme, and observe that high accuracy usually requires a large number of samples, which results in a large number of sampled constraints. To accommodate this, we decompose the approximated game into a set of smaller games with few constraints for each sampled scenario, and propose a decentralized, consensus ADMM algorithm to efficiently compute a generalized Nash equilibrium of the approximated game. We prove the convergence of our algorithm and empirically demonstrate superior performance relative to a recent baseline.
Strong spatial mixing for colorings on trees and its algorithmic applications
Abstract
Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $\Delta$-regular tree exhibits SSM whenever $q \ge \Delta+1$. Moreover, it is widely believed that as long as SSM holds on bounded-degree trees with $q$ colors, one would obtain an efficient sampler for $q$-colorings on all bounded-degree graphs via simple Markov chain algorithms. It is surprising that such a basic question is still open, even on trees, but then again it also highlights how much we still have to learn about random colorings. In this paper, we show the following: (1) For any $\Delta \ge 3$, SSM holds for random $q$-colorings on trees of maximum degree $\Delta$ whenever $q \ge \Delta + 3$. Thus we almost fully resolve the aforementioned conjecture. Our result substantially improves upon the previously best bound, which requires $q \ge 1.59\Delta+\gamma^*$ for an absolute constant $\gamma^* > 0$. (2) For any $\Delta\ge 3$ and girth $g = \Omega_\Delta(1)$, we establish optimal mixing of the Glauber dynamics for $q$-colorings on graphs of maximum degree $\Delta$ and girth $g$ whenever $q \ge \Delta+3$. Our approach is based on a new general reduction from spectral independence on large-girth graphs to SSM on trees that is of independent interest. Using the same techniques, we also prove near-optimal bounds on weak spatial mixing (WSM), a closely-related notion to SSM, for the antiferromagnetic Potts model on trees.
DWA: Differential Wavelet Amplifier for Image Super-Resolution
Authors: Brian Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
This work introduces the Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA invigorates an approach that has recently received less attention, namely the Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR: it reduces the spatial area of its input by a factor of 4, and with it the overall model size and computation cost, framing it as an attractive approach for sustainable ML. Our proposed DWA model improves wavelet-based SR models by leveraging the difference between two convolutional filters to refine relevant feature extraction in the wavelet domain, emphasizing local contrasts and suppressing common noise in the input signals. We show its effectiveness by integrating it into existing SR models, e.g., DWSR and MWCNN, and demonstrate a clear improvement in classical SR tasks. Moreover, DWA enables a direct application of DWSR and MWCNN to the input image space by omitting the traditional DWT and reducing the representation channel-wise.
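The difference-of-filters idea admits a compact module sketch; the exact wiring inside DWA (kernel sizes, residuals) is our assumption here, not the paper's:

```python
import torch
from torch import nn

class DWA(nn.Module):
    """Difference of two learned convolutions, acting like a tunable band-pass
    that emphasizes local contrasts and cancels noise common to both filters."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv_a(x) - self.conv_b(x)

out = DWA(4)(torch.randn(1, 4, 32, 32))   # e.g., the four DWT sub-bands
```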
Towards Optimal Human-Robot Interface Design Applied to Underwater Robotics Teleoperation
Authors: Paulo Padrao, Jose Fuentes, Tero Kaarlela, Alfredo Bayuelo, Leonardo Bobadilla
Abstract
Efficient and intuitive human-robot interfaces are crucial for expanding the user base of operators and enabling new applications in critical areas such as precision agriculture, automated construction, rehabilitation, and environmental monitoring. In this paper, we investigate the design of human-robot interfaces for the teleoperation of dynamical systems. The proposed framework seeks to find an optimal interface that complies with key concepts such as user comfort, efficiency, continuity, and consistency. As a proof of concept, we introduce an innovative approach to teleoperating underwater vehicles that translates human body movements into vehicle control commands. This method eliminates the need for divers to work in harsh underwater environments while taking comfort and communication constraints into account. We conducted a study with human subjects using a head-mounted display attached to a smartphone to control a simulated ROV. Numerical experiments have also demonstrated that the optimal translation is often the most intuitive and natural one, aligning with users' expectations.
Multi-Level Contrastive Learning for Dense Prediction Task
Authors: Qiushan Guo, Yizhou Yu, Yi Jiang, Jiannan Wu, Zehuan Yuan, Ping Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In this work, we present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representations for dense prediction tasks. Our method is motivated by three key factors in detection: localization, scale consistency, and recognition. To explicitly encode absolute position and scale information, we propose a novel pretext task that assembles multi-scale images in a montage manner to mimic multi-object scenarios. Unlike existing image-level self-supervised methods, our method constructs a multi-level contrastive loss that considers each sub-region of the montage image as a singleton. Our method enables the neural network to learn regional semantic representations for translation and scale consistency while reducing pre-training epochs to the same number as supervised pre-training. Extensive experiments demonstrate that MCL consistently outperforms recent state-of-the-art methods on various datasets with significant margins. In particular, MCL obtains 42.5 AP$^\mathrm{bb}$ and 38.3 AP$^\mathrm{mk}$ on COCO with 1x schedule fine-tuning, when using Mask R-CNN with an R50-FPN backbone pre-trained for 100 epochs. In comparison to MoCo, our method surpasses its performance by 4.0 AP$^\mathrm{bb}$ and 3.1 AP$^\mathrm{mk}$. Furthermore, we explore the alignment between the pretext task and downstream tasks. We extend our pretext task to supervised pre-training, which achieves performance similar to self-supervised learning. This result demonstrates the importance of the alignment between pretext task and downstream tasks, indicating the potential for wider applicability of our method beyond self-supervised settings.
NPC: Neural Point Characters from Video
Authors: Shih-Yang Su, Timur Bagautdinov, Helge Rhodin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
High-fidelity human 3D models can now be learned directly from videos, typically by combining a template-based surface model with neural representations. However, obtaining a template surface requires expensive multi-view capture systems, laser scans, or strictly controlled conditions. Previous methods avoid using a template but rely on a costly or ill-posed mapping from observation to canonical space. We propose a hybrid point-based representation for reconstructing animatable characters that does not require an explicit surface model, while being generalizable to novel poses. For a given video, our method automatically produces an explicit set of 3D points representing approximate canonical geometry, and learns an articulated deformation model that produces pose-dependent point transformations. The points serve both as a scaffold for high-frequency neural features and an anchor for efficiently mapping between observation and canonical space. We demonstrate on established benchmarks that our representation overcomes limitations of prior work operating in either canonical or in observation space. Moreover, our automatic point extraction approach enables learning models of human and animal characters alike, matching the performance of the methods using rigged surface templates despite being more general. Project website: https://lemonatsu.github.io/npc/
Keyword: faster
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models
Authors: Rongqi Pan, Taher A. Ghaleb, Lionel Briand
Abstract
Test suite minimization (TSM) is typically used to improve the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources, while maintaining the fault detection capability of the test suite. Though many TSM approaches exist, most of them rely on code coverage (white-box) or model-based features, which are not always available for test engineers. Recent TSM approaches that rely only on test code (black-box) have been proposed, such as ATM and FAST-R. Though ATM achieves a better trade-off between effectiveness and efficiency than FAST-R, it suffers from scalability issues for large software systems as its execution time increases rapidly with test suite size. To address scalability, we propose LTM, a scalable and black-box similarity-based TSM approach based on language models. To support similarity measurement, we investigated three different pre-trained language models: CodeBERT, GraphCodeBERT, and UniXcoder, to extract embeddings of test code (Java test methods), on which we computed two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used for minimizing test suites, thus reducing minimization time. Experimental results showed that the best configuration of LTM (using UniXcoder with Cosine similarity) outperformed the best two configurations of ATM by achieving significantly higher fault detection rates (0.84 versus 0.81, on average) and, more importantly, running much faster (26.73 minutes versus 72.75 minutes, on average) than ATM, in terms of both preparation time (up to two orders of magnitude faster) and minimization time (one order of magnitude faster).
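The two similarity measures are straightforward to compute on pre-extracted embeddings; in this sketch the embeddings are random stand-ins rather than actual CodeBERT/GraphCodeBERT/UniXcoder outputs:

```python
import numpy as np

emb = np.random.default_rng(0).standard_normal((100, 768))  # stand-in embeddings

unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
cosine_sim = unit @ unit.T                 # pairwise cosine similarity

sq = (emb ** 2).sum(axis=1)                # pairwise Euclidean via the identity
euclidean = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * emb @ emb.T, 0))
```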
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
Authors: Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson
Abstract
In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than InfiniBand, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x-7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which along with OCS flexibility helps large language models. For similarly sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse-scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center.
OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting
Authors: Xiao He, Ye Li, Jian Tan, Bin Wu, Feifei Li
Abstract
Seasonal-trend decomposition is one of the most fundamental concepts in time series analysis, supporting various downstream tasks including time series anomaly detection and forecasting. However, existing decomposition methods rely on batch processing with a time complexity of O(W), where W is the number of data points within a time window. Therefore, they cannot always efficiently support real-time analysis that demands low processing delay. To address this challenge, we propose OneShotSTL, an efficient and accurate algorithm that can decompose time series online with an update time complexity of O(1). OneShotSTL is more than 1,000 times faster than the batch methods, with accuracy comparable to the best counterparts. Extensive experiments on real-world benchmark datasets for downstream time series anomaly detection and forecasting tasks demonstrate that OneShotSTL is 10 to over 1,000 times faster than the state-of-the-art methods, while still providing comparable or even better accuracy.
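OneShotSTL's algorithm is not reproduced here, but the O(1)-per-point regime it targets can be illustrated with a standard online additive decomposition based on exponential smoothing (a plain stand-in, far cruder than OneShotSTL):

```python
# Each update touches a constant number of values, unlike batch O(W) methods
# that re-decompose the whole window on every new point.
def make_online_decomposer(period: int, alpha: float = 0.05, gamma: float = 0.1):
    trend, season, t = 0.0, [0.0] * period, 0

    def update(y: float):
        nonlocal trend, t
        phase = t % period
        trend += alpha * ((y - season[phase]) - trend)          # smoothed level
        season[phase] += gamma * ((y - trend) - season[phase])  # per-phase seasonal
        t += 1
        return trend, season[phase], y - trend - season[phase]

    return update

step = make_online_decomposer(period=24)
for t in range(1000):
    trend, seasonal, residual = step(5.0 + 0.01 * t + 0.1 * (t % 24))
```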
IterativePFN: True Iterative Point Cloud Filtering
Authors: Dasith de Silva Edirimuni, Xuequan Lu, Zhiwen Shao, Gang Li, Antonio Robles-Kelly, Ying He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The quality of point clouds is often limited by noise introduced during their capture process. Consequently, a fundamental 3D vision task is the removal of noise, known as point cloud filtering or denoising. State-of-the-art learning-based methods focus on training neural networks to infer filtered displacements and directly shift noisy points onto the underlying clean surfaces. In high-noise conditions, they iterate the filtering process. However, this iterative filtering is only done at test time and is less effective at ensuring points converge quickly onto the clean surfaces. We propose IterativePFN (iterative point cloud filtering network), which consists of multiple IterationModules that model the true iterative filtering process internally, within a single network. We train our IterativePFN network using a novel loss function that utilizes an adaptive ground truth target at each iteration to capture the relationship between intermediate filtering results during training. This ensures that the filtered results converge faster to the clean surfaces. Our method is able to obtain better performance compared to state-of-the-art methods. The source code can be found at: https://github.com/ddsediri/IterativePFN.
Black Box Few-Shot Adaptation for Vision-Language models
Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners. Soft prompt learning is the method of choice for few-shot downstream adaptation, aiming to bridge the modality gap caused by the distribution shift induced by the new domain. While parameter-efficient, prompt learning still requires access to the model weights and can be computationally infeasible for large models with billions of parameters. To address these shortcomings, in this work, we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features and hence works without access to the model's weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed from uni-modal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. LFA is initialized from a closed-form solution to a least-squares problem and then iteratively updated by minimizing a re-ranking loss. Despite its simplicity, our approach can even surpass soft-prompt learning methods, as shown by extensive experiments on 11 image and 2 video datasets.
Imitation Learning from Nonlinear MPC via the Exact Q-Loss and its Gauss-Newton Approximation
Authors: Andrea Ghezzi, Jasper Hoffman, Jonathan Frey, Joschka Boedecker, Moritz Diehl
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
This work presents a novel loss function for learning nonlinear Model Predictive Control policies via Imitation Learning. Standard approaches to Imitation Learning neglect information about the expert and generally adopt a loss function based on the distance between expert and learned controls. In this work, we present a loss based on the Q-function that directly embeds the performance objectives and constraint satisfaction of the associated Optimal Control Problem (OCP). However, training a Neural Network with the Q-loss requires solving the associated OCP for each new sample. To alleviate the computational burden, we derive a second Q-loss based on the Gauss-Newton approximation of the OCP, resulting in a faster training time. We validate our losses against Behavioral Cloning, the standard approach to Imitation Learning, on the control of a nonlinear system with constraints. The final results show that the Q-function-based losses significantly reduce the amount of constraint violations while achieving comparable or better closed-loop costs.
Keyword: mobile
Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace
Authors: Ingrid Navarro, Jay Patrikar, Joao P. A. Dantas, Rohan Baijal, Ian Higgins, Sebastian Scherer, Jean Oh
Abstract
The fast-growing demand for fully autonomous aerial operations in shared spaces necessitates developing trustworthy agents that can safely and seamlessly navigate in crowded, dynamic spaces. In this work, we propose Social Robot Tree Search (SoRTS), an algorithm for the safe navigation of mobile robots in social domains. SoRTS aims to augment existing socially-aware trajectory prediction policies with a Monte Carlo Tree Search planner for improved downstream navigation of mobile robots. To evaluate the performance of our method, we choose the use case of social navigation for general aviation. To aid this evaluation, within this work, we also introduce X-PlaneROS, a high-fidelity aerial simulator, to enable further research in full-scale aerial autonomy. By conducting a user study based on the assessments of 26 FAA-certified pilots, we show that SoRTS performs comparably to a competent human pilot, significantly outperforming our baseline algorithm. We further complement these results with self-play experiments in scenarios with increasing complexity.
End-to-End Latency Optimization of Multi-view 3D Reconstruction for Disaster Response
Abstract
In order to plan rapid response during disasters, first responder agencies often adopt a `bring your own device' (BYOD) model with inexpensive mobile edge devices (e.g., drones, robots, tablets) for complex video analytics applications, e.g., 3D reconstruction of a disaster scene. Unlike simpler video applications, widely used Multi-view Stereo (MVS) based 3D reconstruction applications (e.g., openMVG/openMVS) are exceedingly time-consuming, especially when run on such computationally constrained mobile edge devices. Additionally, reducing the reconstruction latency of such inherently sequential algorithms is challenging, as unintelligent, application-agnostic strategies can drastically degrade the reconstruction (i.e., application outcome) quality, making them useless. In this paper, we aim to design a latency-optimized MVS algorithm pipeline, with the objective of best balancing the end-to-end latency and reconstruction quality, by running the pipeline on a collaborative mobile edge environment. The overall optimization approach is two-pronged: (a) application optimizations introduce data-level parallelism by splitting the pipeline into high-frequency and low-frequency reconstruction components, and (b) system optimizations incorporate task-level parallelism into the pipelines by running them opportunistically on available resources with online quality control in order to balance both latency and quality. Our evaluation on a hardware testbed using publicly available datasets shows up to ~54% reduction in latency with negligible loss (~4-7%) in reconstruction quality.
FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2
Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their vast diversity and the complex underwater environment. With advancements in computer performance and GPU-based computing, deep-learning algorithms can now efficiently classify marine species, making it easier to monitor and manage marine ecosystems. In this paper, we propose an optimization to the MobileNetV2 model to achieve a 99.83% average validation accuracy by highlighting specific guidelines for creating a dataset and augmenting marine species images. This transfer learning algorithm can be deployed successfully on a mobile application for on-site classification at fisheries.
Energy-Saving Strategies for Mobile Web Apps and their Measurement: Results from a Decade of Research (Preprint)
Abstract
In 2022, over half of the web traffic was accessed through mobile devices. By reducing the energy consumption of mobile web apps, we can not only extend the battery life of our devices, but also make a significant contribution to energy conservation efforts. For example, if we could save only 5% of the energy used by web apps, we estimate that it would be enough to shut down one of the nuclear reactors in Fukushima. This paper presents a comprehensive overview of energy-saving experiments and related approaches for mobile web apps, relevant for researchers and practitioners. To achieve this objective, we conducted a systematic literature review and identified 44 primary studies for inclusion. Through the mapping and analysis of scientific papers, this work contributes: (1) an overview of the energy-draining aspects of mobile web apps, (2) a comprehensive description of the methodology used for the energy-saving experiments, and (3) a categorization and synthesis of various energy-saving approaches.
Model Predictive Control for Multi-Agent Systems under Limited Communication and Time-Varying Network Topology
Authors: Danilo Saccani, Lorenzo Fagiano, Melanie N. Zeilinger, Andrea Carron
Abstract
In control system networks, reconfiguration of the controller when agents are leaving or joining the network is still an open challenge, in particular when operation constraints that depend on each agent's behavior must be met. Drawing our motivation from mobile robot swarms, in this paper, we address this problem by optimizing individual agent performance while guaranteeing persistent constraint satisfaction in presence of bounded communication range and time-varying network topology. The approach we propose is a model predictive control (MPC) formulation, building on multi-trajectory MPC (mt-MPC) concepts. To enable plug and play operations when the system is in closed-loop without the need of a request, the proposed MPC scheme predicts two different state trajectories in the same finite horizon optimal control problem. One trajectory drives the system to the desired target, assuming that the network topology will not change in the prediction horizon, while the second one ensures constraint satisfaction assuming a worst-case scenario in terms of new agents joining the network in the planning horizon. Recursive feasibility and stability of the closed-loop system during plug and play operations are shown. The approach effectiveness is illustrated with a numerical simulation.
Keyword: pruning
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching
Authors: Pedro Castro, Tae-Kyun Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have relied heavily on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate model-free one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object-to-image matching based on a three-view system: a query with a positive and a negative template. This simple yet effective approach emulates test-time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self- and cross-attention between the inputs. Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally, we redesign commonly used pose refinement strategies, zoom and 2D offset refinement, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets and achieve results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.
Attention Map Guided Transformer Pruning for Edge Device
Abstract
Due to its significant capability of modeling long-range dependencies, the vision transformer (ViT) has achieved promising success in both holistic and occluded person re-identification (Re-ID) tasks. However, the inherent problems of transformers, such as the huge computational cost and memory footprint, are still two unsolved issues that block the deployment of ViT-based person Re-ID models on resource-limited edge devices. Our goal is to reduce both the inference complexity and model size without sacrificing comparable accuracy on person Re-ID, especially for tasks with occlusion. To this end, we propose a novel attention map guided (AMG) transformer pruning method, which removes both redundant tokens and heads with the guidance of the attention map in a hardware-friendly way. We first calculate the entropy in the key dimension and sum it up for the whole map; the corresponding head parameters of maps with high entropy are removed for model size reduction. Then we combine the similarity and first-order gradients of key tokens along the query dimension for token importance estimation and remove redundant key and value tokens to further reduce the inference complexity. Comprehensive experiments on Occluded DukeMTMC and Market-1501 demonstrate the effectiveness of our proposals. For example, our proposed pruning strategy on ViT-Base achieves 29.4% FLOPs savings with a 0.2% drop on Rank-1 and a 0.4% improvement on mAP, respectively.
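The entropy-based head ranking can be sketched directly from the description; the shapes and the pruning ratio below are our assumptions:

```python
import torch

# attn: (heads, queries, keys) attention maps from one layer.
def head_entropy(attn: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    p = attn.clamp_min(eps)
    ent = -(p * p.log()).sum(dim=-1)   # entropy along the key dimension
    return ent.sum(dim=-1)             # summed over queries: one score per head

attn = torch.softmax(torch.randn(12, 197, 197), dim=-1)
scores = head_entropy(attn)
keep = scores.argsort()[: int(12 * 0.7)]   # drop the highest-entropy 30% of heads
```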
Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback
Authors: Omar Erak, Hatem Abou-Zeid
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
The recent advances in machine learning and deep neural networks have made them attractive candidates for wireless communications functions such as channel estimation, decoding, and downlink channel state information (CSI) compression. However, most of these neural networks are large and inefficient, making them a barrier to deployment in practical wireless systems that require low latency and low memory footprints for individual network functions. To mitigate these limitations, we propose accelerated and compressed efficient neural networks for massive MIMO CSI feedback. Specifically, we thoroughly investigate the adoption of network pruning, post-training dynamic range quantization, and weight clustering to optimize CSI feedback compression for massive MIMO systems. Furthermore, we deploy the proposed model compression techniques on commodity hardware and demonstrate that achieving inference gains requires specialized libraries that accelerate computations for sparse neural networks. Our findings indicate that there is remarkable value in applying these model compression techniques: the proposed joint pruning and quantization approach reduced model size by 86.5% and inference time by 76.2% with minimal impact on model accuracy. These compression methods are crucial to pave the way for the practical adoption and deployment of deep learning-based techniques in commercial wireless systems.
Keyword: voxel
Unsupervised Brain Tumor Segmentation with Image-based Prompts
Authors: Xinru Zhang, Ni Ou, Chenghao Liu, Zhizheng Zhuo, Yaou Liu, Chuyang Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Automated brain tumor segmentation based on deep learning (DL) has achieved promising performance. However, it generally relies on annotated images for model training, which is not always feasible in clinical settings. Therefore, the development of unsupervised DL-based brain tumor segmentation approaches without expert annotations is desired. Motivated by the success of prompt learning (PL) in natural language processing, we propose an approach to unsupervised brain tumor segmentation by designing image-based prompts that allow indication of brain tumors; this approach is dubbed PL-based Brain Tumor Segmentation (PL-BTS). Specifically, instead of directly training a model for brain tumor segmentation with a large amount of annotated data, we seek to train a model that can answer the question: is a voxel in the input image associated with tumor-like hyper-/hypo-intensity? Such a model can be trained by artificially generating tumor-like hyper-/hypo-intensity on tumor-free images with hand-crafted designs. Since the hand-crafted designs may be too simplistic to represent all kinds of real tumors, the trained model may overfit the simplistic hand-crafted task rather than actually answer the question of abnormality. To address this problem, we propose the use of a validation task, where we generate a different hand-crafted task to monitor overfitting. In addition, we propose PL-BTS+, which further improves PL-BTS by exploiting unannotated images with brain tumors. Compared with competing unsupervised methods, the proposed method has achieved marked improvements on both public and in-house datasets, and we have also demonstrated its possible extension to other brain lesion segmentation tasks.
FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction
Abstract
Recent works on 3D reconstruction from posed images have demonstrated that direct inference of scene-level 3D geometry without iterative optimization is feasible using a deep neural network, showing remarkable promise and high efficiency. However, the reconstructed geometries, typically represented as a 3D truncated signed distance function (TSDF), are often coarse without fine geometric details. To address this problem, we propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. We first present a resolution-agnostic TSDF supervision strategy to provide the network with a more accurate learning signal during training, avoiding the pitfalls of TSDF interpolation seen in previous work. We then introduce a depth guidance strategy using multi-view depth estimates to enhance the scene representation and recover more accurate surfaces. Finally, we develop a novel architecture for the final layers of the network, conditioning the output TSDF prediction on high-resolution image features in addition to coarse voxel features, enabling sharper reconstruction of fine details. Our method produces smooth and highly accurate reconstructions, showing significant improvements across multiple depth and 3D reconstruction metrics.
Keyword: lidar
LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation
Authors: Haitao Yang, Zaiwei Zhang, Xiangru Huang, Min Bai, Chen Song, Bo Sun, Li Erran Li, Qixing Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Bird's-Eye View (BEV) features are popular intermediate scene representations shared by the 3D backbone and the detector head in LiDAR-based object detectors. However, little research has been done to investigate how to incorporate additional supervision on the BEV features to improve proposal generation in the detector head, while still balancing the number of powerful 3D layers and efficient 2D network operations. This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D, which serves as a dense supervision signal for better BEV feature learning. The key idea is to use auxiliary networks to predict a combination of explicit and implicit semantic probabilities by exploiting their complementary properties. Extensive experiments show that our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors and consistently improves upon baseline models.
USTC FLICAR: A Multisensor Fusion Dataset of LiDAR-Inertial-Camera for Heavy-duty Autonomous Aerial Work Robots
Abstract
In this paper, we present the USTC FLICAR Dataset, which is dedicated to the development of simultaneous localization and mapping and precise 3D reconstruction of the workspace for heavy-duty autonomous aerial work robots. In recent years, numerous public datasets have played significant roles in the advancement of autonomous cars and unmanned aerial vehicles (UAVs). However, these two platforms differ from aerial work robots: UAVs are limited in their payload capacity, while cars are restricted to two-dimensional movements. To fill this gap, we create the Giraffe mapping robot based on a bucket truck, which is equipped with a variety of well-calibrated and synchronized sensors: four 3D LiDARs, two stereo cameras, two monocular cameras, Inertial Measurement Units (IMUs), and a GNSS/INS system. A laser tracker is used to record the millimeter-level ground truth positions. We also make its ground twin, the Okapi mapping robot, to gather data for comparison. The proposed dataset extends the typical autonomous driving sensing suite to aerial scenes. Therefore, the dataset is named FLICAR to denote flying cars. We believe this dataset can also represent the flying car scenarios, specifically the takeoff and landing of VTOL (Vertical Takeoff and Landing) flying cars. The dataset is available for download at: https://ustc-flicar.github.io.
Keyword: diffusion
NeuroDAVIS: A neural network model for data visualization
Authors: Chayan Maitra, Dibyendu B. Seal, Rajat K. De
Abstract
The task of dimensionality reduction and visualization of high-dimensional datasets has long remained a challenging problem. Modern high-throughput technologies produce newer high-dimensional datasets having multiple views with relatively new data types. Visualization of these datasets requires proper methodology that can uncover hidden patterns in the data without affecting the local and global structures within it. To this end, however, very few such methodologies exist. In this work, we introduce a novel unsupervised deep neural network model, called NeuroDAVIS, for data visualization. NeuroDAVIS is capable of extracting important features from the data, without assuming any data distribution, and visualizing them effectively in lower dimensions. It has been shown theoretically that the neighbourhood relationships of the data in high dimensions remain preserved in lower dimensions. The performance of NeuroDAVIS has been evaluated on a wide variety of synthetic and real high-dimensional datasets including numeric, textual, image and biological data. NeuroDAVIS is highly competitive with both t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) with respect to visualization quality and preservation of data size, shape, and both local and global structure. It has also outperformed Fast interpolation-based t-SNE (Fit-SNE), a variant of t-SNE, on most of the high-dimensional datasets. For the biological datasets, besides t-SNE, UMAP and Fit-SNE, NeuroDAVIS has also performed well compared to other state-of-the-art algorithms, such as Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE) and the siamese neural network-based method IVIS. Downstream classification and clustering analyses have also revealed favourable results for NeuroDAVIS-generated embeddings.
Generative Diffusion Prior for Unified Image Restoration and Enhancement
Authors: Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, Bo Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion generative model (DDPM) for solving linear inverse, non-linear, or blind problems. Specifically, GDP systematically explores a protocol of conditional guidance, which is verified to be more practical than the commonly used guidance scheme. Furthermore, GDP is able to optimize the parameters of the degradation model during the denoising process, achieving blind image restoration. Besides, we devise hierarchical guidance and patch-based methods, enabling GDP to generate images of arbitrary resolutions. Experimentally, we demonstrate GDP's versatility on several image datasets for linear problems, such as super-resolution, deblurring, inpainting, and colorization, as well as non-linear and blind problems, such as low-light enhancement and HDR image recovery. GDP outperforms the current leading unsupervised methods on the diverse benchmarks in reconstruction quality and perceptual quality. Moreover, GDP also generalizes well to natural or synthesized images of arbitrary sizes from various tasks outside the distribution of the ImageNet training set.
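The guidance protocol can be pictured with a schematic denoising step. The sketch below is our own simplification, not GDP's implementation: `eps_model` and the degradation operator `D` are assumed callables, and the denoised estimate is deliberately crude.

```python
# A schematic sketch of diffusion guidance (our assumptions, not GDP's code):
# at each reverse step, steer the denoised estimate toward consistency with a
# degraded observation y under a (possibly learnable) degradation operator D.
import torch

def guided_step(x_t, t, eps_model, D, y, scale=1.0):
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)                   # predicted noise
    x0_hat = x_t - eps                        # crude denoised estimate (schematic)
    loss = torch.nn.functional.mse_loss(D(x0_hat), y)  # data-fidelity term
    grad = torch.autograd.grad(loss, x_t)[0]
    return (x_t - scale * grad).detach()      # guidance nudge before the usual update
```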
The Interconnected Nature of Online Harm and Moderation: Investigating the Cross-Platform Spread of Harmful Content between YouTube and Twitter
Authors: Valerio La Gatta, Luca Luceri, Francesco Fabbri, Emilio Ferrara
Abstract
The proliferation of harmful content shared online poses a threat to online information integrity and the integrity of discussion across platforms. Despite various moderation interventions adopted by social media platforms, researchers and policymakers are calling for holistic solutions. This study explores how a target platform could leverage content that has been deemed harmful on a source platform by investigating the behavior and characteristics of Twitter users responsible for sharing moderated YouTube videos. Using a large-scale dataset of 600M tweets related to the 2020 U.S. election, we find that moderated YouTube videos are extensively shared on Twitter and that users who share these videos also endorse extreme and conspiratorial ideologies. A fraction of these users are eventually suspended by Twitter, but they do not appear to be involved in state-backed information operations. The findings of this study highlight the complex and interconnected nature of harmful cross-platform information diffusion, raising the need for cross-platform moderation strategies.
Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models
Authors: Jaewoong Lee, Sangwon Jang, Jaehyeong Jo, Jaehong Yoon, Yunji Kim, Jin-Hwa Kim, Jung-Woo Ha, Sung Ju Hwang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Token-based masked generative models are gaining popularity for their fast inference time with parallel decoding. While recent token-based approaches achieve performance competitive with diffusion-based models, their generation performance is still suboptimal as they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts. To further improve the image quality, we introduce a cohesive sampling strategy, Frequency Adaptive Sampling (FAS), applied to each group of tokens divided according to the self-attention maps. We validate the efficacy of TCTS combined with FAS on various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality. Our text-conditioned sampling framework further reduces the original inference time by more than 50% without modifying the original generative model.
A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material
Abstract
Diffusion models have become a new state-of-the-art generative modeling method in various fields, and several existing works already survey them broadly. With the number of articles on diffusion models increasing exponentially in the past few years, there is a growing need for surveys of diffusion models in specific fields. In this work, we conduct a survey of graph diffusion models. Even though our focus is to cover the progress of diffusion models on graphs, we first briefly summarize how other generative modeling methods are used for graphs. After that, we introduce the mechanism of diffusion models in various forms, which facilitates the discussion on graph diffusion models. The applications of graph diffusion models mainly fall into the category of AI-generated content (AIGC) in science, so we mainly focus on how graph diffusion models are utilized for generating molecules and proteins, but also cover other cases, including materials design. Moreover, we discuss the issue of evaluating diffusion models in the graph domain and the existing challenges.
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
Authors: Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany
Abstract
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. We draw on recent advances in guided diffusion modeling to achieve test-time controllability of trajectories, which is normally only associated with rule-based systems. Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups while accounting for the surrounding environment context. This trajectory diffusion model is integrated with a novel physics-based humanoid controller to form a closed-loop, full-body pedestrian animation system capable of placing large crowds in a simulated environment with varying terrains. We further propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios such as collision avoidance and traversing uneven terrain. Video results are available on the project page at https://nv-tlabs.github.io/trace-pace .
PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion
Authors: Gwanghyun Kim, Ji Ha Jang, Se Young Chun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Recently, significant advancements have been made in 3D generative models; however, training these models across diverse domains is challenging and requires a huge amount of training data and knowledge of pose distributions. Text-guided domain adaptation methods have allowed the generator to be adapted to target domains using text prompts, thereby obviating the need for assembling numerous data. Recently, DATID-3D has presented impressive sample quality in text-guided domains, preserving diversity in text by leveraging text-to-image diffusion. However, adapting 3D generators to domains with significant gaps from the source domain still remains challenging due to the following issues in current text-to-image diffusion models: 1) the shape-pose trade-off in diffusion-based translation, 2) pose bias, and 3) instance bias in the target domain, resulting in inferior 3D shapes, low text-image correspondence, and low intra-domain diversity in the generated samples. To address these issues, we propose a novel pipeline called PODIA-3D, which uses pose-preserved text-to-image diffusion-based domain adaptation for 3D generative models. We construct a pose-preserved text-to-image diffusion model that allows the use of extremely high-level noise for significant domain changes. We also propose specialized-to-general sampling strategies to improve the details of the generated samples. Moreover, to overcome the instance bias, we introduce a text-guided debiasing method that improves intra-domain diversity. Consequently, our method successfully adapts 3D generators across significant domain gaps. Our qualitative results and user study demonstrate that our approach outperforms existing 3D text-guided domain adaptation methods in terms of text-image correspondence, realism, diversity of rendered images, and sense of depth of 3D shapes in the generated samples.
Keyword: dynamic
SEENN: Towards Temporal Spiking Early-Exit Neural Networks
Authors: Yuhang Li, Tamar Geller, Youngeun Kim, Priyadarshini Panda
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
Spiking Neural Networks (SNNs) have recently become more popular as a biologically plausible substitute for traditional Artificial Neural Networks (ANNs). SNNs are cost-efficient and deployment-friendly because they process input in both spatial and temporal manners using binary spikes. However, we observe that the information capacity in SNNs is affected by the number of timesteps, leading to an accuracy-efficiency tradeoff. In this work, we study a fine-grained adjustment of the number of timesteps in SNNs. Specifically, we treat the number of timesteps as a variable conditioned on different input samples to reduce redundant timesteps for certain data. We call our method Spiking Early-Exit Neural Networks (SEENNs). To determine the appropriate number of timesteps, we propose SEENN-I which uses a confidence score thresholding to filter out the uncertain predictions, and SEENN-II which determines the number of timesteps by reinforcement learning. Moreover, we demonstrate that SEENN is compatible with both the directly trained SNN and the ANN-SNN conversion. By dynamically adjusting the number of timesteps, our SEENN achieves a remarkable reduction in the average number of timesteps during inference. For example, our SEENN-II ResNet-19 can achieve 96.1% accuracy with an average of 1.08 timesteps on the CIFAR-10 test dataset.
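The confidence-thresholding exit of SEENN-I is easy to picture. The following is a minimal sketch under our own assumptions (a `snn_step` callable returning per-timestep logits, batch size one); it is not the authors' implementation.

```python
# A minimal sketch (assumed interface, not the authors' code) of SEENN-I style
# early exit: accumulate SNN outputs over timesteps and stop as soon as the
# softmax confidence clears a threshold.
import torch

def early_exit_predict(snn_step, x, max_T=8, threshold=0.9):
    """snn_step(x, t) -> per-timestep logits; returns (prediction, timesteps used)."""
    logits_sum = 0.0
    for t in range(1, max_T + 1):
        logits_sum = logits_sum + snn_step(x, t)
        probs = torch.softmax(logits_sum / t, dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:          # confident enough: exit early
            return pred, t
    return pred, max_T                        # fall back to the full budget
```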
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
Authors: Yan Jin, Mengke LI, Yang Lu, Yiu-ming Cheung, Hanzi Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks have made huge progress in the last few decades. However, as real-world data often exhibits a long-tailed distribution, vanilla deep models tend to be heavily biased toward the majority classes. To address this problem, state-of-the-art methods usually adopt a mixture of experts (MoE) to focus on different parts of the long-tailed distribution. Experts in these methods share the same model depth, which neglects the fact that different classes may prefer to be fit by models of different depths. To this end, we propose a novel MoE-based method called Self-Heterogeneous Integration with Knowledge Excavation (SHIKE). We first propose Depth-wise Knowledge Fusion (DKF) to fuse features between different shallow parts and the deep part of one network for each expert, which makes experts more diverse in terms of representation. Based on DKF, we further propose Dynamic Knowledge Transfer (DKT) to reduce the influence of the hardest negative class, which has a non-negligible impact on the tail classes in our MoE framework. As a result, the classification accuracy of long-tailed data can be significantly improved, especially for the tail classes. SHIKE achieves state-of-the-art performance of 56.3%, 60.3%, 75.4%, and 41.9% on CIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, and Places-LT, respectively.
Non-Generative Energy Based Models
Authors: Jacob Piland, Christopher Sweet, Priscila Saboia, Charles Vardeman II, Adam Czajka
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Energy-based models (EBMs) have become increasingly popular within computer vision. EBMs bring a probabilistic approach to training deep neural networks (DNNs) and have been shown to enhance performance in areas such as calibration, out-of-distribution detection, and adversarial resistance. However, these advantages come at the cost of estimating input data probabilities, usually using a Langevin-based method such as Stochastic Gradient Langevin Dynamics (SGLD), which brings additional computational costs, requires parameterization and caching methods for efficiency, and can run into stability and scaling issues. EBMs use dynamical methods to draw samples from the probability density function (PDF) defined by the current state of the network and compare them to the training data using a maximum log-likelihood approach to learn the correct PDF. We propose a non-generative training approach, Non-Generative EBM (NG-EBM), that utilizes the Approximate Mass, identified by Grathwohl et al., as a loss term to direct the training. We show that our NG-EBM training strategy retains many of the benefits of EBMs in calibration, out-of-distribution detection, and adversarial resistance, but without the computational complexity and overhead of the traditional approaches. In particular, the NG-EBM approach improves the Expected Calibration Error by a factor of 2.5 for CIFAR10 and 7.5 for CIFAR100, when compared to traditionally trained models.
Lilac: a Modal Separation Logic for Conditional Probability
Authors: John M. Li, Amal Ahmed, Steven Holtzen
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)
Abstract
We present Lilac, a separation logic for reasoning about probabilistic programs where separating conjunction captures probabilistic independence. Inspired by an analogy with mutable state where sampling corresponds to dynamic allocation, we show how probability spaces over a fixed, ambient sample space appear to be the natural analogue of heap fragments, and present a new combining operation on them such that probability spaces behave like heaps and measurability of random variables behaves like ownership. This combining operation forms the basis for our model of separation, and produces a logic with many pleasant properties. In particular, Lilac has a frame rule identical to the ordinary one, and naturally accommodates advanced features like continuous random variables and reasoning about quantitative properties of programs. Then we propose a new modality based on disintegration theory for reasoning about conditional probability. We show how the resulting modal logic validates examples from prior work, and give a formal verification of an intricate weighted sampling algorithm whose correctness depends crucially on conditional independence structure.
Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL)
Authors: Aniket Pramanik, Mathews Jacob
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Model-based deep learning methods that combine imaging physics with learned regularization priors have been emerging as powerful tools for parallel MRI acceleration. The main focus of this paper is to determine the utility of the monotone operator learning (MOL) framework in the parallel MRI setting. The MOL algorithm alternates between a gradient descent step using a monotone convolutional neural network (CNN) and a conjugate gradient algorithm to encourage data consistency. The benefits of this approach include similar guarantees as compressive sensing algorithms including uniqueness, convergence, and stability, while being significantly more memory efficient than unrolled methods. We validate the proposed scheme by comparing it with different unrolled algorithms in the context of accelerated parallel MRI for static and dynamic settings.
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Authors: Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
Abstract
How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.
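For reference, the project's repository documents loading any intermediate checkpoint through the Hugging Face Hub; a sketch (model name and revision tag follow the repository's examples at the time of writing, and may change):

```python
# Load one of the 154 published training checkpoints of the smallest model.
from transformers import GPTNeoXForCausalLM, AutoTokenizer

model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m", revision="step3000"  # an intermediate checkpoint
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
```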
Lidar based 3D Tracking and State Estimation of Dynamic Objects
Abstract
State estimation of oncoming vehicles: earlier research has focused on determining states such as position, velocity, orientation, and angular velocity of the ego-vehicle. Our approach focuses on estimating the states of non-ego vehicles, which is crucial for motion planning and decision-making. Dynamic-scene-based localization: our project works on dynamic scenes with both moving ego (self) and non-ego vehicles, whereas previous methods focused on static environments.
Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace
Authors: Ingrid Navarro, Jay Patrikar, Joao P. A. Dantas, Rohan Baijal, Ian Higgins, Sebastian Scherer, Jean Oh
Abstract
The fast-growing demand for fully autonomous aerial operations in shared spaces necessitates developing trustworthy agents that can safely and seamlessly navigate in crowded, dynamic spaces. In this work, we propose Social Robot Tree Search (SoRTS), an algorithm for the safe navigation of mobile robots in social domains. SoRTS aims to augment existing socially-aware trajectory prediction policies with a Monte Carlo Tree Search planner for improved downstream navigation of mobile robots. To evaluate the performance of our method, we choose the use case of social navigation for general aviation. To aid this evaluation, within this work, we also introduce X-PlaneROS, a high-fidelity aerial simulator, to enable more research in full-scale aerial autonomy. By conducting a user study based on the assessments of 26 FAA certified pilots, we show that SoRTS performs comparably to a competent human pilot, significantly outperforming our baseline algorithm. We further complement these results with self-play experiments in scenarios with increasing complexity.
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
Authors: Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson
Abstract
In response to innovations in machine learning (ML) models, production workloads changed radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its third supercomputer for such ML models. Optical circuit switches (OCSes) dynamically reconfigure its interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance; users can pick a twisted 3D torus topology if desired. Much cheaper, lower power, and faster than Infiniband, OCSes and underlying optical components are <5% of system cost and <3% of system power. Each TPU v4 includes SparseCores, dataflow processors that accelerate models that rely on embeddings by 5x-7x yet use only 5% of die area and power. Deployed since 2020, TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which along with OCS flexibility helps large language models. For similar sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center.
Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
Abstract
We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduce over-smoothing and improve out-of-model expressions synthesis, we propose to predict local features anchored on the 3DMM geometry. These learnt features are driven by 3DMM deformation and interpolated in 3D space to yield the volumetric radiance at a designated query point. We further show that using a Convolutional Neural Network in the UV space is critical in incorporating spatial context and producing representative local features. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to out-of-training expressions, and quantitatively superior renderings compared to other state-of-the-art approaches.
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
Abstract
State-of-the-art 3D object detectors are usually trained on large-scale datasets with high-quality 3D annotations. However, such 3D annotations are often expensive and time-consuming, which may not be practical for real applications. A natural remedy is to adopt semi-supervised learning (SSL) by leveraging a limited amount of labeled samples and abundant unlabeled samples. Current pseudolabeling-based SSL object detection methods mainly adopt a teacher-student framework, with a single fixed threshold strategy to generate supervision signals, which inevitably brings confusing supervision when guiding the student network training. Besides, the data augmentation of the point cloud in the typical teacher-student framework is too weak, and only contains basic downsampling and flip-and-shift (i.e., rotation and scaling), which hinders the effective learning of feature information. Hence, we address these issues by introducing a novel approach of Hierarchical Supervision and Shuffle Data Augmentation (HSSDA), which is a simple yet effective teacher-student framework. The teacher network generates more reasonable supervision for the student network by designing a dynamic dual-threshold strategy. Besides, the shuffle data augmentation strategy is designed to strengthen the feature representation ability of the student network. Extensive experiments show that HSSDA consistently outperforms the recent state-of-the-art methods on different datasets. The code will be released at https://github.com/azhuantou/HSSDA.
DLRover: An Elastic Deep Training Extension with Auto Job Resource Recommendation
Authors: Qinlong Wang, Bo Sang, Haitao Zhang, Mingjie Tang, Ke Zhang
Abstract
The cloud is still a popular platform for distributed deep learning (DL) training jobs since resource sharing in the cloud can improve resource utilization and reduce overall costs. However, such sharing also brings multiple challenges for DL training jobs, e.g., high-priority jobs could impact, even interrupt, low-priority jobs. Meanwhile, most existing distributed DL training systems require users to configure the resources of jobs (i.e., the number of nodes and resources like CPU and memory allocated to each node) manually before job submission and cannot adjust a job's resources during runtime. The resource configuration of a job deeply affects the job's performance (e.g., training throughput, resource utilization, and completion rate), which usually leads to poor performance since users fail to provide an optimal resource configuration in most cases. DLRover is a distributed DL framework that can auto-configure a DL job's initial resources and dynamically tune the job's resources to achieve better performance. With its elastic capability, DLRover can effectively adjust the resources of a job when performance issues are detected or a job fails because of faults or eviction. Evaluation results show DLRover can outperform manually well-tuned resource configurations. Furthermore, in the production Kubernetes cluster of \company, DLRover reduces the median job completion time by 31\%, and improves the job completion rate by 6\%, CPU utilization by 15\%, and memory utilization by 20\% compared with manual configuration.
Multi model LSTM architecture for Track Association based on Automatic Identification System Data
Abstract
For decades, track association has been a challenging problem in marine surveillance, which involves the identification and association of vessel observations over time. However, the Automatic Identification System (AIS) has provided a new opportunity for researchers to tackle this problem by offering a large database of dynamic and geo-spatial information of marine vessels. With the availability of such large databases, researchers can now develop sophisticated models and algorithms that leverage the increased availability of data to address the track association challenge effectively. Furthermore, with the advent of deep learning, track association can now be approached as a data-intensive problem. In this study, we propose a Long Short-Term Memory (LSTM) based multi-model framework for track association. LSTM is a recurrent neural network architecture that is capable of processing multivariate temporal data collected over time in a sequential manner, enabling it to predict current vessel locations from historical observations. Based on these predictions, a geodesic distance based similarity metric is then utilized to associate the unclassified observations to their true tracks (vessels). We evaluate the performance of our approach using standard performance metrics, such as precision, recall, and F1 score, which provide a comprehensive summary of the accuracy of the proposed framework.
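The geodesic similarity step can be sketched with the standard haversine formula; the association rule below is our own illustration, not the paper's code, and `predictions` is a hypothetical list of `(track_id, (lat, lon))` pairs.

```python
# A sketch of geodesic-distance association: match an AIS observation to the
# track whose LSTM-predicted position is nearest on the sphere.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def associate(obs, predictions):
    """obs: (lat, lon); predictions: list of (track_id, (lat, lon))."""
    return min(predictions,
               key=lambda p: haversine_km(obs[0], obs[1], p[1][0], p[1][1]))[0]

print(associate((54.0, 10.0), [("vessel_a", (54.1, 10.1)), ("vessel_b", (55.5, 12.0))]))
```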
Multimodal Neural Processes for Uncertainty Estimation
Authors: Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du
Abstract
Neural processes (NPs) have brought together the representation power of parametric deep neural networks and the reliable uncertainty estimation of non-parametric Gaussian processes. Although recent development of NPs has shown success in both regression and classification, how to adapt NPs to multimodal data has not been carefully studied. For the first time, we propose a new model in the NP family for multimodal uncertainty estimation, namely Multimodal Neural Processes. In a holistic and principled way, we develop a dynamic context memory updated by the classification error, a multimodal Bayesian aggregation mechanism to aggregate multimodal representations, and a new attention mechanism for calibrated predictions. In extensive empirical evaluation, our method achieves state-of-the-art multimodal uncertainty estimation performance, showing it to be robust against noisy samples and reliable in out-of-domain detection.
FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2
Authors: Kohav Dey, Krishna Bajaj, K S Ramalakshmi, Samuel Thomas, Sriram Radhakrishna
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Marine ecosystems are vital for the planet's health, but human activities such as climate change, pollution, and overfishing pose a constant threat to marine species. Accurate classification and monitoring of these species can aid in understanding their distribution, population dynamics, and the impact of human activities on them. However, classifying marine species can be challenging due to their vast diversity and the complex underwater environment. With advancements in computer performance and GPU-based computing, deep-learning algorithms can now efficiently classify marine species, making it easier to monitor and manage marine ecosystems. In this paper, we propose an optimization to the MobileNetV2 model to achieve a 99.83% average validation accuracy by highlighting specific guidelines for creating a dataset and augmenting marine species images. This transfer learning algorithm can be deployed successfully on a mobile application for on-site classification at fisheries.
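A minimal Keras transfer-learning skeleton of the kind the paper builds on might look as follows; the input size, head layers, and species count are placeholders, not the authors' exact configuration.

```python
# A transfer-learning sketch: freeze ImageNet-pretrained MobileNetV2 features
# and train only a small classification head for marine species.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the pretrained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(20, activation="softmax"),  # hypothetical species count
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```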
Online Learning with Adversaries: A Differential Inclusion Analysis
Abstract
We consider the measurement model $Y = AX,$ where $X$ and, hence, $Y$ are random variables and $A$ is an a priori known tall matrix. At each time instance, a sample of one of $Y$'s coordinates is available, and the goal is to estimate $\mu := \mathbb{E}[X]$ via these samples. However, the challenge is that a small but unknown subset of $Y$'s coordinates are controlled by adversaries with infinite power: they can return any real number each time they are queried for a sample. For such an adversarial setting, we propose the first asynchronous online algorithm that converges to $\mu$ almost surely. We prove this result using a novel differential inclusion based two-timescale analysis. Two key highlights of our proof include: (a) the use of a novel Lyapunov function for showing that $\mu$ is the unique global attractor for our algorithm's limiting dynamics, and (b) the use of martingale and stopping time theory to show that our algorithm's iterates are almost surely bounded.
Proving the Convergence to Limit Cycles using Periodically Decreasing Jacobian Matrix Measures
Abstract
Methods based on the "(Jacobian) matrix measure" to show the convergence of a dynamical system to a limit cycle (LC) generally assume that the measure is negative everywhere on the LC. We relax this assumption by assuming that the matrix measure is negative "on average" over one period of the LC. Using an approximate Euler trajectory, we thus present a method that guarantees the existence of the LC and allows us to construct a basin of attraction. This is illustrated on the example of the Van der Pol system.
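In symbols (our rendering of the standard definitions, not necessarily the paper's notation), the matrix measure and the relaxed on-average condition read:

```latex
% The matrix measure induced by a norm, and the "negative on average" condition
% over one period T of the limit cycle x(t), with Jacobian J(x):
\[
  \mu(J) \;=\; \lim_{h \to 0^{+}} \frac{\lVert I + hJ \rVert - 1}{h},
  \qquad
  \int_{0}^{T} \mu\bigl(J(x(t))\bigr)\,\mathrm{d}t \;<\; 0 .
\]
```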
Decoupling Dynamic Monocular Videos for Dynamic View Synthesis
Authors: Meng You, Junhui Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The challenge of dynamic view synthesis from dynamic monocular videos, i.e., synthesizing novel views for free viewpoints given a monocular video of a dynamic scene captured by a moving camera, mainly lies in accurately modeling the dynamic objects of a scene using limited 2D frames, each with a varying timestamp and viewpoint. Existing methods usually require pre-processed 2D optical flow and depth maps by additional methods to supervise the network, making them suffer from the inaccuracy of the pre-processed supervision and the ambiguity when lifting the 2D information to 3D. In this paper, we tackle this challenge in an unsupervised fashion. Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints. The former enforces the 3D geometric surfaces of moving objects to be consistent over time, while the latter regularizes their appearances to be consistent across different viewpoints. Such a fine-grained motion formulation can alleviate the learning difficulty for the network, thus enabling it to produce not only novel views with higher quality but also more accurate scene flows and depth than existing methods requiring extra supervision. We will make the code publicly available.
Virtio-FPGA: a virtualization solution for SoC-attached FPGAs
Authors: Anna Panagopoulou, Michele Paolino, Daniel Raho
Abstract
Recently, FPGA accelerators have risen in popularity as they present a suitable way of satisfying the high-computation and low-power demands of real-time applications. Modern electric transportation systems (such as aircraft and road vehicles) can greatly profit from embedded FPGAs, which incorporate both high-performance and flexibility features into a single SoC. At the same time, the virtualization of FPGA resources aims to reinforce these systems with strong isolation, consolidation, and security. In this paper, we present a novel virtualization framework aimed at SoC-attached FPGA devices in a Linux and QEMU/KVM setup. We use Virtio as a means to enable the configuration of FPGA resources from guest systems in an efficient way. We also employ the Linux VFIO and Device Tree Overlay technologies to render the FPGA resources dynamically accessible to guest systems. The ability to dynamically configure and utilize the FPGA resources from a virtualization environment is described in detail. The evaluation procedure of the solution is presented, and the virtualization overhead is benchmarked as minimal (around 10%) when accessing the FPGA devices from guest systems.
Adaptive parallelization of multi-agent simulations with localized dynamics
Authors: Alexandru-Ionuţ Băbeanu, Tatiana Filatova, Jan H. Kwakkel, Neil Yorke-Smith
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computational Engineering, Finance, and Science (cs.CE); Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph)
Abstract
Agent-based modelling constitutes a versatile approach to representing and simulating complex systems. Studying large-scale systems is challenging because of the computational time required for the simulation runs: scaling is at least linear in system size (number of agents). Given the inherently modular nature of multi-agent-based simulations (MABSs), parallel computing is a natural approach to overcoming this challenge. However, because of the shared information and communication between agents, parallelization is not simple. We present a protocol for shared-memory, parallel execution of MABSs. This approach is useful for models that can be formulated in terms of sequential computations, and that involve updates that are localized, in the sense of involving small numbers of agents. The protocol has a bottom-up and asynchronous nature, allowing it to deal with heterogeneous computation in an adaptive, yet graceful manner. We illustrate the potential performance gains on exemplar cultural dynamics and disease spreading MABSs.
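A toy shared-memory illustration of localized updates is given below; it is our own sketch of the general idea (per-agent locks acquired in a consistent order), not the paper's protocol.

```python
# Workers apply localized updates by locking only the few agents an update
# touches; acquiring locks in a fixed global order avoids deadlock.
import threading

N = 100
state = [0.0] * N
locks = [threading.Lock() for _ in range(N)]

def local_update(agents, delta):
    """Apply an update that touches only a small set of agents."""
    for i in sorted(agents):          # consistent acquisition order
        locks[i].acquire()
    try:
        for i in agents:
            state[i] += delta
    finally:
        for i in agents:
            locks[i].release()

threads = [threading.Thread(target=local_update, args=([k, (k + 1) % N], 1.0))
           for k in range(0, N, 2)]
for t in threads: t.start()
for t in threads: t.join()
```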
Dynamic treewidth
Authors: Tuukka Korhonen, Konrad Majewski, Wojciech Nadara, Michał Pilipczuk, Marek Sokołowski
Abstract
We present a data structure that for a dynamic graph $G$ that is updated by edge insertions and deletions, maintains a tree decomposition of $G$ of width at most $6k+5$ under the promise that the treewidth of $G$ never grows above $k$. The amortized update time is ${\cal O}_k(2^{\sqrt{\log n}\log\log n})$, where $n$ is the vertex count of $G$ and the ${\cal O}_k(\cdot)$ notation hides factors depending on $k$. In addition, we also obtain the dynamic variant of Courcelle's Theorem: for any fixed property $\varphi$ expressible in the $\mathsf{CMSO}_2$ logic, the data structure can maintain whether $G$ satisfies $\varphi$ within the same time complexity bounds. To a large extent, this answers a question posed by Bodlaender [WG 1993].
Operator splitting for port-Hamiltonian systems
Authors: Andreas Frommer, Michael Günther, Björn Liljegren-Sailer, Nicole Marheineke
Abstract
The port-Hamiltonian approach presents an energy-based modeling of dynamical systems with energy-conservative and energy-dissipative parts as well as an interconnection over the so-called ports. In this paper, we apply an operator splitting that treats the energy-conservative and energy-dissipative parts separately. This paves the way for linear equation solvers to exploit the respective special structures of the iteration matrices as well as the multirate potential in the different right-hand sides. We illustrate the approach using test examples from coupled multibody system dynamics.
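In symbols, using the standard port-Hamiltonian form (the specific Strang composition below is one common splitting choice, not necessarily the paper's):

```latex
% Conservative (J) and dissipative (R) parts are integrated separately,
% e.g. composed in a Strang splitting with step size h:
\[
  \dot{x} = \bigl(J(x) - R(x)\bigr)\,\nabla H(x),
  \qquad J = -J^{\top},\quad R = R^{\top} \succeq 0,
\]
\[
  \Phi_{h} \;\approx\; \Phi^{J}_{h/2} \circ \Phi^{R}_{h} \circ \Phi^{J}_{h/2}.
\]
```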
Mixing predictions for online metric algorithms
Authors: Antonios Antoniadis, Christian Coester, Marek Eliáš, Adam Polak, Bertrand Simon
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Abstract
A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. Against the best (in hindsight) unconstrained combination of $\ell$ predictors, we obtain a competitive ratio of $O(\ell^2)$, and show that this is best possible. However, for a benchmark with slightly constrained number of switches between different predictors, we can get a $(1+\epsilon)$-competitive algorithm. Moreover, our algorithms can be adapted to access predictors in a bandit-like fashion, querying only one predictor at a time. An unexpected implication of one of our lower bounds is a new structural insight about covering formulations for the $k$-server problem.
Machine Learning Discovery of Optimal Quadrature Rules for Isogeometric Analysis
Authors: Tomas Teijeiro, Jamie M. Taylor, Ali Hashemian, David Pardo
Abstract
We propose the use of machine learning techniques to find optimal quadrature rules for the construction of stiffness and mass matrices in isogeometric analysis (IGA). We initially consider 1D spline spaces of arbitrary degree spanned over uniform and non-uniform knot sequences; the generated optimal rules are then used for integration over higher-dimensional spaces in a tensor-product sense. The quadrature rule search is posed as an optimization problem and solved by a machine learning strategy based on gradient descent. However, since the optimization space is highly non-convex, the success of the search strongly depends on the number of quadrature points and the parameter initialization. Thus, we use a dynamic programming strategy that initializes the parameters from the optimal solution over the spline space with a lower number of knots. With this method, we found optimal quadrature rules for spline spaces when using IGA discretizations with up to 50 uniform elements and polynomial degrees up to 8, showing the generality of the approach in this scenario. For non-uniform partitions, the method also finds an optimal rule in a reasonable number of test cases. We also assess the generated optimal rules in two practical case studies, namely, the eigenvalue problem of the Laplace operator and the eigenfrequency analysis of freeform curved beams, where the latter problem shows the applicability of the method to curved geometries. In particular, the proposed method results in savings with respect to traditional Gaussian integration of up to 44% in 1D, 68% in 2D, and 82% in 3D spaces.
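The optimization view can be illustrated on a deliberately simplified problem: fitting nodes and weights by gradient descent so that monomials on $[0,1]$ are integrated exactly. This toy sketch is ours; the paper optimizes over spline basis products with a dynamic-programming initialization.

```python
# Toy quadrature search: minimize the squared residuals between the rule's
# output and the exact integrals of x^k on [0, 1], by plain gradient descent.
import numpy as np

def fit_quadrature(n_pts=3, max_deg=5, lr=0.01, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0, 1, n_pts))      # quadrature nodes
    w = np.full(n_pts, 1.0 / n_pts)            # quadrature weights
    for _ in range(steps):
        gx = np.zeros_like(x); gw = np.zeros_like(w)
        for k in range(max_deg + 1):
            err = w @ x**k - 1.0 / (k + 1)     # residual vs. exact integral of x^k
            gw += 2 * err * x**k
            gx += 2 * err * w * k * x**np.maximum(k - 1, 0)
        x -= lr * gx; w -= lr * gw
    return x, w

x, w = fit_quadrature()
print(x, w)  # should approach the 3-point Gauss rule on [0, 1]
```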
FAST: Fidelity-Adjustable Semantic Transmission over Heterogeneous Wireless Networks
Abstract
In this work, we investigate the challenging problem of on-demand semantic communication over heterogeneous wireless networks. We propose a fidelity-adjustable semantic transmission framework (FAST) that empowers wireless devices to send data efficiently under different application scenarios and resource conditions. To this end, we first design a dynamic sub-model training scheme to learn the flexible semantic model, which enables edge devices to customize the transmission fidelity with different widths of the semantic model. After that, we focus on the FAST optimization problem of minimizing the system energy consumption under latency and fidelity constraints. Following that, the optimal transmission strategies, including the scaling factor of the semantic model, computing frequency, and transmitting power, are derived for the devices. Experiment results indicate that, compared to the baseline transmission schemes, the proposed framework can reduce the system energy consumption and data size by up to an order of magnitude while maintaining reasonable data fidelity.
Unified Behavioral Data-Driven Performance Analysis: A Generalized Plant Approach
Authors: L. M. Spin, C. Verhoek, W. P. M. H. Heemels, N. van de Wouw, R. Tóth
Abstract
In this paper, we present a novel approach to combine data-driven non-parametric representations with model-based representations of dynamical systems. Based on a data-driven form of linear fractional transformations, we introduce a data-driven form of generalized plants. This form can be leveraged to accomplish performance characterizations, e.g., in the form of a mixed-sensitivity approach, and LMI-based conditions to verify finite-horizon dissipativity. In particular, we show how finite-horizon $\ell_2$-gain under weighting filter-based general performance specifications are verified for implemented controllers on systems for which only input-output data is available. The overall effectiveness of the proposed method is demonstrated by simulation examples.
Rolling the Dice: Imagining Generative AI as a Dungeons & Dragons Storytelling Companion
Authors: Jose Ma. Santiago III, Richard Lance Parayno, Jordan Aiko Deja, Briane Paul V. Samson
Abstract
AI Advancements have augmented casual writing and story generation, but their usage poses challenges in collaborative storytelling. In role-playing games such as Dungeons & Dragons (D&D), composing prompts using generative AI requires a technical understanding to generate ideal results, which is difficult for novices. Thus, emergent narratives organically developed based on player actions and decisions have yet to be fully utilized. This paper envisions the use of generative AI in transforming storytelling into an interactive drama using dynamic and immersive narratives. First, we describe scenarios where narratives are created and character conversations are designed within an overarching fantasy disposition. Then, we recommend design guidelines to help create tools using generative AI in interactive storytelling. Lastly, we raise questions on its potential impact on player immersion and cognitive load. Our contributions may be expanded within the broader interactive storytelling domain, such as speech-conversational AI and persona-driven chatbots.
SportsPose -- A Dynamic 3D sports pose dataset
Authors: Christian Keilstrup Ingwersen, Christian Mikkelstrup, Janus Nørtoft Jensen, Morten Rieger Hannemose, Anders Bjorholm Dahl
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets, we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system, achieving a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle joints in relation to the body. With this, we show that SportsPose contains more movement than the Human3.6M and 3DPW datasets in these extremity joints, indicating that our movements are more dynamic. The dataset with accompanying code can be downloaded from our website. We hope that SportsPose will allow researchers and practitioners to develop and evaluate more effective models for the analysis of sports performance and injury prevention. With its realistic and diverse dataset, SportsPose provides a valuable resource for advancing the state-of-the-art in pose estimation in sports.
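Our reading of the local movement metric suggests something like the following sketch; the paper defines the metric precisely, while here we simply remove whole-body translation before measuring displacement, and all names are ours.

```python
# A guess at the spirit of "local movement": a joint's displacement expressed
# relative to a body root, summed over the sequence.
import numpy as np

def local_movement(joint_xyz, root_xyz):
    """joint_xyz, root_xyz: (T, 3) trajectories; returns summed local displacement."""
    local = joint_xyz - root_xyz          # express the joint relative to the body
    return np.linalg.norm(np.diff(local, axis=0), axis=1).sum()

T = 100
root = np.cumsum(np.random.randn(T, 3) * 0.01, axis=0)   # synthetic body path
wrist = root + np.random.randn(T, 3) * 0.05              # synthetic wrist path
print(local_movement(wrist, root))
```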
Abstract
We study the problem of chasing positive bodies in $\ell_1$: given a sequence of bodies $K_{t}=\{x^{t}\in\mathbb{R}_{+}^{n}\mid C^{t}x^{t}\geq 1,\ P^{t}x^{t}\leq 1\}$ revealed online, where $C^{t}$ and $P^{t}$ are nonnegative matrices, the goal is to (approximately) maintain a point $x^t \in K_t$ such that $\sum_t \|x^{t} - x^{t-1}\|_1$ is minimized. This captures the fully-dynamic low-recourse variant of any problem that can be expressed as a mixed packing-covering linear program and thus also the fractional version of many central problems in dynamic algorithms such as set cover, load balancing, hyperedge orientation, minimum spanning tree, and matching. We give an $O(\log d)$-competitive algorithm for this problem, where $d$ is the maximum row sparsity of any matrix $C^t$. This bypasses and improves exponentially over the lower bound of $\sqrt{n}$ known for general convex bodies. Our algorithm is based on iterated information projections, and, in contrast to general convex body chasing algorithms, is entirely memoryless. We also show how to round our solution dynamically to obtain the first fully dynamic algorithms with competitive recourse for all the stated problems above; i.e. their recourse is less than the recourse of every other algorithm on every update sequence, up to polylogarithmic factors. This is a significantly stronger notion than the notion of absolute recourse in the dynamic algorithms literature.
Abstract
As influencers play considerable roles in social media marketing, companies increase the budget for influencer marketing. Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank that ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, the graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period. An attentive recurrent neural network finally distinguishes highly effective influencers from other influencers by capturing the knowledge of the dynamics of influencer representations over time. Extensive experiments have been conducted on an Instagram dataset that consists of 18,397 influencers with their 2,952,075 posts published within 12 months. The experimental results demonstrate that InfluencerRank outperforms existing baseline methods. An in-depth analysis further reveals that all of our proposed features and model components are beneficial to discover effective influencers.
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Authors: Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract
We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks. This architecture enables a dynamic switch for its runtime compute paths by exploiting WW spotting to select which branch of its attention networks to execute for an input audio frame. With this approach, we effectively improve WW spotting accuracy while saving runtime compute cost as defined by floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce the compute cost by $90\%$ for WW audio frames, with only $1\%$ increase in the number of parameters. This architecture improves WW F1 score by $16\%$ relative and improves generic rare word error rate by $3\%$ relative compared to the baselines.
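Structurally, the dynamic compute-path switch might be sketched as below; module names, dimensions, and the gating interface are our assumptions, not the described architecture.

```python
# A structural sketch of a dual-branch biasing module: a wake-word detection
# signal selects which attention branch runs, so only one branch's FLOPs are
# spent per audio frame.
import torch.nn as nn

class DualAttentionBiasing(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.ww_branch = nn.MultiheadAttention(dim, num_heads=4)       # wake-word path
        self.generic_branch = nn.MultiheadAttention(dim, num_heads=4)  # generic path

    def forward(self, frame, context, is_ww_frame):
        branch = self.ww_branch if is_ww_frame else self.generic_branch
        out, _ = branch(frame, context, context)  # only the selected branch executes
        return out
```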
Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback
Authors: Omar Erak, Hatem Abou-Zeid
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
The recent advances in machine learning and deep neural networks have made them attractive candidates for wireless communications functions such as channel estimation, decoding, and downlink channel state information (CSI) compression. However, most of these neural networks are large and inefficient, which is a barrier to deployment in practical wireless systems that require low latency and low memory footprints for individual network functions. To mitigate these limitations, we propose accelerated and compressed efficient neural networks for massive MIMO CSI feedback. Specifically, we thoroughly investigate the adoption of network pruning, post-training dynamic range quantization, and weight clustering to optimize CSI feedback compression for massive MIMO systems. Furthermore, we deploy the proposed model compression techniques on commodity hardware and demonstrate that, in order to achieve inference gains, specialized libraries that accelerate computations for sparse neural networks are required. Our findings indicate that there is remarkable value in applying these model compression techniques: the proposed joint pruning and quantization approach reduces model size by 86.5% and inference time by 76.2% with minimal impact on model accuracy. These compression methods are crucial to pave the way for the practical adoption and deployment of deep learning-based techniques in commercial wireless systems.
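To make the pipeline concrete, here is a sketch of joint pruning and post-training dynamic range quantization with the TensorFlow Model Optimization toolkit; the CSI encoder `model`, the dataset, and the schedule values are placeholders rather than the paper's settings.

```python
# Magnitude pruning during fine-tuning, then dynamic range quantization via the
# TFLite converter; returns a compact flatbuffer ready for on-device inference.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def prune_and_quantize(model, train_ds, epochs=3):
    pruned = tfmot.sparsity.keras.prune_low_magnitude(
        model,
        pruning_schedule=tfmot.sparsity.keras.PolynomialDecay(
            initial_sparsity=0.0, final_sparsity=0.8,
            begin_step=0, end_step=1000))
    pruned.compile(optimizer="adam", loss="mse")
    pruned.fit(train_ds, epochs=epochs,
               callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
    stripped = tfmot.sparsity.keras.strip_pruning(pruned)

    converter = tf.lite.TFLiteConverter.from_keras_model(stripped)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
    return converter.convert()
```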
The Rise of Disappearing Frameworks in Web Development
Authors: Juho Vepsäläinen, Arto Hellas, Petri Vuorimaa
Abstract
The evolution of the web can be characterized as an emergence of frameworks paving the way from static websites to dynamic web applications. As the scope of web applications has grown, new technical challenges have emerged, leading to the need for new solutions. The latest of these developments is the rise of so-called disappearing web frameworks that question the axioms of earlier generations of web frameworks, providing the benefits of the early web and of simple static sites.
Strong spatial mixing for colorings on trees and its algorithmic applications
Abstract
Strong spatial mixing (SSM) is an important quantitative notion of correlation decay for Gibbs distributions arising in statistical physics, probability theory, and theoretical computer science. A longstanding conjecture is that the uniform distribution on proper $q$-colorings on a $\Delta$-regular tree exhibits SSM whenever $q \ge \Delta+1$. Moreover, it is widely believed that as long as SSM holds on bounded-degree trees with $q$ colors, one would obtain an efficient sampler for $q$-colorings on all bounded-degree graphs via simple Markov chain algorithms. It is surprising that such a basic question is still open, even on trees, but then again it also highlights how much we still have to learn about random colorings. In this paper, we show the following: (1) For any $\Delta \ge 3$, SSM holds for random $q$-colorings on trees of maximum degree $\Delta$ whenever $q \ge \Delta + 3$. Thus we almost fully resolve the aforementioned conjecture. Our result substantially improves upon the previous best bound, which requires $q \ge 1.59\Delta+\gamma^*$ for an absolute constant $\gamma^* > 0$. (2) For any $\Delta\ge 3$ and girth $g = \Omega_\Delta(1)$, we establish optimal mixing of the Glauber dynamics for $q$-colorings on graphs of maximum degree $\Delta$ and girth $g$ whenever $q \ge \Delta+3$. Our approach is based on a new general reduction from spectral independence on large-girth graphs to SSM on trees that is of independent interest. Using the same techniques, we also prove near-optimal bounds on weak spatial mixing (WSM), a closely-related notion to SSM, for the antiferromagnetic Potts model on trees.
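The Glauber dynamics in question is simple to state; below is a sketch for proper $q$-colorings on a graph given as an adjacency list (our illustration, not the paper's code).

```python
# One Glauber step: pick a uniformly random vertex and resample its color
# uniformly from the colors not used by its neighbors. Starting from a proper
# coloring with q >= max degree + 2, the available set is never empty.
import random

def glauber_step(colors, adj, q):
    v = random.randrange(len(colors))
    used = {colors[u] for u in adj[v]}
    available = [c for c in range(q) if c not in used]
    colors[v] = random.choice(available)

# Example: a 4-cycle with q = 5 colors, run for a few thousand steps.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
colors = [0, 1, 0, 1]
for _ in range(5000):
    glauber_step(colors, adj, 5)
print(colors)
```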
MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities
Authors: Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Text classification typically requires a substantial amount of human-annotated data to serve as supervision, which is costly to obtain in dynamic emerging domains. Certain methods seek to address this problem by relying solely on the surface text of class names to serve as extremely weak supervision. However, existing methods fail to account for single-class documents discussing multiple topics. Both topic diversity and vague sentences may introduce noise into a document's underlying representation and consequently into the precision of the predicted class. Furthermore, current work treats text granularities (documents, sentences, or words) independently, which limits the degree of coarse- or fine-grained context that can be jointly extracted from all three to identify significant subtext for classification. To address these challenges, we propose MEGClass, an extremely weakly supervised text classification method that exploits Mutually-Enhancing Text Granularities. Specifically, MEGClass constructs class-oriented sentence and class representations based on keywords and performs a sentence-level confidence-weighted label ensemble to estimate a document's initial class distribution. This serves as the target distribution for a multi-head attention network with a class-weighted contrastive loss. This network learns contextualized sentence representations and weights to form document representations that reflect the original document- and sentence-level topic diversity. Retaining this heterogeneity allows MEGClass to select the most class-indicative documents as iterative feedback for enhancing the class representations. Finally, these top documents are used to fine-tune a pre-trained text classifier. As demonstrated through extensive experiments on six benchmark datasets, MEGClass outperforms other weakly and extremely weakly supervised methods.
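As a rough illustration of what a sentence-level confidence-weighted label ensemble can look like, the following NumPy sketch scores each sentence against keyword-derived class representations and down-weights vague sentences. The margin-based confidence and the temperature softmax are our illustrative choices, not the authors' exact formulation.

    import numpy as np

    def initial_class_distribution(sent_embs, class_embs, temperature=0.1):
        # sent_embs:  (S, d) sentence embeddings for one document
        # class_embs: (K, d) class representations built from keywords
        # Cosine similarity between every sentence and every class.
        s = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
        c = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
        sim = s @ c.T                                    # (S, K)

        # Per-sentence class distribution via a temperature softmax.
        p = np.exp(sim / temperature)
        p /= p.sum(axis=1, keepdims=True)

        # Confidence = margin between a sentence's top two classes;
        # vague sentences (small margin) contribute less to the ensemble.
        top2 = np.sort(p, axis=1)[:, -2:]
        conf = top2[:, 1] - top2[:, 0]                   # (S,)
        w = conf / (conf.sum() + 1e-12)

        return (w[:, None] * p).sum(axis=0)              # (K,) initial prior

The resulting distribution would then play the role the abstract describes: a target for the attention network rather than a final prediction.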
Side Channel-Assisted Inference Leakage from Machine Learning-based ECG Classification
Authors: Jialin Liu, Ning Miao, Chongzhou Fang, Houman Homayoun, Han Wang
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
The electrocardiogram (ECG) measures the electrical cardiac activity generated by the heart to detect abnormal heartbeats and heart attacks. However, the irregular occurrence of these abnormalities demands continuous monitoring of heartbeats. Machine learning techniques are leveraged to automate this task and reduce the manual labor needed during monitoring. In recent years, many companies have launched products with ECG monitoring and irregular heartbeat alerts. Among classification algorithms, the time series-based dynamic time warping (DTW) algorithm is widely adopted for the ECG classification task. Despite this progress, DTW-based ECG classification also introduces a new attack vector for leaking patients' diagnosis results. This paper shows that the labels of ECG input samples can be stolen via a side-channel attack, Flush+Reload. In particular, we first identify the vulnerability of DTW for ECG classification, namely the correlation between warping path choice and prediction results. We then implement an attack that leverages Flush+Reload to monitor the warping path selection with known ECG data and builds a predictor relating warping path selection to the labels of input ECG samples. Our experiments show that the Flush+Reload-based inference leakage achieves an 84.0% attack success rate in identifying the labels of the two samples in DTW.
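The attack hinges on the fact that DTW's optimal warping path, and hence the victim's memory-access pattern, depends on the input sample. The textbook DTW sketch below (our illustration in Python, not the paper's attack code) makes that dependence explicit by returning the path alongside the distance.

    import numpy as np

    def dtw_warping_path(a, b):
        # Classic DTW with path backtracking. For the side channel, the
        # relevant observation is that which cells end up on the optimal
        # path varies with the input, so watching cache lines touched
        # during backtracking leaks information about the sample.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        # Backtrack from (n, m) to (1, 1) to recover the warping path.
        i, j = n, m
        path = [(i - 1, j - 1)]
        while (i, j) != (1, 1):
            candidates = []
            if i > 1 and j > 1:
                candidates.append(((i - 1, j - 1), D[i - 1, j - 1]))
            if i > 1:
                candidates.append(((i - 1, j), D[i - 1, j]))
            if j > 1:
                candidates.append(((i, j - 1), D[i, j - 1]))
            (i, j), _ = min(candidates, key=lambda t: t[1])
            path.append((i - 1, j - 1))
        return D[n, m], path[::-1]

    # Two different queries against the same reference take different paths.
    ref = np.sin(np.linspace(0, 6, 50))
    _, path1 = dtw_warping_path(np.sin(np.linspace(0, 6, 40)), ref)
    _, path2 = dtw_warping_path(np.cos(np.linspace(0, 6, 40)), ref)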
MonoHuman: Animatable Human Neural Field from Monocular Video
Authors: Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, Kwan-Yee Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Animating virtual avatars with free-view control is crucial for applications such as virtual reality and digital entertainment. Previous studies have attempted to utilize the representational power of the neural radiance field (NeRF) to reconstruct the human body from monocular videos. Recent works graft a deformation network onto the NeRF to further model the dynamics of the human neural field and animate vivid human motions. However, such pipelines either rely on pose-dependent representations or fall short of motion coherency due to frame-independent optimization, making it difficult to generalize realistically to unseen pose sequences. In this paper, we propose a novel framework, MonoHuman, which robustly renders view-consistent, high-fidelity avatars under arbitrary novel poses. Our key insight is to model the deformation field with bi-directional constraints and to explicitly leverage off-the-shelf keyframe information to reason about feature correlations for coherent results. Specifically, we first propose a Shared Bidirectional Deformation module, which creates a pose-independent, generalizable deformation field by disentangling backward and forward deformation correspondences into shared skeletal motion weights and separate non-rigid motions. Then, we devise a Forward Correspondence Search module, which queries the correspondence features of keyframes to guide the rendering network. The rendered results are thus multi-view consistent with high fidelity, even under challenging novel pose settings. Extensive experiments demonstrate the superiority of the proposed MonoHuman over state-of-the-art methods.
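A minimal way to read the bi-directional constraint is as a cycle-consistency condition between a backward (observation-to-canonical) and a forward (canonical-to-observation) deformation field. The PyTorch sketch below illustrates only that constraint; it omits the shared skeletal motion weights, the non-rigid split, and the NeRF itself, so it is a reading of the idea rather than the authors' module.

    import torch
    import torch.nn as nn

    class DeformField(nn.Module):
        # Tiny MLP standing in for a deformation field (illustrative only):
        # predicts a 3D offset for each input point.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, 3),
            )

        def forward(self, x):
            return x + self.net(x)

    backward_def = DeformField()   # observation -> canonical space
    forward_def = DeformField()    # canonical   -> observation space

    x_obs = torch.randn(1024, 3)   # sampled ray points in observation space
    x_can = backward_def(x_obs)    # warp to the canonical pose
    x_cyc = forward_def(x_can)     # warp back to observation space

    # Bi-directional (cycle) consistency constraint tying the two fields.
    consistency_loss = (x_obs - x_cyc).pow(2).sum(-1).mean()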
Towards Optimal Human-Robot Interface Design Applied to Underwater Robotics Teleoperation
Authors: Paulo Padrao, Jose Fuentes, Tero Kaarlela, Alfredo Bayuelo, Leonardo Bobadilla
Abstract
Efficient and intuitive human-robot interfaces are crucial for expanding the user base of operators and enabling new applications in critical areas such as precision agriculture, automated construction, rehabilitation, and environmental monitoring. In this paper, we investigate the design of human-robot interfaces for the teleoperation of dynamical systems. The proposed framework seeks an optimal interface that complies with key principles such as user comfort, efficiency, continuity, and consistency. As a proof of concept, we introduce an innovative approach to teleoperating underwater vehicles that translates human body movements into vehicle control commands. This method eliminates the need for divers to work in harsh underwater environments while accounting for comfort and communication constraints. We conducted a study with human subjects using a head-mounted display attached to a smartphone to control a simulated ROV. Numerical experiments further demonstrated that the optimal translation is often the most intuitive and natural one, aligning with users' expectations.
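As a toy illustration of the body-movement-to-command translation, a proportional mapping with a comfort deadband might look as follows. This hand-coded mapping is entirely hypothetical; the paper searches for an optimal interface rather than fixing one like this.

    from dataclasses import dataclass

    @dataclass
    class HeadPose:
        pitch: float  # rad, nodding forward/back
        yaw: float    # rad, turning left/right
        roll: float   # rad, tilting left/right

    def head_to_rov_command(pose: HeadPose, gain=0.5, deadband=0.05):
        # Deadband keeps small involuntary motions from moving the
        # vehicle (comfort/continuity); a fixed gain keeps the mapping
        # predictable (consistency). All names here are hypothetical.
        def shape(x):
            return 0.0 if abs(x) < deadband else gain * x
        return {
            "surge": shape(-pose.pitch),   # nod forward -> move forward
            "yaw_rate": shape(pose.yaw),   # turn head   -> turn vehicle
            "sway": shape(pose.roll),      # tilt head   -> strafe
        }

    cmd = head_to_rov_command(HeadPose(pitch=-0.2, yaw=0.1, roll=0.0))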
Keyword: efficient
POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems
Optimizing Data Shapley Interaction Calculation from O(2^n) to O(t n^2) for KNN models
A greedy approach for increased vehicle utilization in ridesharing networks
SEENN: Towards Temporal Spiking Early-Exit Neural Networks
X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs
Sparse Cholesky Factorization for Solving Nonlinear PDEs via Gaussian Processes
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning
Towards Deterministic Communications in 6G Networks: State of the Art, Open Challenges and the Way Forward
Integrated Access and Backhaul via Satellites
PyFlyt -- UAV Simulation Environments for Reinforcement Learning Research
Universal Framework for Parametric Constrained Coding
Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks
Event data, records of ``who did what to whom'' that are automatically extracted from text, is an important source of data for scholars of international politics. The high cost of developing new event datasets, especially using automated systems that rely on hand-built dictionaries, means that most researchers draw on large, pre-existing datasets such as ICEWS rather than developing tailor-made event datasets optimized for their specific research question. This paper describes a ``bag of tricks'' for efficient, custom event data production, drawing on recent advances in natural language processing (NLP) that allow researchers to rapidly produce customized event datasets. The paper introduces techniques for training an event category classifier with active learning, identifying actors and the recipients of actions in text using large language models, standard machine learning classifiers, and pretrained ``question-answering'' models from NLP, and resolving mentions of actors to their Wikipedia articles to categorize them. We describe how these techniques produced the new POLECAT global event dataset that is intended to replace ICEWS, along with examples of how scholars can quickly produce smaller, custom event datasets. We publish example code and models to implement our new techniques.
A Scale-Invariant Trajectory Simplification Method for Efficient Data Collection in Videos
Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL)
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models
Adaptive Defective Area Identification in Material Surface Using Active Transfer Learning-based Level Set Estimation
An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees
Thematic context vector association based on event uncertainty for Twitter
Optimizing Irrigation Efficiency using Deep Reinforcement Learning in the Field
On the coordination efficiency of strategic multi-agent robotic teams
Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning
Signal Temporal Logic Meets Convex-Concave Programming: A Structure-Exploiting SQP Algorithm for STL Specifications
Blockwise Compression of Transformer-based Models without Retraining
OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting
LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation
FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2
How Regional Wind Characteristics Affect CNN-based wind predictions: Insights from Spatiotemporal Correlation Analysis
Meta-Learning with a Geometry-Adaptive Preconditioner
Information and Energy Transmission with Wavelet-Reconstructed Harvesting Functions
HALO: Hazard-Aware Landing Optimization for Autonomous Systems
MM-BSN: Self-Supervised Image Denoising for Real-World with Multi-Mask based on Blind-Spot Network
An interpretability framework for Similar case matching
On a family of low-rank algorithms for large-scale algebraic Riccati equations
Equivariant Networks for Porous Crystalline Materials
Moving Obstacle Collision Avoidance via Chance-Constrained MPC with CBF
Adaptive Image Compression via Optimal Mesh Refinement
Controller Synthesis for Local and Global Specifications in Multi-Agent Systems
High-performance Time Series Anomaly Discovery on Graphics Processors
Reduced-Precision Floating-Point Arithmetic in Systolic Arrays with Skewed Pipelines
Comparison of Two Search Criteria for Lattice-based Kernel Approximation
Towards Open-Vocabulary Video Instance Segmentation
Virtio-FPGA: a virtualization solution for SoC-attached FPGAs
Learning quantities of interest from parametric PDEs: An efficient neural-weighted Minimal Residual approach
Black Box Few-Shot Adaptation for Vision-Language models
Efficient Quotients Using Exact Arithmetic
Incorporating Unlabelled Data into Bayesian Neural Networks
Neural Field Convolutions by Repeated Differentiation
FAST: Fidelity-Adjustable Semantic Transmission over Heterogeneous Wireless Networks
Incremental Verification of Neural Networks
Geometric Particle-In-Cell discretizations of a plasma hybrid model with kinetic ions and mass-less fluid electrons
Uncertainty Quantification for Recursive Estimation in Adaptive Safety-Critical Control
Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python
Inverting the SerDes Link Design Flow Process
Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback
Strong Baselines for Parameter Efficient Few-Shot Fine-tuning
High-Throughput Vector Similarity Search in Knowledge Graphs
LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
Scenario-Game ADMM: A Parallelized Scenario-Based Solver for Stochastic Noncooperative Games
Strong spatial mixing for colorings on trees and its algorithmic applications
DWA: Differential Wavelet Amplifier for Image Super-Resolution
Towards Optimal Human-Robot Interface Design Applied to Underwater Robotics Teleoperation
Multi-Level Contrastive Learning for Dense Prediction Task
NPC: Neural Point Characters from Video
Keyword: faster
LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
OneShotSTL: One-Shot Seasonal-Trend Decomposition For Online Time Series Anomaly Detection And Forecasting
IterativePFN: True Iterative Point Cloud Filtering
Black Box Few-Shot Adaptation for Vision-Language models
Imitation Learning from Nonlinear MPC via the Exact Q-Loss and its Gauss-Newton Approximation
Keyword: mobile
Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace
End-to-End Latency Optimization of Multi-view 3D Reconstruction for Disaster Response
FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2
Energy-Saving Strategies for Mobile Web Apps and their Measurement: Results from a Decade of Research (Preprint)
Model Predictive Control for Multi-Agent Systems under Limited Communication and Time-Varying Network Topology
Keyword: pruning
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching
Attention Map Guided Transformer Pruning for Edge Device
Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback
Keyword: voxel
Unsupervised Brain Tumor Segmentation with Image-based Prompts
FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction
Keyword: lidar
LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation
USTC FLICAR: A Multisensor Fusion Dataset of LiDAR-Inertial-Camera for Heavy-duty Autonomous Aerial Work Robots
Keyword: diffusion
NeuroDAVIS: A neural network model for data visualization
Generative Diffusion Prior for Unified Image Restoration and Enhancement
The Interconnected Nature of Online Harm and Moderation: Investigating the Cross-Platform Spread of Harmful Content between YouTube and Twitter
Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models
A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion
Keyword: dynamic
SEENN: Towards Temporal Spiking Early-Exit Neural Networks
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
Non-Generative Energy Based Models
Lilac: a Modal Separation Logic for Conditional Probability
Accelerated parallel MRI using memory efficient and robust monotone operator learning (MOL)
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Lidar based 3D Tracking and State Estimation of Dynamic Objects
Learned Tree Search for Long-Horizon Social Robot Navigation in Shared Airspace
TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings
Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
DLRover: An Elastic Deep Training Extension with Auto Job Resource Recommendation
Multi model LSTM architecture for Track Association based on Automatic Identification System Data
Multimodal Neural Processes for Uncertainty Estimation
FisHook -- An Optimized Approach to Marine Specie Classification using MobileNetV2
Online Learning with Adversaries: A Differential Inclusion Analysis
Proving the Convergence to Limit Cycles using Periodically Decreasing Jacobian Matrix Measures
Decoupling Dynamic Monocular Videos for Dynamic View Synthesis
Virtio-FPGA: a virtualization solution for SoC-attached FPGAs
Adaptive parallelization of multi-agent simulations with localized dynamics
Dynamic treewidth
Operator splitting for port-Hamiltonian systems
Mixing predictions for online metric algorithms
Machine Learning Discovery of Optimal Quadrature Rules for Isogeometric Analysis
FAST: Fidelity-Adjustable Semantic Transmission over Heterogeneous Wireless Networks
Unified Behavioral Data-Driven Performance Analysis: A Generalized Plant Approach
Rolling the Dice: Imagining Generative AI as a Dungeons & Dragons Storytelling Companion
SportsPose -- A Dynamic 3D sports pose dataset
Chasing Positive Bodies
InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
Accelerating and Compressing Deep Neural Networks for Massive MIMO CSI Feedback
The Rise of Disappearing Frameworks in Web Development
Strong spatial mixing for colorings on trees and its algorithmic applications
MEGClass: Text Classification with Extremely Weak Supervision via Mutually-Enhancing Text Granularities
Side Channel-Assisted Inference Leakage from Machine Learning-based ECG Classification
MonoHuman: Animatable Human Neural Field from Monocular Video
Towards Optimal Human-Robot Interface Design Applied to Underwater Robotics Teleoperation