Abstract
While Feedforward Neural Networks (FNNs) have achieved remarkable success in various tasks, they are vulnerable to adversarial examples. Several techniques have been developed to verify the adversarial robustness of FNNs, but most of them focus on robustness verification against the local perturbation neighborhood of a single data point. There is still a large research gap in global robustness analysis. The global-robustness verifiable framework DeepGlobal has been proposed to identify \textit{all} possible Adversarial Dangerous Regions (ADRs) of FNNs, not limited to data samples in a test set. In this paper, we propose a complete specification and implementation of DeepGlobal utilizing the SMT solver Z3 for more explicit definition, and propose several improvements to DeepGlobal for more efficient verification. To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets. Visualization of our experiment results shows the validity and effectiveness of the approach.
Abstract
We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose KOIOS, the first exact and efficient algorithm for semantic overlap search. KOIOS leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.
B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding
Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit
Abstract
Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2022) for robust and model-agnostic learning of distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data demonstrate how the method might be used in practice.
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Abstract
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly generating websites from handwritten text and identifying humorous elements within images. These features are rarely observed in previous vision-language models. We believe the primary reason for GPT-4's advanced multi-modal generation capabilities lies in the utilization of a more advanced large language model (LLM). To examine this phenomenon, we present MiniGPT-4, which aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one projection layer. Our findings reveal that MiniGPT-4 possesses many capabilities similar to those exhibited by GPT-4 like detailed image description generation and website creation from hand-written drafts. Furthermore, we also observe other emerging capabilities in MiniGPT-4, including writing stories and poems inspired by given images, providing solutions to problems shown in images, teaching users how to cook based on food photos, etc. In our experiment, we found that only performing the pretraining on raw image-text pairs could produce unnatural language outputs that lack coherency including repetition and fragmented sentences. To address this problem, we curate a high-quality, well-aligned dataset in the second stage to finetune our model using a conversational template. This step proved crucial for augmenting the model's generation reliability and overall usability. Notably, our model is highly computationally efficient, as we only train a projection layer utilizing approximately 5 million aligned image-text pairs. Our code, pre-trained model, and collected dataset are available at https://minigpt-4.github.io/.
DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Abstract
The increasing demand for privacy and security has driven the advancement of private inference (PI), a cryptographic method enabling inferences directly on encrypted data. However, the computational and storage burdens of non-linear operators (e.g., ReLUs) render it impractical. Despite these limitations, prior ReLU optimization methods consistently relied on classical networks, that are not optimized for PI. Moreover, the selection of baseline networks in these ReLU optimization methods remains enigmatic and fails to provide insights into network attributes contributing to PI efficiency. In this paper, we investigate the desirable network architecture for efficient PI, and {\em key finding} is wider networks are superior at higher ReLU counts, while networks with a greater proportion of least-critical ReLUs excel at lower ReLU counts. Leveraging these findings, we develop a novel network redesign technique (DeepReShape) with a complexity of $\mathcal{O}(1)$, and synthesize specialized architectures(HybReNet). Compared to the state-of-the-art (SNL on CIFAR-100), we achieve a 2.35\% accuracy gain at 180K ReLUs, and for ResNet50 on TinyImageNet our method saves 4.2$\times$ ReLUs at iso-accuracy.
Enhancing Artificial intelligence Policies with Fusion and Forecasting: Insights from Indian Patents Using Network Analysis
Abstract
This paper presents a study of the interconnectivity and interdependence of various Artificial intelligence (AI) technologies through the use of centrality measures, clustering coefficients, and degree of fusion measures. By analyzing the technologies through different time windows and quantifying their importance, we have revealed important insights into the crucial components shaping the AI landscape and the maturity level of the domain. The results of this study have significant implications for future development and advancements in artificial intelligence and provide a clear understanding of key technology areas of fusion. Furthermore, this paper contributes to AI public policy research by offering a data-driven perspective on the current state and future direction of the field. However, it is important to acknowledge the limitations of this research and call for further studies to build on these results. With these findings, we hope to inform and guide future research in the field of AI, contributing to its continued growth and success.
ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks
Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John
Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
Abstract
The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural model which use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 $\mu$s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 $\mu$s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.
Abstract
In an increasingly digitized world, the secure management and trade of digital assets have become a pressing issue. This project aims to address this challenge by developing a decentralized application (dApp) that leverages blockchain technology and deep learning models to provide secure and efficient digital asset management, with a focus on NFTs. The dApp includes features such as secure wallet connections, NFT image generation, minting, marketplace, and profile management. The back-end of the dApp is implemented using the Goerli testnet with Solidity-based smart contracts, while IPFS and ReactJS/EtherJS are used for decentralized storage and front-end development, respectively. Additionally, the OpenAI API is integrated to generate unique NFT images based on user input. The project demonstrates the practical application of blockchain technology and deep learning models in developing dApps for secure and decentralized digital asset management. Overall, the project contributes to the ongoing research on blockchain-based solutions for secure digital asset management, while highlighting the potential of blockchain and deep learning technologies to transform the way we manage and trade digital assets.
Get Rid Of Your Trail: Remotely Erasing Backdoors in Federated Learning
Abstract
Federated Learning (FL) enables collaborative deep learning training across multiple participants without exposing sensitive personal data. However, the distributed nature of FL and the unvetted participants' data makes it vulnerable to backdoor attacks. In these attacks, adversaries inject malicious functionality into the centralized model during training, leading to intentional misclassifications for specific adversary-chosen inputs. While previous research has demonstrated successful injections of persistent backdoors in FL, the persistence also poses a challenge, as their existence in the centralized model can prompt the central aggregation server to take preventive measures to penalize the adversaries. Therefore, this paper proposes a methodology that enables adversaries to effectively remove backdoors from the centralized model upon achieving their objectives or upon suspicion of possible detection. The proposed approach extends the concept of machine unlearning and presents strategies to preserve the performance of the centralized model and simultaneously prevent over-unlearning of information unrelated to backdoor patterns, making the adversaries stealthy while removing backdoors. To the best of our knowledge, this is the first work that explores machine unlearning in FL to remove backdoors to the benefit of adversaries. Exhaustive evaluation considering image classification scenarios demonstrates the efficacy of the proposed method in efficient backdoor removal from the centralized model, injected by state-of-the-art attacks across multiple configurations.
On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers
Authors: Boris Velasevic, Rohit Parasnis, Christopher G. Brinton, Navid Azizan
Abstract
We consider the fundamental problem of solving a large-scale system of linear equations. In particular, we consider the setting where a taskmaster intends to solve the system in a distributed/federated fashion with the help of a set of machines, who each have a subset of the equations. Although there exist several approaches for solving this problem, missing is a rigorous comparison between the convergence rates of the projection-based methods and those of the optimization-based ones. In this paper, we analyze and compare these two classes of algorithms with a particular focus on the most efficient method from each class, namely, the recently proposed Accelerated Projection-Based Consensus (APC) and the Distributed Heavy-Ball Method (D-HBM). To this end, we first propose a geometric notion of data heterogeneity called angular heterogeneity and discuss its generality. Using this notion, we bound and compare the convergence rates of the studied algorithms and capture the effects of both cross-machine and local data heterogeneity on these quantities. Our analysis results in a number of novel insights besides showing that APC is the most efficient method in realistic scenarios where there is a large data heterogeneity. Our numerical analyses validate our theoretical results.
Word Sense Induction with Knowledge Distillation from BERT
Abstract
Pre-trained contextual language models are ubiquitously employed for language understanding tasks, but are unsuitable for resource-constrained systems. Noncontextual word embeddings are an efficient alternative in these settings. Such methods typically use one vector to encode multiple different meanings of a word, and incur errors due to polysemy. This paper proposes a two-stage method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context and transferring this sense information to fit multi-sense embeddings in a skip-gram-like framework. We demonstrate an effective approach to training the sense disambiguation mechanism in our model with a distribution over word senses extracted from the output layer embeddings of BERT. Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings on multiple benchmark data sets, and experiments with an embedding-based topic model (ETM) demonstrates the benefits of using this multi-sense embedding in a downstream application.
Modular Hardware Design with Timeline Types
Authors: Rachit Nigam, Pedro Henrique Azevedo De Amorim, Adrian Sampson
Subjects: Hardware Architecture (cs.AR); Programming Languages (cs.PL)
Abstract
Modular design is a key challenge for enabling large-scale reuse of hardware modules. Unlike software, however, hardware designs correspond to physical circuits and inherit constraints from them. Timing constraints -- which cycle a signal arrives, when an input is read -- and structural constraints -- how often a multiplier accepts new inputs -- are fundamental to hardware interfaces. Existing hardware design languages do not provide a way to encode these constraints; a user must read documentation, build scripts, or in the worst case, a module's implementation to understand how to use it. We present Filament, a language for modular hardware design that supports the specification and enforcement of timing and structural constraints for statically scheduled pipelines. Filament uses timeline types, which describe the intervals of clock-cycle time when a given signal is available or required. Filament enables safe composition of hardware modules, ensures that the resulting designs are correctly pipelined, and predictably lowers them to efficient hardware.
Feature point detection in HDR images based on coefficient of variation
Authors: Artur Santos Nascimento, Welerson Augusto Lino de Jesus Melo, Daniel Oliveira Dantas, Beatriz Trinchão Andrade
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels but FP detection algorithms do not take advantage of all the information present in such images. FP detection frequently relies on differential methods, which work well in LDR images. However, in HDR images, the differential operation response in bright areas overshadows the response in dark areas. As an alternative to standard FP detection methods, this study proposes an FP detector based on a coefficient of variation (CV) designed for HDR images. The CV operation adapts its response based on the standard deviation of pixels inside a window, working well in both dark and bright areas of HDR images. The proposed and standard detectors are evaluated by measuring their repeatability rate (RR) and uniformity. Our proposed detector shows better performance when compared to other standard state-of-the-art detectors. In uniformity metric, our proposed detector surpasses all the other algorithms. In other hand, when using the repeatability rate metric, the proposed detector is worse than Harris for HDR and SURF detectors.
SLEPLET: Slepian Scale-Discretised Wavelets in Python
Authors: Patrick J. Roddy
Subjects: Information Theory (cs.IT); Instrumentation and Methods for Astrophysics (astro-ph.IM); Numerical Analysis (math.NA)
Abstract
Wavelets are widely used in various disciplines to analyse signals both in space and scale. Whilst many fields measure data on manifolds (i.e., the sphere), often data are only observed on a partial region of the manifold. Wavelets are a typical approach to data of this form, but the wavelet coefficients that overlap with the boundary become contaminated and must be removed for accurate analysis. Another approach is to estimate the region of missing data and to use existing whole-manifold methods for analysis. However, both approaches introduce uncertainty into any analysis. Slepian wavelets enable one to work directly with only the data present, thus avoiding the problems discussed above. Applications of Slepian wavelets to areas of research measuring data on the partial sphere include gravitational/magnetic fields in geodesy, ground-based measurements in astronomy, measurements of whole-planet properties in planetary science, geomagnetism of the Earth, and cosmic microwave background analyses.
Matching-based Data Valuation for Generative Model
Abstract
Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. Similar to discriminative models, there is an urgent need to assess data contributions in deep generative models as well. However, previous data valuation approaches mainly relied on discriminative model performance metrics and required model retraining. Consequently, they cannot be applied directly and efficiently to recent deep generative models, such as generative adversarial networks and diffusion models, in practice. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach for any generative models, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of their knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.
EulerNet: Adaptive Feature Interaction Learning via Euler's Formula for CTR Prediction
Authors: Zhen Tian, Ting Bai, Wayne Xin Zhao, Ji-Rong Wen, Zhao Cao
Abstract
Learning effective high-order feature interactions is very crucial in the CTR prediction task. However, it is very time-consuming to calculate high-order feature interactions with massive features in online e-commerce platforms. Most existing methods manually design a maximal order and further filter out the useless interactions from them. Although they reduce the high computational costs caused by the exponential growth of high-order feature combinations, they still suffer from the degradation of model capability due to the suboptimal learning of the restricted feature orders. The solution to maintain the model capability and meanwhile keep it efficient is a technical challenge, which has not been adequately addressed. To address this issue, we propose an adaptive feature interaction learning model, named as EulerNet, in which the feature interactions are learned in a complex vector space by conducting space mapping according to Euler's formula. EulerNet converts the exponential powers of feature interactions into simple linear combinations of the modulus and phase of the complex features, making it possible to adaptively learn the high-order feature interactions in an efficient way. Furthermore, EulerNet incorporates the implicit and explicit feature interactions into a unified architecture, which achieves the mutual enhancement and largely boosts the model capabilities. Such a network can be fully learned from data, with no need of pre-designed form or order for feature interactions. Extensive experiments conducted on three public datasets have demonstrated the effectiveness and efficiency of our approach. Our code is available at: https://github.com/RUCAIBox/EulerNet.
Deep Learning-empowered Predictive Precoder Design for OTFS Transmission in URLLC
Abstract
To guarantee excellent reliability performance in ultra-reliable low-latency communications (URLLC), pragmatic precoder design is an effective approach. However, an efficient precoder design highly depends on the accurate instantaneous channel state information at the transmitter (ICSIT), which however, is not always available in practice. To overcome this problem, in this paper, we focus on the orthogonal time frequency space (OTFS)-based URLLC system and adopt a deep learning (DL) approach to directly predict the precoder for the next time frame to minimize the frame error rate (FER) via implicitly exploiting the features from estimated historical channels in the delay-Doppler domain. By doing this, we can guarantee the system reliability even without the knowledge of ICSIT. To this end, a general precoder design problem is formulated where a closed-form theoretical FER expression is specifically derived to characterize the system reliability. Then, a delay-Doppler domain channels-aware convolutional long short-term memory (CLSTM) network (DDCL-Net) is proposed for predictive precoder design. In particular, both the convolutional neural network and LSTM modules are adopted in the proposed neural network to exploit the spatial-temporal features of wireless channels for improving the learning performance. Finally, simulation results demonstrated that the FER performance of the proposed method approaches that of the perfect ICSI-aided scheme.
Energy management system for biological 3D printing by the refinement of manifold model morphing in flexible grasping space
Abstract
The use of 3D printing, or additive manufacturing, has gained significant attention in recent years due to its potential for revolutionizing traditional manufacturing processes. One key challenge in 3D printing is managing energy consumption, as it directly impacts the cost, efficiency, and sustainability of the process. In this paper, we propose an energy management system that leverages the refinement of manifold model morphing in a flexible grasping space, to reduce costs for biological 3D printing. The manifold model is a mathematical representation of the 3D object to be printed, and the refinement process involves optimizing the morphing parameters of the manifold model to achieve desired printing outcomes. To enable flexibility in the grasping space, we incorporate data-driven approaches, such as machine learning and data augmentation techniques, to enhance the accuracy and robustness of the energy management system. Our proposed system addresses the challenges of limited sample data and complex morphologies of manifold models in layered additive manufacturing. Our method is more applicable for soft robotics and biomechanisms. We evaluate the performance of our system through extensive experiments and demonstrate its effectiveness in predicting and managing energy consumption in 3D printing processes. The results highlight the importance of refining manifold model morphing in the flexible grasping space for achieving energy-efficient 3D printing, contributing to the advancement of green and sustainable manufacturing practices.
Linear building pattern recognition via spatial knowledge graph
Authors: Wei Zhiwei, Xiao Yi, Tong Ying, Xu Wenjia, Wang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Building patterns are important urban structures that reflect the effect of the urban material and social-economic on a region. Previous researches are mostly based on the graph isomorphism method and use rules to recognize building patterns, which are not efficient. The knowledge graph uses the graph to model the relationship between entities, and specific subgraph patterns can be efficiently obtained by using relevant reasoning tools. Thus, we try to apply the knowledge graph to recognize linear building patterns. First, we use the property graph to express the spatial relations in proximity, similar and linear arrangement between buildings; secondly, the rules of linear pattern recognition are expressed as the rules of knowledge graph reasoning; finally, the linear building patterns are recognized by using the rule-based reasoning in the built knowledge graph. The experimental results on a dataset containing 1289 buildings show that the method in this paper can achieve the same precision and recall as the existing methods; meanwhile, the recognition efficiency is improved by 5.98 times.
Multi-scale Evolutionary Neural Architecture Search for Deep Spiking Neural Networks
Abstract
Spiking Neural Networks (SNNs) have received considerable attention not only for their superiority in energy efficient with discrete signal processing, but also for their natural suitability to integrate multi-scale biological plasticity. However, most SNNs directly adopt the structure of the well-established DNN, rarely automatically design Neural Architecture Search (NAS) for SNNs. The neural motifs topology, modular regional structure and global cross-brain region connection of the human brain are the product of natural evolution and can serve as a perfect reference for designing brain-inspired SNN architecture. In this paper, we propose a Multi-Scale Evolutionary Neural Architecture Search (MSE-NAS) for SNN, simultaneously considering micro-, meso- and macro-scale brain topologies as the evolutionary search space. MSE-NAS evolves individual neuron operation, self-organized integration of multiple circuit motifs, and global connectivity across motifs through a brain-inspired indirect evaluation function, Representational Dissimilarity Matrices (RDMs). This training-free fitness function could greatly reduce computational consumption and NAS's time, and its task-independent property enables the searched SNNs to exhibit excellent transferbility and scalability. Extensive experiments demonstrate that the proposed algorithm achieves state-of-the-art (SOTA) performance with shorter simulation steps on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS and DVS128-Gesture). The thorough analysis also illustrates the significant performance improvement and consistent bio-interpretability deriving from the topological evolution at different scales and the RDMs fitness function.
Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback
Authors: Nikhil Mehta, Milagro Teruel, Patricio Figueroa Sanz, Xin Deng, Ahmed Hassan Awadallah, Julia Kiseleva
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Many approaches to Natural Language Processing (NLP) tasks often treat them as single-step problems, where an agent receives an instruction, executes it, and is evaluated based on the final outcome. However, human language is inherently interactive, as evidenced by the back-and-forth nature of human conversations. In light of this, we posit that human-AI collaboration should also be interactive, with humans monitoring the work of AI agents and providing feedback that the agent can understand and utilize. Further, the AI agent should be able to detect when it needs additional information and proactively ask for help. Enabling this scenario would lead to more natural, efficient, and engaging human-AI collaborations. In this work, we explore these directions using the challenging task defined by the IGLU competition, an interactive grounded language understanding task in a MineCraft-like world. We explore multiple types of help players can give to the AI to guide it and analyze the impact of this help in AI behavior, resulting in performance improvements.
Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation
Authors: Harsh Maheshwari, Yen-Cheng Liu, Zsolt Kira
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, there are several real-world challenges that have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at the test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism Linear Fusion, that performs better than the state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves the multi-modal performance but also makes the model robust to the realistic missing modality scenario using unlabeled data. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report the robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% on robust mIoU above the most competitive baselines. Our code is available at https://github.com/harshm121/M3L
Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation
Abstract
Visual-audio navigation (VAN) is attracting more and more attention from the robotic community due to its broad applications, \emph{e.g.}, household robots and rescue robots. In this task, an embodied agent must search for and navigate to the sound source with egocentric visual and audio observations. However, the existing methods are limited in two aspects: 1) poor generalization to unheard sound categories; 2) sample inefficient in training. Focusing on these two problems, we propose a brain-inspired plug-and-play method to learn a semantic-agnostic and spatial-aware representation for generalizable visual-audio navigation. We meticulously design two auxiliary tasks for respectively accelerating learning representations with the above-desired characteristics. With these two auxiliary tasks, the agent learns a spatially-correlated representation of visual and audio inputs that can be applied to work on environments with novel sounds and maps. Experiment results on realistic 3D scenes (Replica and Matterport3D) demonstrate that our method achieves better generalization performance when zero-shot transferred to scenes with unseen maps and unheard sound categories.
Learn to Cluster Faces with Better Subgraphs
Authors: Yuan Cao, Di Jiang, Guanqun Hou, Fan Deng, Xinjia Chen, Qiang Yang
Abstract
Face clustering can provide pseudo-labels to the massive unlabeled face data and improve the performance of different face recognition models. The existing clustering methods generally aggregate the features within subgraphs that are often implemented based on a uniform threshold or a learned cutoff position. This may reduce the recall of subgraphs and hence degrade the clustering performance. This work proposed an efficient neighborhood-aware subgraph adjustment method that can significantly reduce the noise and improve the recall of the subgraphs, and hence can drive the distant nodes to converge towards the same centers. More specifically, the proposed method consists of two components, i.e. face embeddings enhancement using the embeddings from neighbors, and enclosed subgraph construction of node pairs for structural information extraction. The embeddings are combined to predict the linkage probabilities for all node pairs to replace the cosine similarities to produce new subgraphs that can be further used for aggregation of GCNs or other clustering methods. The proposed method is validated through extensive experiments against a range of clustering solutions using three benchmark datasets and numerical results confirm that it outperforms the SOTA solutions in terms of generalization capability.
A Deep Learning algorithm to accelerate Algebraic Multigrid methods in Finite Element solvers of 3D elliptic PDEs
Authors: Matteo Caldana, Paola F. Antonietti, Luca Dede'
Abstract
Algebraic multigrid (AMG) methods are among the most efficient solvers for linear systems of equations and they are widely used for the solution of problems stemming from the discretization of Partial Differential Equations (PDEs). The most severe limitation of AMG methods is the dependence on parameters that require to be fine-tuned. In particular, the strong threshold parameter is the most relevant since it stands at the basis of the construction of successively coarser grids needed by the AMG methods. We introduce a novel Deep Learning algorithm that minimizes the computational cost of the AMG method when used as a finite element solver. We show that our algorithm requires minimal changes to any existing code. The proposed Artificial Neural Network (ANN) tunes the value of the strong threshold parameter by interpreting the sparse matrix of the linear system as a black-and-white image and exploiting a pooling operator to transform it into a small multi-channel image. We experimentally prove that the pooling successfully reduces the computational cost of processing a large sparse matrix and preserves the features needed for the regression task at hand. We train the proposed algorithm on a large dataset containing problems with a highly heterogeneous diffusion coefficient defined in different three-dimensional geometries and discretized with unstructured grids and linear elasticity problems with a highly heterogeneous Young's modulus. When tested on problems with coefficients or geometries not present in the training dataset, our approach reduces the computational time by up to 30%.
An Analytical Model for Performance Estimation in High-Capacity IMDD Systems
Authors: Giuseppe Rizzelli, Pablo Torres-Ferrera, Fabrizio Forghieri, Roberto Gaudino
Abstract
In this paper, we propose an analytical model to estimate the signal-to-noise ratio (SNR) at the output of an adaptive equalizer in intensity modulation and direct detection (IMDD) optical transmission systems affected by shot noise, thermal noise, relative intensity noise (RIN), chromatic dispersion (CD) and bandwidth limitations. We develop the model as an extension of a previously presented one, and then we test its accuracy by sweeping the main parameters of a 4-PAM-based communication system such as RIN coefficient, extinction ratio, CD coefficient and equalizer memory. Our findings show a remarkable agreement between time-domain simulations and analytical results, with SNR discrepancies below 0.1 dB in most cases, for both feed-forward and decision-feedback equalization. We consider that the proposed model is a powerful tool for the numerical design of strongly band-limited IMDD systems using receiver equalization, as it happens in most of modern and future M-PAM solutions for short reach and access systems.
A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion
Authors: Dimitri Breda, Simone De Reggi, Rossana Vermiglio
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
Abstract
We numerically investigate the stability of linear age-structured population models with nonlocal diffusion, which arise naturally in describing dynamics of infectious diseases. Compared to Laplace diffusion, the analysis of models with nonlocal diffusion is more challenging since the associated semigroups have no regularizing properties in the spatial variable. Nevertheless, the asymptotic stability of the null equilibrium is determined by the spectrum of the infinitesimal generator associated to the semigroup. We propose to approximate the leading part of this spectrum by first reformulating the problem via integration of the age-state and then by discretizing the generator combining a spectral projection in space with a pseudospectral collocation in age. A rigorous convergence analysis is provided in the case of separable model coefficients. Results are confirmed experimentally and numerical tests are presented also for the more general instance.
Better Sign Language Translation with Monolingual Data
Abstract
Sign language translation (SLT) systems, which are often decomposed into video-to-gloss (V2G) recognition and gloss-to-text (G2T) translation through the pivot gloss, heavily relies on the availability of large-scale parallel G2T pairs. However, the manual annotation of pivot gloss, which is a sequence of transcribed written-language words in the order in which they are signed, further exacerbates the scarcity of data for SLT. To address this issue, this paper proposes a simple and efficient rule transformation method to transcribe the large-scale target monolingual data into its pseudo glosses automatically for enhancing the SLT translation. Empirical results show that the proposed approach can significantly improve the performance of SLT, especially achieving state-of-the-art results on two SLT benchmark datasets PHEONIX-WEATHER 2014T and ASLG-PC12. Our code has been released at: https://github.com/pengr/Mono\_SLT.
How Well Does the Metropolis Algorithm Cope With Local Optima?
Authors: Benjamin Doerr, Taha El Ghazi El Houssaini, Amirhossein Rajabi, Carsten Wit
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract
The Metropolis algorithm (MA) is a classic stochastic local search heuristic. It avoids getting stuck in local optima by occasionally accepting inferior solutions. To better and in a rigorous manner understand this ability, we conduct a mathematical runtime analysis of the MA on the CLIFF benchmark. Apart from one local optimum, cliff functions are monotonically increasing towards the global optimum. Consequently, to optimize a cliff function, the MA only once needs to accept an inferior solution. Despite seemingly being an ideal benchmark for the MA to profit from its main working principle, our mathematical runtime analysis shows that this hope does not come true. Even with the optimal temperature (the only parameter of the MA), the MA optimizes most cliff functions less efficiently than simple elitist evolutionary algorithms (EAs), which can only leave the local optimum by generating a superior solution possibly far away. This result suggests that our understanding of why the MA is often very successful in practice is not yet complete. Our work also suggests to equip the MA with global mutation operators, an idea supported by our preliminary experiments.
Viewing Allocators as Bin Packing Solvers Demystifies Fragmentation
Authors: Christos P. Lamprakos, Sotirios Xydis, Francky Catthoor, Dimitrios Soudris
Abstract
This paper presents a trace-based simulation methodology for constructing representations of workload-allocator interaction. We use two-dimensional rectangular bin packing (2DBP) as our foundation. Classical 2DBP algorithms minimize their products' makespan, but virtual memory systems employing demand paging deem such a criterion inappropriate. We view an allocator's placement decisions as a solution to a 2DBP instance, optimizing some unknown criterion particular to that allocator's policy. Our end product is a compact data structure that fits e.g. the simulation of 80 million requests in a 350 MiB file. By design, it is concerned with events residing entirely in virtual memory; no information on memory accesses, indexing costs or any other factor is kept. We bootstrap our contribution's significance by exploring its relationship to maximum resident set size (RSS). Our baseline is the assumption that less fragmentation amounts to smaller peak RSS. We thus define a fragmentation metric in the 2DBP substrate and compute it for 28 workloads linked to 4 modern allocators. We also measure peak RSS for the 112 resulting pairs. Our metric exhibits a strong monotonic relationship (Spearman coefficient $\rho>0.65$) in half of those cases: allocators achieving better 2DBP placements yield $9\%$-$30\%$ smaller peak RSS, with the trends remaining consistent across two different machines. Considering our representation's minimalism, the presented empirical evidence is a robust indicator of its potency. If workload-allocator interplay in the virtual address space suffices to evaluate a novel fragmentation definition, numerous other useful applications of our tool can be studied. Both augmenting 2DBP and exploring alternative computations on it provide ample fertile ground for future research.
Med-Tuning: Exploring Parameter-Efficient Transfer Learning for Medical Volumetric Segmentation
Authors: Wenxuan Wang, Jiachen Shen, Chen Chen, Jianbo Jiao, Yan Zhang, Shanshan Song, Jiangyun Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning based medical volumetric segmentation methods either train the model from scratch or follow the standard "pre-training then finetuning" paradigm. Although finetuning a well pre-trained model on downstream tasks can harness its representation power, the standard full finetuning is costly in terms of computation and memory footprint. In this paper, we present the first study on parameter-efficient transfer learning for medical volumetric segmentation and propose a novel framework named Med-Tuning based on intra-stage feature enhancement and inter-stage feature interaction. Given a large-scale pre-trained model on 2D natural images, our method can exploit both the multi-scale spatial feature representations and temporal correlations along image slices, which are crucial for accurate medical volumetric segmentation. Extensive experiments on three benchmark datasets (including CT and MRI) show that our method can achieve better results than previous state-of-the-art parameter-efficient transfer learning methods and full finetuning for the segmentation task, with much less tuned parameter costs. Compared to full finetuning, our method reduces the finetuned model parameters by up to 4x, with even better segmentation performance.
GCNH: A Simple Method For Representation Learning On Heterophilous Graphs
Abstract
Graph Neural Networks (GNNs) are well-suited for learning on homophilous graphs, i.e., graphs in which edges tend to connect nodes of the same type. Yet, achievement of consistent GNN performance on heterophilous graphs remains an open research problem. Recent works have proposed extensions to standard GNN architectures to improve performance on heterophilous graphs, trading off model simplicity for prediction accuracy. However, these models fail to capture basic graph properties, such as neighborhood label distribution, which are fundamental for learning. In this work, we propose GCN for Heterophily (GCNH), a simple yet effective GNN architecture applicable to both heterophilous and homophilous scenarios. GCNH learns and combines separate representations for a node and its neighbors, using one learned importance coefficient per layer to balance the contributions of center nodes and neighborhoods. We conduct extensive experiments on eight real-world graphs and a set of synthetic graphs with varying degrees of heterophily to demonstrate how the design choices for GCNH lead to a sizable improvement over a vanilla GCN. Moreover, GCNH outperforms state-of-the-art models of much higher complexity on four out of eight benchmarks, while producing comparable results on the remaining datasets. Finally, we discuss and analyze the lower complexity of GCNH, which results in fewer trainable parameters and faster training times than other methods, and show how GCNH mitigates the oversmoothing problem.
Factored Neural Representation for Scene Understanding
Authors: Yu-Shiang Wong, Niloy J. Mitra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
A long-standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB-D video, without requiring specialized hardware setup or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end-to-end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce global scene encoding, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can directly be learned from a monocular RGB-D video to produce object-level neural presentations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., change object trajectory). The project webpage is available at: $\href{https://yushiangw.github.io/factorednerf/}{\text{link}}$.
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
Abstract
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
Online Time-Optimal Trajectory Planning on Three-Dimensional Race Tracks
Authors: Matthias Rowold, Levent Ögretmen, Ulf Kasolowsky, Boris Lohmann
Abstract
We propose an online planning approach for racing that generates the time-optimal trajectory for the upcoming track section. The resulting trajectory takes the current vehicle state, effects caused by \acl{3D} track geometries, and speed limits dictated by the race rules into account. In each planning step, an optimal control problem is solved, making a quasi-steady-state assumption with a point mass model constrained by gg-diagrams. For its online applicability, we propose an efficient representation of the gg-diagrams and identify negligible terms to reduce the computational effort. We demonstrate that the online planning approach can reproduce the lap times of an offline-generated racing line during single vehicle racing. Moreover, it finds a new time-optimal solution when a deviation from the original racing line is necessary, e.g., during an overtaking maneuver. Motivated by the application in a rule-based race, we also consider the scenario of a speed limit lower than the current vehicle velocity. We introduce an initializable slack variable to generate feasible trajectories despite the constraint violation while reducing the velocity to comply with the rules.
IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty
Abstract
In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. We solve the deterministic planning problem using sampling-based methods such as PRM or RRG to construct a graph of nominal trajectories. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching over the graph using the proposed heuristic. IBBT interleaves between batch state sampling, nominal trajectory graph construction, heuristic computing, and search over the graph to find belief space motion plans. IBBT is an anytime, incremental algorithm. With an increasing number of batches of samples added to the graph, the algorithm finds motion plans that converge to the optimal one. IBBT is efficient by reusing results between sequential iterations. The belief tree searching is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster compared with previous similar methods.
RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments
Abstract
Current simultaneous localization and mapping (SLAM) algorithms perform well in static environments but easily fail in dynamic environments. Recent works introduce deep learning-based semantic information to SLAM systems to reduce the influence of dynamic objects. However, it is still challenging to apply a robust localization in dynamic environments for resource-restricted robots. This paper proposes a real-time RGB-D inertial odometry system for resource-restricted robots in dynamic environments named Dynamic-VINS. Three main threads run in parallel: object detection, feature tracking, and state optimization. The proposed Dynamic-VINS combines object detection and depth information for dynamic feature recognition and achieves performance comparable to semantic segmentation. Dynamic-VINS adopts grid-based feature detection and proposes a fast and efficient method to extract high-quality FAST feature points. IMU is applied to predict motion for feature tracking and moving consistency check. The proposed method is evaluated on both public datasets and real-world applications and shows competitive localization accuracy and robustness in dynamic environments. Yet, to the best of our knowledge, it is the best-performance real-time RGB-D inertial odometry for resource-restricted platforms in dynamic environments for now. The proposed system is open source at: https://github.com/HITSZ-NRSL/Dynamic-VINS.git
Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation
Authors: Iris Andrussow, Huanbo Sun, Katherine J. Kuchenbecker, Georg Martius
Abstract
Intelligent interaction with the physical world requires perceptual abilities beyond vision and hearing; vibrant tactile sensing is essential for autonomous robots to dexterously manipulate unfamiliar objects or safely contact humans. Therefore, robotic manipulators need high-resolution touch sensors that are compact, robust, inexpensive, and efficient. The soft vision-based haptic sensor presented herein is a miniaturized and optimized version of the previously published sensor Insight. Minsight has the size and shape of a human fingertip and uses machine learning methods to output high-resolution maps of 3D contact force vectors at 60 Hz. Experiments confirm its excellent sensing performance, with a mean absolute force error of 0.07 N and contact location error of 0.6 mm across its surface area. Minsight's utility is shown in two robotic tasks on a 3-DoF manipulator. First, closed-loop force control enables the robot to track the movements of a human finger based only on tactile data. Second, the informative value of the sensor output is shown by detecting whether a hard lump is embedded within a soft elastomer with an accuracy of 98%. These findings indicate that Minsight can give robots the detailed fingertip touch sensing needed for dexterous manipulation and physical human-robot interaction.
Knowledge Distillation Under Ideal Joint Classifier Assumption
Authors: Huayu Li, Xiwen Chen, Gregory Ditzler, Ping Chang, Janet Roveda, Ao Li
Abstract
Knowledge distillation is a powerful technique to compress large neural networks into smaller, more efficient networks. Softmax regression representation learning is a popular approach that uses a pre-trained teacher network to guide the learning of a smaller student network. While several studies explored the effectiveness of softmax regression representation learning, the underlying mechanism that provides knowledge transfer is not well understood. This paper presents Ideal Joint Classifier Knowledge Distillation (IJCKD), a unified framework that provides a clear and comprehensive understanding of the existing knowledge distillation methods and a theoretical foundation for future research. Using mathematical techniques derived from a theory of domain adaptation, we provide a detailed analysis of the student network's error bound as a function of the teacher. Our framework enables efficient knowledge transfer between teacher and student networks and can be applied to various applications.
Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT
Authors: Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du
Abstract
Robot-based assembly in construction has emerged as a promising solution to address numerous challenges such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles in realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques or machine learning methods, face limitations in their adaptability and scalability to dynamic construction environments. To expand the ability of the current robot system in sequential understanding, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning and demonstrate its feasibility and effectiveness through experimental evaluation including Two case studies and 80 trials about real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
Authors: Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
Abstract
We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss. To pre-train CLaMP, we collected a large dataset of 1.4 million music-text pairs. It employed text dropout as a data augmentation technique and bar patching to efficiently represent music data which reduces sequence length to less than 10%. In addition, we developed a masked music model pre-training objective to enhance the music encoder's comprehension of musical context and structure. CLaMP integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models. To support the evaluation of semantic search and music classification, we publicly release WikiMusicText (WikiMT), a dataset of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description. In comparison to state-of-the-art models that require fine-tuning, zero-shot CLaMP demonstrated comparable or superior performance on score-oriented datasets.
Backpropagation-free Training of Deep Physical Neural Networks
Authors: Ali Momeni, Babak Rahmani, Matthieu Mallejac, Philipp Del Hougne, Romain Fleury
Abstract
Recent years have witnessed the outstanding success of deep learning in various fields such as vision and natural language processing. This success is largely indebted to the massive size of deep learning models that is expected to increase unceasingly. This growth of the deep learning models is accompanied by issues related to their considerable energy consumption, both during the training and inference phases, as well as their scalability. Although a number of work based on unconventional physical systems have been proposed which addresses the issue of energy efficiency in the inference phase, efficient training of deep learning models has remained unaddressed. So far, training of digital deep learning models mainly relies on backpropagation, which is not suitable for physical implementation as it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the nonlinear physical layers' properties. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we train diverse wave-based physical neural networks that vary in the underlying wave phenomenon and the type of non-linearity they use, to perform vowel and image classification tasks experimentally.
SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model
Authors: Nan Li, Bo Kang, Tijl De Bie
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
We present SkillGPT, a tool for skill extraction and standardization (SES) from free-style job descriptions and user profiles with an open-source Large Language Model (LLM) as backbone. Most previous methods for similar tasks either need supervision or rely on heavy data-preprocessing and feature engineering. Directly prompting the latest conversational LLM for standard skills, however, is slow, costly and inaccurate. In contrast, SkillGPT utilizes a LLM to perform its tasks in steps via summarization and vector similarity search, to balance speed with precision. The backbone LLM of SkillGPT is based on Llama, free for academic use and thus useful for exploratory research and prototype development. Hence, our cost-free SkillGPT gives users the convenience of conversational SES, efficiently and reliably.
HeRo: RoBERTa and Longformer Hebrew Language Models
Authors: Vitaly Shalumov, Harel Haskey
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
In this paper, we fill in an existing gap in resources available to the Hebrew NLP community by providing it with the largest so far pre-train dataset HeDC4, a state-of-the-art pre-trained language model HeRo for standard length inputs and an efficient transformer LongHeRo for long input sequences. The HeRo model was evaluated on the sentiment analysis, the named entity recognition, and the question answering tasks while the LongHeRo model was evaluated on the document classification task with a dataset composed of long documents. Both HeRo and LongHeRo presented state-of-the-art performance. The dataset and model checkpoints used in this work are publicly available.
A Convolutional Spiking Network for Gesture Recognition in Brain-Computer Interfaces
Authors: Yiming Ai, Bipin Rajendran
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
Brain-computer interfaces are being explored for a wide variety of therapeutic applications. Typically, this involves measuring and analyzing continuous-time electrical brain activity via techniques such as electrocorticogram (ECoG) or electroencephalography (EEG) to drive external devices. However, due to the inherent noise and variability in the measurements, the analysis of these signals is challenging and requires offline processing with significant computational resources. In this paper, we propose a simple yet efficient machine learning-based approach for the exemplary problem of hand gesture classification based on brain signals. We use a hybrid machine learning approach that uses a convolutional spiking neural network employing a bio-inspired event-driven synaptic plasticity rule for unsupervised feature learning of the measured analog signals encoded in the spike domain. We demonstrate that this approach generalizes to different subjects with both EEG and ECoG data and achieves superior accuracy in the range of 92.74-97.07% in identifying different hand gesture classes and motor imagery tasks.
Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer
Authors: Yuzhen Ding, Hongying Feng, Yunze Yang, Jason Holmes, Zhengliang Liu, David Liu, William W. Wong, Nathan Y. Yu, Terence T. Sio, Steven E. Schild, Baoxin Li, Wei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Abstract
Purpose: In some proton therapy facilities, patient alignment relies on two 2D orthogonal kV images, taken at fixed, oblique angles, as no 3D on-the-bed imaging is available. The visibility of the tumor in kV images is limited since the patient's 3D anatomy is projected onto a 2D plane, especially when the tumor is behind high-density structures such as bones. This can lead to large patient setup errors. A solution is to reconstruct the 3D CT image from the kV images obtained at the treatment isocenter in the treatment position. Methods: An asymmetric autoencoder-like network built with vision-transformer blocks was developed. The data was collected from 1 head and neck patient: 2 orthogonal kV images (1024x1024 voxels), 1 3D CT with padding (512x512x512) acquired from the in-room CT-on-rails before kVs were taken and 2 digitally-reconstructed-radiograph (DRR) images (512x512) based on the CT. We resampled kV images every 8 voxels and DRR and CT every 4 voxels, thus formed a dataset consisting of 262,144 samples, in which the images have a dimension of 128 for each direction. In training, both kV and DRR images were utilized, and the encoder was encouraged to learn the jointed feature map from both kV and DRR images. In testing, only independent kV images were used. The full-size synthetic CT (sCT) was achieved by concatenating the sCTs generated by the model according to their spatial information. The image quality of the synthetic CT (sCT) was evaluated using mean absolute error (MAE) and per-voxel-absolute-CT-number-difference volume histogram (CDVH). Results: The model achieved a speed of 2.1s and a MAE of <40HU. The CDVH showed that <5% of the voxels had a per-voxel-absolute-CT-number-difference larger than 185 HU. Conclusion: A patient-specific vision-transformer-based network was developed and shown to be accurate and efficient to reconstruct 3D CT images from kV images.
Abstract
We introduce Deep Learning Vulnerability Analyzer (DLVA), a vulnerability detection tool for Ethereum smart contracts based on powerful deep learning techniques for sequential data adapted for bytecode. We train DLVA to judge bytecode even though the supervising oracle, Slither, can only judge source code. DLVA's training algorithm is general: we "extend" a source code analysis to bytecode without any manual feature engineering, predefined patterns, or expert rules. DLVA's training algorithm is also robust: it overcame a 1.25% error rate mislabeled contracts, and the student surpassing the teacher; found vulnerable contracts that Slither mislabeled. In addition to extending a source code analyzer to bytecode, DLVA is much faster than conventional tools for smart contract vulnerability detection based on formal methods: DLVA checks contracts for 29 vulnerabilities in 0.2 seconds, a speedup of 10-500x+ compared to traditional tools. DLVA has three key components. Smart Contract to Vector (SC2V) uses neural networks to map arbitrary smart contract bytecode to an high-dimensional floating-point vector. Sibling Detector (SD) classifies contracts when a target contract's vector is Euclidian-close to a labeled contract's vector in a training set; although only able to judge 55.7% of the contracts in our test set, it has an average accuracy of 97.4% with a false positive rate of only 0.1%. Lastly, Core Classifier (CC) uses neural networks to infer vulnerable contracts regardless of vector distance. DLVA has an overall accuracy of 96.6% with an associated false positive rate of only 3.7%.
FindVehicle and VehicleFinder: A NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system
Authors: Runwei Guan, Ka Lok Man, Feifan Chen, Shanliang Yao, Rongsheng Hu, Xiaohui Zhu, Jeremy Smith, Eng Gee Lim, Yutao Yue
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
Natural language (NL) based vehicle retrieval is a task aiming to retrieve a vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL query can be easily obtained, such a task has a promising prospect in building an interactive intelligent traffic system (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to the same latent space to compare the similarity. However, existing methods usually use dependency analysis or semantic role-labelling techniques to find keywords related to vehicle attributes. These techniques may require a lot of pre-processing and post-processing work, and also suffer from extracting the wrong keyword when the NL query is complex. To tackle these problems and simplify, we borrow the idea from named entity recognition (NER) and construct FindVehicle, a NER dataset in the traffic domain. It has 42.3k labelled NL descriptions of vehicle tracks, containing information such as the location, orientation, type and colour of the vehicle. FindVehicle also adopts both overlapping entities and fine-grained entities to meet further requirements. To verify its effectiveness, we propose a baseline NL-based vehicle retrieval model called VehicleFinder. Our experiment shows that by using text encoders pre-trained by FindVehicle, VehicleFinder achieves 87.7\% precision and 89.4\% recall when retrieving a target vehicle by text command on our homemade dataset based on UA-DETRAC. The time cost of VehicleFinder is 279.35 ms on one ARM v8.2 CPU and 93.72 ms on one RTX A4000 GPU, which is much faster than the Transformer-based system. The dataset is open-source via the link https://github.com/GuanRunwei/FindVehicle, and the implementation can be found via the link https://github.com/GuanRunwei/VehicleFinder-CTIM.
GCNH: A Simple Method For Representation Learning On Heterophilous Graphs
Abstract
Graph Neural Networks (GNNs) are well-suited for learning on homophilous graphs, i.e., graphs in which edges tend to connect nodes of the same type. Yet, achievement of consistent GNN performance on heterophilous graphs remains an open research problem. Recent works have proposed extensions to standard GNN architectures to improve performance on heterophilous graphs, trading off model simplicity for prediction accuracy. However, these models fail to capture basic graph properties, such as neighborhood label distribution, which are fundamental for learning. In this work, we propose GCN for Heterophily (GCNH), a simple yet effective GNN architecture applicable to both heterophilous and homophilous scenarios. GCNH learns and combines separate representations for a node and its neighbors, using one learned importance coefficient per layer to balance the contributions of center nodes and neighborhoods. We conduct extensive experiments on eight real-world graphs and a set of synthetic graphs with varying degrees of heterophily to demonstrate how the design choices for GCNH lead to a sizable improvement over a vanilla GCN. Moreover, GCNH outperforms state-of-the-art models of much higher complexity on four out of eight benchmarks, while producing comparable results on the remaining datasets. Finally, we discuss and analyze the lower complexity of GCNH, which results in fewer trainable parameters and faster training times than other methods, and show how GCNH mitigates the oversmoothing problem.
Faster Prefix-Sorting Algorithms for Deterministic Finite Automata
Authors: Sung-Hwan Kim, Francisco Olivares, Nicola Prezza
Abstract
Sorting is a fundamental algorithmic pre-processing technique which often allows to represent data more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally-sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in $O(m^2 + n^{5/2})$ time, $m$ being the number of transitions and $n$ the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in $O(mn)$ and $O(n^2\log n)$ time, respectively, and an algorithm sorting acyclic DFAs in $O(m\log n)$ time. Within these running times, all algorithms compute also a smallest chain partition of the partial order (required to index the DFA). We present an experiment result to show that an optimized implementation of the $O(n^2\log n)$-time algorithm exhibits a nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.
IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty
Abstract
In this work, we propose the Informed Batch Belief Trees (IBBT) algorithm for motion planning under motion and sensing uncertainties. The original stochastic motion planning problem is divided into a deterministic motion planning problem and a graph search problem. We solve the deterministic planning problem using sampling-based methods such as PRM or RRG to construct a graph of nominal trajectories. Then, an informed cost-to-go heuristic for the original problem is computed based on the nominal trajectory graph. Finally, we grow a belief tree by searching over the graph using the proposed heuristic. IBBT interleaves between batch state sampling, nominal trajectory graph construction, heuristic computing, and search over the graph to find belief space motion plans. IBBT is an anytime, incremental algorithm. With an increasing number of batches of samples added to the graph, the algorithm finds motion plans that converge to the optimal one. IBBT is efficient by reusing results between sequential iterations. The belief tree searching is an ordered search guided by an informed heuristic. We test IBBT in different planning environments. Our numerical investigation confirms that IBBT finds non-trivial motion plans and is faster compared with previous similar methods.
Learned Monotone Minimal Perfect Hashing
Authors: Paolo Ferragina, Hans-Peter Lehmann, Peter Sanders, Giorgio Vinciguerra
Abstract
A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, data encryption, to pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we solve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 35% less space than the next best competitor, while achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than the best competitors, while achieving up to 19 times faster queries than the next larger competitor. As far as the construction of LeMonHash is concerned, we get an improvement by a factor of up to 2, compared to the competitor with the next best space usage. We also investigate the case of keys being variable-length strings, introducing the so-called LeMonHash-VL: it needs space within 10% of the best competitors while achieving up to 3 times faster queries.
Keyword: mobile
Joint Client Assignment and UAV Route Planning for Indirect-Communication Federated Learning
Authors: Jieming Bian, Cong Shen, Jie Xu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Abstract
Federated Learning (FL) is a machine learning approach that enables the creation of shared models for powerful applications while allowing data to remain on devices. This approach provides benefits such as improved data privacy, security, and reduced latency. However, in some systems, direct communication between clients and servers may not be possible, such as remote areas without proper communication infrastructure. To overcome this challenge, a new framework called FedEx (Federated Learning via Model Express Delivery) is proposed. This framework employs mobile transporters, such as UAVs, to establish indirect communication channels between the server and clients. These transporters act as intermediaries and allow for model information exchange. The use of indirect communication presents new challenges for convergence analysis and optimization, as the delay introduced by the transporters' movement creates issues for both global model dissemination and local model collection. To address this, two algorithms, FedEx-Sync and FedEx-Async, are proposed for synchronized and asynchronized learning at the transporter level. Additionally, a bi-level optimization algorithm is proposed to solve the joint client assignment and route planning problem. Experimental validation using two public datasets in a simulated network demonstrates consistent results with the theory, proving the efficacy of FedEx.
Safe Routing Approach by Identifying and Subsequently Eliminating the Attacks in MANET
Authors: S.M. Udhaya Sankar, D. Dhinakaran, C. Cathrin Deboral, M. Ramakrishnan
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
Abstract
Wireless networks that are decentralized and communicate without using existing infrastructure are known as mobile ad-hoc networks. The most common sorts of threats and attacks can affect MANETs. Therefore, it is advised to utilize intrusion detection, which controls the system to detect additional security issues. Monitoring is essential to avoid attacks and provide extra protection against unauthorized access. Although the current solutions have been designed to defeat the attack nodes, they still require additional hardware, have considerable delivery delays, do not offer high throughput or packet delivery ratios, or do not do so without using more energy. The capability of a mobile node to forward packets, which is dependent on the platform's life quality, may be impacted by the absence of the network node power source. We developed the Safe Routing Approach (SRA), which uses behaviour analysis to track and monitor attackers who discard packets during the route discovery process. The attacking node recognition system is made for irregular routing node detection to protect the controller network's usual properties from becoming recognized as an attack node. The suggested method examines the nearby attack nodes and conceals the trusted node in the routing pathway. The path is instantly assigned after the initial discovery of trust nodes based on each node's strength value. It extends the network's life span and reduces packet loss. In terms of Packet Delivery Ratio (PDR), energy consumption, network performance, and detection of attack nodes, the suggested approach is contrasted with AIS, ZIDS, and Improved AODV. The findings demonstrate that the recommended strategy performs superior in terms of PDR, residual energy, and network throughput.
HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation
Authors: Zhengcheng Shen, Yi Gao, Linh Kästner, Jens Lambrecht
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The advancement of computer vision and machine learning has made datasets a crucial element for further research and applications. However, the creation and development of robots with advanced recognition capabilities are hindered by the lack of appropriate datasets. Existing image or video processing datasets are unable to accurately depict observations from a moving robot, and they do not contain the kinematics information necessary for robotic tasks. Synthetic data, on the other hand, are cost-effective to create and offer greater flexibility for adapting to various applications. Hence, they are widely utilized in both research and industry. In this paper, we propose the dataset HabitatDyn, which contains both synthetic RGB videos, semantic labels, and depth information, as well as kinetics information. HabitatDyn was created from the perspective of a mobile robot with a moving camera, and contains 30 scenes featuring six different types of moving objects with varying velocities. To demonstrate the usability of our dataset, two existing algorithms are used for evaluation and an approach to estimate the distance between the object and camera is implemented based on these segmentation methods and evaluated through the dataset. With the availability of this dataset, we aspire to foster further advancements in the field of mobile robotics, leading to more capable and intelligent robots that can navigate and interact with their environments more effectively. The code is publicly available at https://github.com/ignc-research/HabitatDyn.
Using Mobile Data and Deep Models to Assess Auditory Verbal Hallucinations
Authors: Shayan Mirjafari, Subigya Nepal, Weichen Wang, Andrew T. Campbell
Abstract
Hallucination is an apparent perception in the absence of real external sensory stimuli. An auditory hallucination is a perception of hearing sounds that are not real. A common form of auditory hallucination is hearing voices in the absence of any speakers which is known as Auditory Verbal Hallucination (AVH). AVH is fragments of the mind's creation that mostly occur in people diagnosed with mental illnesses such as bipolar disorder and schizophrenia. Assessing the valence of hallucinated voices (i.e., how negative or positive voices are) can help measure the severity of a mental illness. We study N=435 individuals, who experience hearing voices, to assess auditory verbal hallucination. Participants report the valence of voices they hear four times a day for a month through ecological momentary assessments with questions that have four answering scales from not at all'' toextremely''. We collect these self-reports as the valence supervision of AVH events via a mobile application. Using the application, participants also record audio diaries to describe the content of hallucinated voices verbally. In addition, we passively collect mobile sensing data as contextual signals. We then experiment with how predictive these linguistic and contextual cues from the audio diary and mobile sensing data are of an auditory verbal hallucination event. Finally, using transfer learning and data fusion techniques, we train a neural net model that predicts the valance of AVH with a performance of 54\% top-1 and 72\% top-2 F1 score.
Multivariate and Multi-step Traffic Prediction for NextG Networks with SLA Violation Constraints
Authors: Evren Tuna, Alkan Soysal
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper focuses on predicting downlink (DL) traffic volume in mobile networks while minimizing overprovisioning and meeting a given service-level agreement (SLA) violation rate. We present a multivariate, multi-step, and SLA-driven approach that incorporates 20 different radio access network (RAN) features, a custom feature set based on peak traffic hours, and handover-based clustering to leverage the spatiotemporal effects. In addition, we propose a custom loss function that ensures the SLA violation rate constraint is satisfied while minimizing overprovisioning. We also perform multi-step prediction up to 24 steps ahead and evaluate performance under both single-step and multi-step prediction conditions. Our study makes several contributions, including the analysis of RAN features, the custom feature set design, a custom loss function, and a parametric method to satisfy SLA constraints.
Keyword: pruning
ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks
Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John
Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
Abstract
The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural model which use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 $\mu$s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 $\mu$s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Authors: Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated a good trade-off between performance and computation costs. Nevertheless, errors caused by pruning strategies can lead to significant information loss. Our quantitative experiments reveal that the impact of pruned tokens on performance should be noticeable. To address this issue, we propose a novel joint Token Pruning & Squeezing module (TPS) for compressing vision transformers with higher efficiency. Firstly, TPS adopts pruning to get the reserved and pruned subsets. Secondly, TPS squeezes the information of pruned tokens into partial reserved tokens via the unidirectional nearest-neighbor matching and similarity-based fusing steps. Compared to state-of-the-art methods, our approach outperforms them under all token pruning intensities. Especially while shrinking DeiT-tiny&small computational budgets to 35%, it improves the accuracy by 1%-6% compared with baselines on ImageNet classification. The proposed method can accelerate the throughput of DeiT-small beyond DeiT-tiny, while its accuracy surpasses DeiT-tiny by 4.78%. Experiments on various transformers demonstrate the effectiveness of our method, while analysis experiments prove our higher robustness to the errors of the token pruning policy. Code is available at https://github.com/megvii-research/TPS-CVPR2023.
Conservative Sparse Neural Network Embedded Frequency-Constrained Unit Commitment With Distributed Energy Resources
Abstract
The increasing penetration of distributed energy resources (DERs) will decrease the rotational inertia of the power system and further degrade the system frequency stability. To address the above issues, this paper leverages the advanced neural network (NN) to learn the frequency dynamics and incorporates NN to facilitate system reliable operation. This paper proposes the conservative sparse neural network (CSNN) embedded frequency-constrained unit commitment (FCUC) with converter-based DERs, including the learning and optimization stages. In the learning stage, it samples the inertia parameters, calculates the corresponding frequency, and characterizes the stability region of the sampled parameters using the convex hulls to ensure stability and avoid extrapolation. For conservativeness, the positive prediction error penalty is added to the loss function to prevent possible frequency requirement violation. For the sparsity, the NN topology pruning is employed to eliminate unnecessary connections for solving acceleration. In the optimization stage, the trained CSNN is transformed into mixed-integer linear constraints using the big-M method and then incorporated to establish the data-enhanced model. The case study verifies 1) the effectiveness of the proposed model in terms of high accuracy, fewer parameters, and significant solving acceleration; 2) the stable system operation against frequency violation under contingency.
Keyword: voxel
VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
Authors: Huiyu Gao, Wei Mao, Miaomiao Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We propose VisFusion, a visibility-aware online 3D scene reconstruction approach from posed monocular videos. In particular, we aim to reconstruct the scene from volumetric features. Unlike previous reconstruction methods which aggregate features for each voxel from input views without considering its visibility, we aim to improve the feature fusion by explicitly inferring its visibility from a similarity matrix, computed from its projected features in each image pair. Following previous works, our model is a coarse-to-fine pipeline including a volume sparsification process. Different from their works which sparsify voxels globally with a fixed occupancy threshold, we perform the sparsification on a local feature volume along each visual ray to preserve at least one voxel per ray for more fine details. The sparse local volume is then fused with a global one for online reconstruction. We further propose to predict TSDF in a coarse-to-fine manner by learning its residuals across scales leading to better TSDF predictions. Experimental results on benchmarks show that our method can achieve superior performance with more scene details. Code is available at: https://github.com/huiyu-gao/VisFusion
Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer
Authors: Yuzhen Ding, Hongying Feng, Yunze Yang, Jason Holmes, Zhengliang Liu, David Liu, William W. Wong, Nathan Y. Yu, Terence T. Sio, Steven E. Schild, Baoxin Li, Wei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Abstract
Purpose: In some proton therapy facilities, patient alignment relies on two 2D orthogonal kV images, taken at fixed, oblique angles, as no 3D on-the-bed imaging is available. The visibility of the tumor in kV images is limited since the patient's 3D anatomy is projected onto a 2D plane, especially when the tumor is behind high-density structures such as bones. This can lead to large patient setup errors. A solution is to reconstruct the 3D CT image from the kV images obtained at the treatment isocenter in the treatment position. Methods: An asymmetric autoencoder-like network built with vision-transformer blocks was developed. The data was collected from 1 head and neck patient: 2 orthogonal kV images (1024x1024 voxels), 1 3D CT with padding (512x512x512) acquired from the in-room CT-on-rails before kVs were taken and 2 digitally-reconstructed-radiograph (DRR) images (512x512) based on the CT. We resampled kV images every 8 voxels and DRR and CT every 4 voxels, thus formed a dataset consisting of 262,144 samples, in which the images have a dimension of 128 for each direction. In training, both kV and DRR images were utilized, and the encoder was encouraged to learn the jointed feature map from both kV and DRR images. In testing, only independent kV images were used. The full-size synthetic CT (sCT) was achieved by concatenating the sCTs generated by the model according to their spatial information. The image quality of the synthetic CT (sCT) was evaluated using mean absolute error (MAE) and per-voxel-absolute-CT-number-difference volume histogram (CDVH). Results: The model achieved a speed of 2.1s and a MAE of <40HU. The CDVH showed that <5% of the voxels had a per-voxel-absolute-CT-number-difference larger than 185 HU. Conclusion: A patient-specific vision-transformer-based network was developed and shown to be accurate and efficient to reconstruct 3D CT images from kV images.
Keyword: lidar
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer
Authors: Hao Xiang, Runsheng Xu, Jiaqi Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vehicle-to-Vehicle technologies have enabled autonomous vehicles to share information to see through occlusions, greatly enhancing perception performance. Nevertheless, existing works all focused on homogeneous traffic where vehicles are equipped with the same type of sensors, which significantly hampers the scale of collaboration and benefit of cross-modality interactions. In this paper, we investigate the multi-agent hetero-modal cooperative perception problem where agents may have distinct sensor modalities. We present HM-ViT, the first unified multi-agent hetero-modal cooperative perception framework that can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents. To effectively fuse features from multi-view images and LiDAR point clouds, we design a novel heterogeneous 3D graph transformer to jointly reason inter-agent and intra-agent interactions. The extensive experiments on the V2V perception dataset OPV2V demonstrate that the HM-ViT outperforms SOTA cooperative perception methods for V2V hetero-modal cooperative perception. We will release codes to facilitate future research.
Keyword: diffusion
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Authors: Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine
Abstract
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor. This generalization shows how the induced actor balances reward maximization and divergence from the behavior policy, with the specific loss choice determining the nature of this tradeoff. Notably, this actor can exhibit complex and multimodal characteristics, suggesting issues with the conditional Gaussian actor fit with advantage weighted regression (AWR) used in prior methods. Instead, we propose using samples from a diffusion parameterized behavior policy and weights computed from the critic to then importance sampled our intended policy. We introduce Implicit Diffusion Q-learning (IDQL), combining our general IQL critic with the policy extraction method. IDQL maintains the ease of implementation of IQL while outperforming prior offline RL methods and demonstrating robustness to hyperparameters. Code is available at https://github.com/philippe-eecs/IDQL.
Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
Authors: Jason J. Yu, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Novel view synthesis from a single input image is a challenging task, where the goal is to generate a new view of a scene from a desired camera pose that may be separated by a large motion. The highly uncertain nature of this synthesis task due to unobserved elements within the scene (i.e., occlusion) and outside the field-of-view makes the use of generative models appealing to capture the variety of possible outputs. In this paper, we propose a novel generative model which is capable of producing a sequence of photorealistic images consistent with a specified camera trajectory, and a single starting image. Our approach is centred on an autoregressive conditional diffusion-based model capable of interpolating visible scene elements, and extrapolating unobserved regions in a view, in a geometrically consistent manner. Conditioning is limited to an image capturing a single camera view and the (relative) pose of the new camera view. To measure the consistency over a sequence of generated views, we introduce a new metric, the thresholded symmetric epipolar distance (TSED), to measure the number of consistent frame pairs in a sequence. While previous methods have been shown to produce high quality images and consistent semantics across pairs of views, we show empirically with our metric that they are often inconsistent with the desired camera poses. In contrast, we demonstrate that our method produces both photorealistic and view-consistent imagery.
Matching-based Data Valuation for Generative Model
Abstract
Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. Similar to discriminative models, there is an urgent need to assess data contributions in deep generative models as well. However, previous data valuation approaches mainly relied on discriminative model performance metrics and required model retraining. Consequently, they cannot be applied directly and efficiently to recent deep generative models, such as generative adversarial networks and diffusion models, in practice. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach for any generative models, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of their knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.
A Deep Learning algorithm to accelerate Algebraic Multigrid methods in Finite Element solvers of 3D elliptic PDEs
Authors: Matteo Caldana, Paola F. Antonietti, Luca Dede'
Abstract
Algebraic multigrid (AMG) methods are among the most efficient solvers for linear systems of equations and they are widely used for the solution of problems stemming from the discretization of Partial Differential Equations (PDEs). The most severe limitation of AMG methods is the dependence on parameters that require to be fine-tuned. In particular, the strong threshold parameter is the most relevant since it stands at the basis of the construction of successively coarser grids needed by the AMG methods. We introduce a novel Deep Learning algorithm that minimizes the computational cost of the AMG method when used as a finite element solver. We show that our algorithm requires minimal changes to any existing code. The proposed Artificial Neural Network (ANN) tunes the value of the strong threshold parameter by interpreting the sparse matrix of the linear system as a black-and-white image and exploiting a pooling operator to transform it into a small multi-channel image. We experimentally prove that the pooling successfully reduces the computational cost of processing a large sparse matrix and preserves the features needed for the regression task at hand. We train the proposed algorithm on a large dataset containing problems with a highly heterogeneous diffusion coefficient defined in different three-dimensional geometries and discretized with unstructured grids and linear elasticity problems with a highly heterogeneous Young's modulus. When tested on problems with coefficients or geometries not present in the training dataset, our approach reduces the computational time by up to 30%.
A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion
Authors: Dimitri Breda, Simone De Reggi, Rossana Vermiglio
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
Abstract
We numerically investigate the stability of linear age-structured population models with nonlocal diffusion, which arise naturally in describing dynamics of infectious diseases. Compared to Laplace diffusion, the analysis of models with nonlocal diffusion is more challenging since the associated semigroups have no regularizing properties in the spatial variable. Nevertheless, the asymptotic stability of the null equilibrium is determined by the spectrum of the infinitesimal generator associated to the semigroup. We propose to approximate the leading part of this spectrum by first reformulating the problem via integration of the age-state and then by discretizing the generator combining a spectral projection in space with a pseudospectral collocation in age. A rigorous convergence analysis is provided in the case of separable model coefficients. Results are confirmed experimentally and numerical tests are presented also for the more general instance.
A convection-diffusion problem with a large shift on Duran meshes
Authors: Mirjana Brdar, Sebastian Franz, Hans-Goerg Roosc
Abstract
A convection-diffusion problem with a large shift in space is considered. Numerical analysis of high order finite element methods on layer-adapted Duran type meshes, as well as on coarser Duran type meshes in places where weak layers appear, is provided. The theoretical results are confirmed by numerical experiments.
Improved Diffusion-based Image Colorization via Piggybacked Models
Abstract
Image colorization has been attracting the research interests of the community for decades. However, existing methods still struggle to provide satisfactory colorized results given grayscale images due to a lack of human-like global understanding of colors. Recently, large-scale Text-to-Image (T2I) models have been exploited to transfer the semantic information from the text prompts to the image domain, where text provides a global control for semantic objects in the image. In this work, we introduce a colorization model piggybacking on the existing powerful T2I diffusion model. Our key idea is to exploit the color prior knowledge in the pre-trained T2I diffusion model for realistic and diverse colorization. A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model to output a latent color prior that conforms to the visual semantics of the grayscale input. A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image. Our model can also achieve conditional colorization with additional inputs (e.g. user hints and texts). Extensive experiments show that our method achieves state-of-the-art performance in terms of perceptual quality.
BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis
Authors: Angela Castillo, Maria Escobar, Guillaume Jeanneret, Albert Pumarola, Pablo Arbeláez, Ali Thabet, Artsiom Sanakoyeu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Mixed reality applications require tracking the user's full-body motion to enable an immersive experience. However, typical head-mounted devices can only track head and hand movements, leading to a limited reconstruction of full-body motion due to variability in lower body configurations. We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem. We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences. To the best of our knowledge, this is the first approach that uses the reverse diffusion process to model full-body tracking as a conditional sequence generation task. We conduct experiments on the large-scale motion-capture dataset AMASS and show that our approach outperforms the state-of-the-art approaches by a significant margin in terms of full-body motion realism and joint reconstruction error.
Keyword: dynamic
Optimization of a Hydrodynamic Computational Reservoir through Evolution
Abstract
As demand for computational resources reaches unprecedented levels, research is expanding into the use of complex material substrates for computing. In this study, we interface with a model of a hydrodynamic system, under development by a startup, as a computational reservoir and optimize its properties using an evolution in materio approach. Input data are encoded as waves applied to our shallow water reservoir, and the readout wave height is obtained at a fixed detection point. We optimized the readout times and how inputs are mapped to the wave amplitude or frequency using an evolutionary search algorithm, with the objective of maximizing the system's ability to linearly separate observations in the training data by maximizing the readout matrix determinant. Applying evolutionary methods to this reservoir system substantially improved separability on an XNOR task, in comparison to implementations with hand-selected parameters. We also applied our approach to a regression task and show that our approach improves out-of-sample accuracy. Results from this study will inform how we interface with the physical reservoir in future work, and we will use these methods to continue to optimize other aspects of the physical implementation of this system as a computational reservoir.
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer
Authors: Hao Xiang, Runsheng Xu, Jiaqi Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vehicle-to-Vehicle technologies have enabled autonomous vehicles to share information to see through occlusions, greatly enhancing perception performance. Nevertheless, existing works all focused on homogeneous traffic where vehicles are equipped with the same type of sensors, which significantly hampers the scale of collaboration and benefit of cross-modality interactions. In this paper, we investigate the multi-agent hetero-modal cooperative perception problem where agents may have distinct sensor modalities. We present HM-ViT, the first unified multi-agent hetero-modal cooperative perception framework that can collaboratively predict 3D objects for highly dynamic vehicle-to-vehicle (V2V) collaborations with varying numbers and types of agents. To effectively fuse features from multi-view images and LiDAR point clouds, we design a novel heterogeneous 3D graph transformer to jointly reason inter-agent and intra-agent interactions. The extensive experiments on the V2V perception dataset OPV2V demonstrate that the HM-ViT outperforms SOTA cooperative perception methods for V2V hetero-modal cooperative perception. We will release codes to facilitate future research.
Feature point detection in HDR images based on coefficient of variation
Authors: Artur Santos Nascimento, Welerson Augusto Lino de Jesus Melo, Daniel Oliveira Dantas, Beatriz Trinchão Andrade
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Feature point (FP) detection is a fundamental step of many computer vision tasks. However, FP detectors are usually designed for low dynamic range (LDR) images. In scenes with extreme light conditions, LDR images present saturated pixels, which degrade FP detection. On the other hand, high dynamic range (HDR) images usually present no saturated pixels but FP detection algorithms do not take advantage of all the information present in such images. FP detection frequently relies on differential methods, which work well in LDR images. However, in HDR images, the differential operation response in bright areas overshadows the response in dark areas. As an alternative to standard FP detection methods, this study proposes an FP detector based on a coefficient of variation (CV) designed for HDR images. The CV operation adapts its response based on the standard deviation of pixels inside a window, working well in both dark and bright areas of HDR images. The proposed and standard detectors are evaluated by measuring their repeatability rate (RR) and uniformity. Our proposed detector shows better performance when compared to other standard state-of-the-art detectors. In uniformity metric, our proposed detector surpasses all the other algorithms. In other hand, when using the repeatability rate metric, the proposed detector is worse than Harris for HDR and SURF detectors.
FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving
Authors: Yuxuan Liu, Zhenhua Xu, Huaiyang Huang, Lujia Wang, Ming Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Predicting accurate depth with monocular images is important for low-cost robotic applications and autonomous driving. This study proposes a comprehensive self-supervised framework for accurate scale-aware depth prediction on autonomous driving scenes utilizing inter-frame poses obtained from inertial measurements. In particular, we introduce a Full-Scale depth prediction network named FSNet. FSNet contains four important improvements over existing self-supervised models: (1) a multichannel output representation for stable training of depth prediction in driving scenarios, (2) an optical-flow-based mask designed for dynamic object removal, (3) a self-distillation training strategy to augment the training process, and (4) an optimization-based post-processing algorithm in test time, fusing the results from visual odometry. With this framework, robots and vehicles with only one well-calibrated camera can collect sequences of training image frames and camera poses, and infer accurate 3D depths of the environment without extra labeling work or 3D data. Extensive experiments on the KITTI dataset, KITTI-360 dataset and the nuScenes dataset demonstrate the potential of FSNet. More visualizations are presented in \url{https://sites.google.com/view/fsnet/home}
Conservative Sparse Neural Network Embedded Frequency-Constrained Unit Commitment With Distributed Energy Resources
Abstract
The increasing penetration of distributed energy resources (DERs) will decrease the rotational inertia of the power system and further degrade the system frequency stability. To address the above issues, this paper leverages the advanced neural network (NN) to learn the frequency dynamics and incorporates NN to facilitate system reliable operation. This paper proposes the conservative sparse neural network (CSNN) embedded frequency-constrained unit commitment (FCUC) with converter-based DERs, including the learning and optimization stages. In the learning stage, it samples the inertia parameters, calculates the corresponding frequency, and characterizes the stability region of the sampled parameters using the convex hulls to ensure stability and avoid extrapolation. For conservativeness, the positive prediction error penalty is added to the loss function to prevent possible frequency requirement violation. For the sparsity, the NN topology pruning is employed to eliminate unnecessary connections for solving acceleration. In the optimization stage, the trained CSNN is transformed into mixed-integer linear constraints using the big-M method and then incorporated to establish the data-enhanced model. The case study verifies 1) the effectiveness of the proposed model in terms of high accuracy, fewer parameters, and significant solving acceleration; 2) the stable system operation against frequency violation under contingency.
DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases
Abstract
Endoscopic Ultrasound-Fine Needle Aspiration (EUS-FNA) is used to examine pancreatic cancer. EUS-FNA is an examination using EUS to insert a thin needle into the tumor and collect pancreatic tissue fragments. Then collected pancreatic tissue fragments are then stained to classify whether they are pancreatic cancer. However, staining and visual inspection are time consuming. In addition, if the pancreatic tissue fragment cannot be examined after staining, the collection must be done again on the other day. Therefore, our purpose is to classify from an unstained image whether it is available for examination or not, and to exceed the accuracy of visual classification by specialist physicians. Image classification before staining can reduce the time required for staining and the burden of patients. However, the images of pancreatic tissue fragments used in this study cannot be successfully classified by processing the entire image because the pancreatic tissue fragments are only a part of the image. Therefore, we propose a DeformableFormer that uses Deformable Convolution in MetaFormer framework. The architecture consists of a generalized model of the Vision Transformer, and we use Deformable Convolution in the TokenMixer part. In contrast to existing approaches, our proposed DeformableFormer is possible to perform feature extraction more locally and dynamically by Deformable Convolution. Therefore, it is possible to perform suitable feature extraction for classifying target. To evaluate our method, we classify two categories of pancreatic tissue fragments; available and unavailable for examination. We demonstrated that our method outperformed the accuracy by specialist physicians and conventional methods.
A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion
Authors: Dimitri Breda, Simone De Reggi, Rossana Vermiglio
Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS)
Abstract
We numerically investigate the stability of linear age-structured population models with nonlocal diffusion, which arise naturally in describing dynamics of infectious diseases. Compared to Laplace diffusion, the analysis of models with nonlocal diffusion is more challenging since the associated semigroups have no regularizing properties in the spatial variable. Nevertheless, the asymptotic stability of the null equilibrium is determined by the spectrum of the infinitesimal generator associated to the semigroup. We propose to approximate the leading part of this spectrum by first reformulating the problem via integration of the age-state and then by discretizing the generator combining a spectral projection in space with a pseudospectral collocation in age. A rigorous convergence analysis is provided in the case of separable model coefficients. Results are confirmed experimentally and numerical tests are presented also for the more general instance.
A Comprehensive Review on Ontologies for Scenario-based Testing in the Context of Autonomous Driving
Authors: Maximilian Zipfl, Nina Koch, J. Marius Zöllner
Abstract
The verification and validation of autonomous driving vehicles remains a major challenge due to the high complexity of autonomous driving functions. Scenario-based testing is a promising method for validating such a complex system. Ontologies can be utilized to produce test scenarios that are both meaningful and relevant. One crucial aspect of this process is selecting the appropriate method for describing the entities involved. The level of detail and specific entity classes required will vary depending on the system being tested. It is important to choose an ontology that properly reflects these needs. This paper summarizes key representative ontologies for scenario-based testing and related use cases in the field of autonomous driving. The considered ontologies are classified according to their level of detail for both static facts and dynamic aspects. Furthermore, the ontologies are evaluated based on the presence of important entity classes and the relations between them.
Effective Numerical Simulations of Synchronous Generator System
Abstract
Synchronous generator system is a complicated dynamical system for energy transmission, which plays an important role in modern industrial production. In this article, we propose some predictor-corrector methods and structure-preserving methods for a generator system based on the first benchmark model of subsynchronous resonance, among which the structure-preserving methods preserve a Dirac structure associated with the so-called port-Hamiltonian descriptor systems. To illustrate this, the simplified generator system in the form of index-1 differential-algebraic equations has been derived. Our analyses provide the global error estimates for a special class of structure-preserving methods called Gauss methods, which guarantee their superior performance over the PSCAD/EMTDC and the predictor-corrector methods in terms of computational stability. Numerical simulations are implemented to verify the effectiveness and advantages of our methods.
AMP in the wild: Learning robust, agile, natural legged locomotion skills
Abstract
The successful transfer of a learned controller from simulation to the real world for a legged robot requires not only the ability to identify the system, but also accurate estimation of the robot's state. In this paper, we propose a novel algorithm that can infer not only information about the parameters of the dynamic system, but also estimate important information about the robot's state from previous observations. We integrate our algorithm with Adversarial Motion Priors and achieve a robust, agile, and natural gait in both simulation and on a Unitree A1 quadruped robot in the real world. Empirical results demonstrate that our proposed algorithm enables traversing challenging terrains with lower power consumption compared to the baselines. Both qualitative and quantitative results are presented in this paper.
Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems
Authors: Mehran Salmani (1), Saeid Ghafouri (2 and 4), Alireza Sanaee (2), Kamran Razavi (3), Max Mühlhäuser (3), Joseph Doyle (2), Pooyan Jamshidi (4), Mohsen Sharif (1) ((1) Iran University of Science and Technology, (2) Queen Mary University of London, (3) Technical University of Darmstadt, (4) University of South Carolina)
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
Abstract
The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, imposing changes in their computing resources. Failing to right-size computing resources results in either latency service level objectives (SLOs) violations or wasted computing resources. Adapting to dynamic workloads considering all the pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants with their resource allocations to meet latency SLO while maximizing an objective function composed of accuracy and cost. InfAdapter decreases SLO violation and costs up to 65% and 33%, respectively, compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler).
Electromechanical memcapacitive neurons for energy-efficient spiking neural networks
Authors: Zixi Zhang, Yuriy V. Pershin, Ivar Martin
Subjects: Emerging Technologies (cs.ET); Mesoscale and Nanoscale Physics (cond-mat.mes-hall)
Abstract
In this article, we introduce a new nanoscale electromechanical device -- a leaky memcapacitor -- and show that it may be useful for the hardware implementation of spiking neurons. The leaky memcapacitor is a movable-plate capacitor that becomes quite conductive when the plates come close to each other. The equivalent circuit of the leaky memcapacitor involves a memcapacitive and memristive system connected in parallel. In the leaky memcapacitor, the resistance and capacitance depend on the same internal state variable, which is the displacement of the movable plate. We have performed a comprehensive analysis showing that several spiking types observed in biological neurons can be implemented with the leaky memcapacitor. Significant attention is paid to the dynamic properties of the model. As in leaky memcapacitors the capacitive and leaking resistive functionalities are implemented naturally within the same device structure, their use will simplify the creation of spiking neural networks.
Gradient-Based Distributed Controller Design Over Directed Networks
Abstract
In this study, we propose a design methodology of distributed controllers for multi-agent systems on a class of directed interaction networks by extending the gradient-flow method. Although the gradient-flow method is a common design tool for distributed controllers, it is inapplicable to directed networks. First, we demonstrate how to construct a distributed controller for systems over a class of time-invariant directed graphs. Subsequently, we achieve better convergence properties and performance enhancement than the conventional gradient-flow method. To illustrate its application in time-varying networks, we address the dynamic matching problem of two distinct groups of agents with different sensing ranges. This problem is a novel coordination task that involves pairing agents from two distinct groups to achieve a convergence of the paired agents' states to the same value. Accordingly, we apply the proposed method to this problem and provide sufficient conditions for successful matching. Lastly, numerical examples for systems on both time-invariant and time-varying networks demonstrate the effectiveness of the proposed method.
Real-Time Implementation of Dynamic State Estimation for Microgrid Load Bus Protection
Authors: Sarbajit Basu, Arthur K. Barnes, Adam Mate, Olga Lavrova
Abstract
Inverter-interfaced microgrids, owing to the lack of fault current, cannot be protected using traditional over-current protections, while admittance or differential relaying protection schemes are not practical to be implemented. Dynamic state estimation can track and predict power system transients and has been extensively investigated for setting-less protection. A novel real-time application of dynamic state estimation for protection is proposed in this paper, wherein parameter estimation and parallel processing is used to identify the state of the system. The implementation scheme has low process complexity and employs a data acquisition device and estimator that run on a general-purpose computer. This proposed implementation extends the state-of-the-art, under short-circuit conditions, to a real-time implementation with a lumped-load radial microgrid and a grid-forming inverter with current-limiting behavior.
Gradient Derivation for Learnable Parameters in Graph Attention Networks
Authors: Marion Neumeier, Andreas Tollkühn, Sebastian Dorn, Michael Botsch, Wolfgang Utschick
Abstract
This work provides a comprehensive derivation of the parameter gradients for GATv2 [4], a widely used implementation of Graph Attention Networks (GATs). GATs have proven to be powerful frameworks for processing graph-structured data and, hence, have been used in a range of applications. However, the achieved performance by these attempts has been found to be inconsistent across different datasets and the reasons for this remains an open research question. As the gradient flow provides valuable insights into the training dynamics of statistically learning models, this work obtains the gradients for the trainable model parameters of GATv2. The gradient derivations supplement the efforts of [2], where potential pitfalls of GATv2 are investigated.
RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments
Abstract
Current simultaneous localization and mapping (SLAM) algorithms perform well in static environments but easily fail in dynamic environments. Recent works introduce deep learning-based semantic information to SLAM systems to reduce the influence of dynamic objects. However, it is still challenging to apply a robust localization in dynamic environments for resource-restricted robots. This paper proposes a real-time RGB-D inertial odometry system for resource-restricted robots in dynamic environments named Dynamic-VINS. Three main threads run in parallel: object detection, feature tracking, and state optimization. The proposed Dynamic-VINS combines object detection and depth information for dynamic feature recognition and achieves performance comparable to semantic segmentation. Dynamic-VINS adopts grid-based feature detection and proposes a fast and efficient method to extract high-quality FAST feature points. IMU is applied to predict motion for feature tracking and moving consistency check. The proposed method is evaluated on both public datasets and real-world applications and shows competitive localization accuracy and robustness in dynamic environments. Yet, to the best of our knowledge, it is the best-performance real-time RGB-D inertial odometry for resource-restricted platforms in dynamic environments for now. The proposed system is open source at: https://github.com/HITSZ-NRSL/Dynamic-VINS.git
Multi-level decision framework collision avoidance algorithm in emergency scenarios
Authors: Guoying Chen, Xinyu Wang, Min Hua, Wei Liu
Abstract
With the rapid development of autonomous driving, the attention of academia has increasingly focused on the development of anti-collision systems in emergency scenarios, which have a crucial impact on driving safety. While numerous anti-collision strategies have emerged in recent years, most of them only consider steering or braking. The dynamic and complex nature of the driving environment presents a challenge to developing robust collision avoidance algorithms in emergency scenarios. To address the complex, dynamic obstacle scene and improve lateral maneuverability, this paper establishes a multi-level decision-making obstacle avoidance framework that employs the safe distance model and integrates emergency steering and emergency braking to complete the obstacle avoidance process. This approach helps avoid the high-risk situation of vehicle instability that can result from the separation of steering and braking actions. In the emergency steering algorithm, we define the collision hazard moment and propose a multi-constraint dynamic collision avoidance planning method that considers the driving area. Simulation results demonstrate that the decision-making collision avoidance logic can be applied to dynamic collision avoidance scenarios in complex traffic situations, effectively completing the obstacle avoidance task in emergency scenarios and improving the safety of autonomous driving.
Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT
Authors: Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du
Abstract
Robot-based assembly in construction has emerged as a promising solution to address numerous challenges such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles in realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques or machine learning methods, face limitations in their adaptability and scalability to dynamic construction environments. To expand the ability of the current robot system in sequential understanding, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning and demonstrate its feasibility and effectiveness through experimental evaluation including Two case studies and 80 trials about real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.
A Multi-Fidelity Bayesian Approach to Safe Controller Design
Abstract
Safely controlling unknown dynamical systems is one of the biggest challenges in the field of control. Oftentimes, an approximate model of a system's dynamics exists which provides beneficial information for the selection of controls. However, differences between the approximate and true systems present challenges as well as safety concerns. We propose an algorithm called SAFE-SLOPE to safely evaluate points from a Gaussian process model of a function when its Lipschitz constant is unknown. We establish theoretical guarantees for the performance of SAFE-SLOPE and quantify how multi-fidelity modeling improves the algorithm's performance. Finally, we demonstrate how SAFE-SLOPE achieves lower cumulative regret than a naive sampling method by applying it to find the control gains of a linear time-invariant system.
Analog Feedback-Controlled Memristor programming Circuit for analog Content Addressable Memory
Authors: Jiaao Yu, Paul-Philipp Manea, Sara Ameli, Mohammad Hizzani, Amro Eldebiky, John Paul Strachan
Abstract
Recent breakthroughs in associative memories suggest that silicon memories are coming closer to human memories, especially for memristive Content Addressable Memories (CAMs) which are capable to read and write in analog values. However, the Program-Verify algorithm, the state-of-the-art memristor programming algorithm, requires frequent switching between verifying and programming memristor conductance, which brings many defects such as high dynamic power and long programming time. Here, we propose an analog feedback-controlled memristor programming circuit that makes use of a novel look-up table-based (LUT-based) programming algorithm. With the proposed algorithm, the programming and the verification of a memristor can be performed in a single-direction sequential process. Besides, we also integrated a single proposed programming circuit with eight analog CAM (aCAM) cells to build an aCAM array. We present SPICE simulations on TSMC 28nm process. The theoretical analysis shows that 1. A memristor conductance within an aCAM cell can be converted to an output boundary voltage in aCAM searching operations and 2. An output boundary voltage in aCAM searching operations can be converted to a programming data line voltage in aCAM programming operations. The simulation results of the proposed programming circuit prove the theoretical analysis and thus verify the feasibility to program memristors without frequently switching between verifying and programming the conductance. Besides, the simulation results of the proposed aCAM array show that the proposed programming circuit can be integrated into a large array architecture.
Backpropagation-free Training of Deep Physical Neural Networks
Authors: Ali Momeni, Babak Rahmani, Matthieu Mallejac, Philipp Del Hougne, Romain Fleury
Abstract
Recent years have witnessed the outstanding success of deep learning in various fields such as vision and natural language processing. This success is largely indebted to the massive size of deep learning models that is expected to increase unceasingly. This growth of the deep learning models is accompanied by issues related to their considerable energy consumption, both during the training and inference phases, as well as their scalability. Although a number of work based on unconventional physical systems have been proposed which addresses the issue of energy efficiency in the inference phase, efficient training of deep learning models has remained unaddressed. So far, training of digital deep learning models mainly relies on backpropagation, which is not suitable for physical implementation as it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the nonlinear physical layers' properties. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we train diverse wave-based physical neural networks that vary in the underlying wave phenomenon and the type of non-linearity they use, to perform vowel and image classification tasks experimentally.
An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph
Authors: Nafis Tanveer Islam, Gonzalo De La Torre Parra, Dylan Manuel, Elias Bou-Harb, Peyman Najafirad
Abstract
Over the years, open-source software systems have become prey to threat actors. Even as open-source communities act quickly to patch the breach, code vulnerability screening should be an integral part of agile software development from the beginning. Unfortunately, current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with code vulnerability and classification. Furthermore, the datasets used for vulnerability learning often exhibit distribution shifts from the real-world testing distribution due to novel attack strategies deployed by adversaries and as a result, the machine learning model's performance may be hindered or biased. To address these issues, we propose a joint interpolated multitasked unbiased vulnerability classifier comprising a transformer "RoBERTa" and graph convolution neural network (GCN). We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from a sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF). Poacher flow edges reduce the gap between dynamic and static program analysis and handle complex long-range dependencies. Moreover, our approach reduces biases of classifiers regarding unbalanced datasets by integrating Focal Loss objective function along with SVG. Remarkably, experimental results show that our classifier outperforms state-of-the-art results on vulnerability detection with fewer false negatives and false positives. After testing our model across multiple datasets, it shows an improvement of at least 2.41% and 18.75% in the best-case scenario. Evaluations using N-day program samples demonstrate that our proposed approach achieves a 93% accuracy and was able to detect 4, zero-day vulnerabilities from popular GitHub repositories.
Generative AI-enabled Vehicular Networks: Fundamentals, Framework, and Case Study
Authors: Ruichen Zhang, Ke Xiong, Hongyang Du, Dusit Niyato, Jiawen Kang, Xuemin Shen, H. Vincent Poor
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Abstract
Recognizing the tremendous improvements that the integration of generative AI can bring to intelligent transportation systems, this article explores the integration of generative AI technologies in vehicular networks, focusing on their potential applications and challenges. Generative AI, with its capabilities of generating realistic data and facilitating advanced decision-making processes, enhances various applications when combined with vehicular networks, such as navigation optimization, traffic prediction, data generation, and evaluation. Despite these promising applications, the integration of generative AI with vehicular networks faces several challenges, such as real-time data processing and decision-making, adapting to dynamic and unpredictable environments, as well as privacy and security concerns. To address these challenges, we propose a multi-modality semantic-aware framework to enhance the service quality of generative AI. By leveraging multi-modal and semantic communication technologies, the framework enables the use of text and image data for creating multi-modal content, providing more reliable guidance to receiving vehicles and ultimately improving system usability and efficiency. To further improve the reliability and efficiency of information transmission and reconstruction within the framework, taking generative AI-enabled vehicle-to-vehicle (V2V) as a case study, a deep reinforcement learning (DRL)-based approach is proposed for resource allocation. Finally, we discuss potential research directions and anticipated advancements in the field of generative AI-enabled vehicular networks.
Approximate Shielding of Atari Agents for Safe Exploration
Authors: Alexander W. Goodall, Francesco Belardinelli
Abstract
Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding - another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.
Keyword: efficient
Using Z3 for Formal Modeling and Verification of FNN Global Robustness
KOIOS: Top-k Semantic Overlap Set Search
B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Enhancing Artificial intelligence Policies with Fusion and Forecasting: Insights from Indian Patents Using Network Analysis
ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks
NFT Marketplace
Get Rid Of Your Trail: Remotely Erasing Backdoors in Federated Learning
On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers
Word Sense Induction with Knowledge Distillation from BERT
Modular Hardware Design with Timeline Types
Feature point detection in HDR images based on coefficient of variation
SLEPLET: Slepian Scale-Discretised Wavelets in Python
Matching-based Data Valuation for Generative Model
EulerNet: Adaptive Feature Interaction Learning via Euler's Formula for CTR Prediction
Deep Learning-empowered Predictive Precoder Design for OTFS Transmission in URLLC
Energy management system for biological 3D printing by the refinement of manifold model morphing in flexible grasping space
Linear building pattern recognition via spatial knowledge graph
Multi-scale Evolutionary Neural Architecture Search for Deep Spiking Neural Networks
Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback
Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation
Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation
Learn to Cluster Faces with Better Subgraphs
A Deep Learning algorithm to accelerate Algebraic Multigrid methods in Finite Element solvers of 3D elliptic PDEs
An Analytical Model for Performance Estimation in High-Capacity IMDD Systems
A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion
Better Sign Language Translation with Monolingual Data
How Well Does the Metropolis Algorithm Cope With Local Optima?
Viewing Allocators as Bin Packing Solvers Demystifies Fragmentation
Med-Tuning: Exploring Parameter-Efficient Transfer Learning for Medical Volumetric Segmentation
GCNH: A Simple Method For Representation Learning On Heterophilous Graphs
Factored Neural Representation for Scene Understanding
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
Online Time-Optimal Trajectory Planning on Three-Dimensional Race Tracks
IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty
RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments
Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation
Knowledge Distillation Under Ideal Joint Classifier Assumption
Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
Backpropagation-free Training of Deep Physical Neural Networks
SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model
HeRo: RoBERTa and Longformer Hebrew Language Models
A Convolutional Spiking Network for Gesture Recognition in Brain-Computer Interfaces
Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer
Keyword: faster
Smart Learning to Find Dumb Contracts
FindVehicle and VehicleFinder: A NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system
GCNH: A Simple Method For Representation Learning On Heterophilous Graphs
Faster Prefix-Sorting Algorithms for Deterministic Finite Automata
IBBT: Informed Batch Belief Trees for Motion Planning Under Uncertainty
Learned Monotone Minimal Perfect Hashing
Keyword: mobile
Joint Client Assignment and UAV Route Planning for Indirect-Communication Federated Learning
Safe Routing Approach by Identifying and Subsequently Eliminating the Attacks in MANET
HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation
Using Mobile Data and Deep Models to Assess Auditory Verbal Hallucinations
not at all'' to
extremely''. We collect these self-reports as the valence supervision of AVH events via a mobile application. Using the application, participants also record audio diaries to describe the content of hallucinated voices verbally. In addition, we passively collect mobile sensing data as contextual signals. We then experiment with how predictive these linguistic and contextual cues from the audio diary and mobile sensing data are of an auditory verbal hallucination event. Finally, using transfer learning and data fusion techniques, we train a neural net model that predicts the valance of AVH with a performance of 54\% top-1 and 72\% top-2 F1 score.Multivariate and Multi-step Traffic Prediction for NextG Networks with SLA Violation Constraints
Keyword: pruning
ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Conservative Sparse Neural Network Embedded Frequency-Constrained Unit Commitment With Distributed Energy Resources
Keyword: voxel
VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
Deep-Learning-based Fast and Accurate 3D CT Deformable Image Registration in Lung Cancer
Keyword: lidar
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer
Keyword: diffusion
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies
Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models
Matching-based Data Valuation for Generative Model
A Deep Learning algorithm to accelerate Algebraic Multigrid methods in Finite Element solvers of 3D elliptic PDEs
A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion
A convection-diffusion problem with a large shift on Duran meshes
Improved Diffusion-based Image Colorization via Piggybacked Models
BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis
Keyword: dynamic
Optimization of a Hydrodynamic Computational Reservoir through Evolution
HM-ViT: Hetero-modal Vehicle-to-Vehicle Cooperative perception with vision transformer
Feature point detection in HDR images based on coefficient of variation
FSNet: Redesign Self-Supervised MonoDepth for Full-Scale Depth Prediction for Autonomous Driving
Conservative Sparse Neural Network Embedded Frequency-Constrained Unit Commitment With Distributed Energy Resources
DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases
A numerical method for the stability analysis of linear age-structured models with nonlocal diffusion
A Comprehensive Review on Ontologies for Scenario-based Testing in the Context of Autonomous Driving
Effective Numerical Simulations of Synchronous Generator System
AMP in the wild: Learning robust, agile, natural legged locomotion skills
Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems
Electromechanical memcapacitive neurons for energy-efficient spiking neural networks
Gradient-Based Distributed Controller Design Over Directed Networks
Real-Time Implementation of Dynamic State Estimation for Microgrid Load Bus Protection
Gradient Derivation for Learnable Parameters in Graph Attention Networks
RGB-D Inertial Odometry for a Resource-Restricted Robot in Dynamic Environments
Multi-level decision framework collision avoidance algorithm in emergency scenarios
Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT
A Multi-Fidelity Bayesian Approach to Safe Controller Design
Analog Feedback-Controlled Memristor programming Circuit for analog Content Addressable Memory
Backpropagation-free Training of Deep Physical Neural Networks
An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph
Generative AI-enabled Vehicular Networks: Fundamentals, Framework, and Case Study
Approximate Shielding of Atari Agents for Safe Exploration