Abstract
The concept of integrating physics-based and data-driven approaches has become popular for modeling sustainable energy systems. However, the existing literature mainly focuses on data-driven surrogates generated to replace physics-based models. These models often trade accuracy for speed but lack the generalizability, adaptability, and interpretability inherent in physics-based models, which are often indispensable in the modeling of real-world dynamic systems for optimization and control purposes. In this work, we propose a novel architecture for generating model-integrated neural networks (MINN) to allow integration at the level of learning the physics-based dynamics of the system. The obtained hybrid model solves an unsettled research problem in control-oriented modeling, i.e., how to obtain an optimally simplified model that is physically insightful, numerically accurate, and computationally tractable simultaneously. We apply the proposed neural network architecture to model the electrochemical dynamics of lithium-ion batteries and show that MINN is extremely data-efficient to train while being sufficiently generalizable to previously unseen input data, owing to its underlying physical invariants. The MINN battery model has an accuracy comparable to that of the first-principles model in predicting both the system outputs and any locally distributed electrochemical behaviors, but achieves a two-orders-of-magnitude reduction in solution time.
Model Explainability in Physiological and Healthcare-based Neural Networks
Authors: Rohit Sharma, Abhinav Gupta, Arnav Gupta, Bo Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
The estimation and monitoring of SpO2 are crucial for assessing lung function and treating chronic pulmonary diseases. The COVID-19 pandemic has highlighted the importance of early detection of changes in SpO2, particularly in asymptomatic patients with clinical deterioration. However, conventional SpO2 measurement methods rely on contact-based sensing, presenting the risk of cross-contamination and complications in patients with impaired limb perfusion. Additionally, pulse oximeters may not be available in marginalized communities and underdeveloped countries. To address these limitations and provide a more comfortable and unobtrusive way to monitor SpO2, recent studies have investigated SpO2 measurement using videos. However, measuring SpO2 with cameras in a contactless way, particularly from smartphones, is challenging due to weaker physiological signals and the lower optical selectivity of smartphone camera sensors. Our proposed system includes three main steps: 1) extraction of the region of interest (ROI), which includes the palm and back of the hand, from the smartphone-captured videos; 2) spatial averaging of the ROI to produce R, G, and B time series; and 3) feeding the time series into an optophysiology-inspired CNN for SpO2 estimation. Our proposed method can provide a more efficient and accurate way to monitor SpO2 using videos captured from consumer-grade smartphones, which can be especially useful in telehealth and health screening settings.
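Step 2 of the pipeline above, spatial averaging of the ROI into R, G, and B time series, can be sketched in a few lines. The array shapes and function name below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rgb_time_series(frames, roi_mask):
    """Spatially average the ROI of each frame into R, G, B time series.

    frames:   (T, H, W, 3) array of video frames
    roi_mask: (H, W) boolean mask marking the region of interest
    Returns a (T, 3) array: one (R, G, B) mean per frame.
    """
    # Boolean-index the spatial axes, keeping only ROI pixels per frame,
    # then average over the spatial dimension.
    roi_pixels = frames[:, roi_mask, :]   # shape (T, N_roi, 3)
    return roi_pixels.mean(axis=1)        # shape (T, 3)

# Tiny synthetic example: 2 frames of 2x2 pixels, ROI = left column.
frames = np.zeros((2, 2, 2, 3))
frames[0, :, 0, :] = [10, 20, 30]   # frame 0, both ROI pixels
frames[1, :, 0, :] = [40, 50, 60]   # frame 1, both ROI pixels
mask = np.array([[True, False], [True, False]])
series = rgb_time_series(frames, mask)
```

The resulting per-channel series would then be fed to the CNN in step 3.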
MWaste: A Deep Learning Approach to Manage Household Waste
Abstract
While computer vision methods have been shown to be effective in classifying garbage into recycling categories for waste processing, existing methods are costly, imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile application that uses computer vision and deep learning techniques to classify waste materials as trash, plastic, paper, metal, glass or cardboard. Its effectiveness was tested on various neural network architectures and real-world images, achieving an average precision of 92% on the test set. This app can help combat climate change by enabling efficient waste processing and reducing the generation of greenhouse gases caused by incorrect waste disposal.
SRCNet: Seminal Representation Collaborative Network for Marine Oil Spill Segmentation
Abstract
Effective oil spill segmentation in Synthetic Aperture Radar (SAR) images is critical for marine oil pollution cleanup, and proper image representation is helpful for accurate image segmentation. In this paper, we propose an effective oil spill image segmentation network named SRCNet that leverages SAR image representation and the training for oil spill segmentation simultaneously. Specifically, our segmentation network is constructed with a pair of deep neural nets collaborating through a seminal representation that describes SAR images: one deep neural net is the generative net, which strives to produce oil spill segmentation maps, and the other is the discriminative net, which tries its best to distinguish between the produced and the true segmentations; the two thus form a two-player game. In particular, the seminal representation exploited in SRCNet originates from SAR imagery and models the internal characteristics of SAR images. Thus, in the training process, the seminal representation empowers the generative net to produce accurate oil spill segmentation maps efficiently from a small amount of training data, helping the discriminative net reach its optimal solution quickly. Our proposed SRCNet therefore performs effective oil spill segmentation in an economical and efficient manner. Additionally, to improve the network's ability to accurately delineate oil spill details in SAR images, we devise a regularisation term that penalises the segmentation loss, encouraging SRCNet to segment oil spill areas accurately. Empirical evaluations on different metrics validate the effectiveness of our proposed SRCNet for oil spill image segmentation.
Read My Mind: A Multi-Modal Dataset for Human Belief Prediction
Authors: Jiafei Duan, Samson Yu, Nicholas Tan, Yi Ru Wang, Cheston Tan
Abstract
Understanding human intentions is key to enabling effective and efficient human-robot interaction (HRI) in collaborative settings. To enable developments and evaluation of the ability of artificial intelligence (AI) systems to infer human beliefs, we introduce a large-scale multi-modal video dataset for intent prediction based on object-context relations.
Suspicious Vehicle Detection Using Licence Plate Detection And Facial Feature Recognition
Abstract
With the increasing need to strengthen vehicle safety and detection, catching criminals and identifying vehicles manually through the various traffic surveillance cameras is not only time-consuming but also inefficient. With the advancement of technology in every field, real-time traffic surveillance models can facilitate an easier approach. Keeping this in mind, the main focus of our paper is to develop a combined face recognition and number plate recognition model to ensure vehicle safety and real-time tracking of fleeing criminals and stolen vehicles.
An Efficient Ensemble Explainable AI (XAI) Approach for Morphed Face Detection
Abstract
The extensive utilization of biometric authentication systems has prompted attackers/imposters to forge user identities with morphed images. In this attack, a synthetic image is produced, merged with a genuine one, and the resultant image is then used for authentication. Numerous deep neural convolutional architectures have been proposed in the literature for face Morphing Attack Detection (MAD) to prevent such attacks and lessen the risks associated with them. Although deep learning models achieve strong performance, they are difficult to understand and analyse since they are black box/opaque in nature; as a consequence, incorrect judgments may be made. There is, however, a dearth of literature explaining the decision-making of black box deep learning models for biometric Presentation Attack Detection (PAD) or MAD, which could help the biometric community trust deep learning-based biometric systems for identification and authentication in security applications such as border control and criminal database establishment. In this work, we present a novel visual explanation approach named Ensemble XAI, integrating Saliency maps, Class Activation Maps (CAM) and Gradient-CAM (Grad-CAM), to provide a more comprehensive visual explanation for a deep learning prognostic model (EfficientNet-B1) that we employ to predict whether the input presented to a biometric authentication system is morphed or genuine. Experiments were performed on three publicly available datasets, namely the Face Research Lab London Set, Wide Multi-Channel Presentation Attack (WMCA), and Makeup Induced Face Spoofing (MIFS). The experimental evaluations affirm that the resulting visual explanations highlight fine-grained details of the image features/areas EfficientNet-B1 focuses on to reach decisions, along with appropriate reasoning.
Visual Referential Games Further the Emergence of Disentangled Representations
Authors: Kevin Denamganaï, Sondess Missaoui, James Alfred Walker
Abstract
Natural languages are powerful tools wielded by human beings to communicate information. Among their desirable properties, compositionality has been the main focus in the context of referential games and variants, as it promises to enable greater systematicity in the agents that wield it. The concept of disentanglement has been shown to be of paramount importance to learned representations that generalise well in deep learning, and is thought to be a necessary condition for systematicity. Thus, this paper investigates how compositionality at the level of the emerging languages, disentanglement at the level of the learned representations, and systematicity relate to each other in the context of visual referential games. Firstly, we find that visual referential games based on the Obverter architecture outperform state-of-the-art unsupervised learning approaches on many major disentanglement metrics. Secondly, we extend the previously proposed Positional Disentanglement (PosDis) metric for compositionality to (re-)incorporate the informativeness and completeness concerns found in the Mutual Information Gap (MIG) disentanglement metric it stems from. This extension allows for further discrimination between the different kinds of compositional languages that emerge in Obverter-based referential games, in a way that neither the referential game accuracy nor previous metrics were able to capture. Finally, we investigate whether the resulting (emergent) systematicity, as measured by zero-shot compositional learning tests, correlates with any of the disentanglement and compositionality metrics proposed so far. Throughout the training process, statistically significant correlation coefficients can be found, both positive and negative depending on the moment of the measure.
Multivariate Representation Learning for Information Retrieval
Authors: Hamed Zamani, Michael Bendersky
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Dense retrieval models use bi-encoder network architectures for learning query and document representations. These representations are often in the form of a vector representation and their similarities are often computed using the dot product function. In this paper, we propose a new representation learning framework for dense retrieval. Instead of learning a vector for each query and document, our framework learns a multivariate distribution and uses negative multivariate KL divergence to compute the similarity between distributions. For simplicity and efficiency reasons, we assume that the distributions are multivariate normals and then train large language models to produce mean and variance vectors for these distributions. We provide a theoretical foundation for the proposed framework and show that it can be seamlessly integrated into the existing approximate nearest neighbor algorithms to perform retrieval efficiently. We conduct an extensive suite of experiments on a wide range of datasets, and demonstrate significant improvements compared to competitive dense retrieval models.
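The negative multivariate KL divergence described above has a closed form for multivariate normals. As a minimal sketch, assuming diagonal covariances (an assumption made here for brevity, not stated in the abstract), the similarity between a query distribution and a document distribution reduces to:

```python
import numpy as np

def neg_kl_similarity(mu_q, var_q, mu_d, var_d):
    """Negative KL(q || d) between diagonal multivariate normals.

    mu_q, var_q: mean and variance vectors of the query distribution
    mu_d, var_d: mean and variance vectors of the document distribution
    Higher (less negative) values indicate a better query-document match.
    """
    kl = 0.5 * np.sum(
        np.log(var_d / var_q)                 # log-determinant ratio term
        + (var_q + (mu_q - mu_d) ** 2) / var_d  # trace + mean-shift term
        - 1.0
    )
    return -kl

mu = np.array([0.2, -1.0, 0.5])
var = np.ones(3)
# A distribution is a perfect match with itself: KL = 0.
assert abs(neg_kl_similarity(mu, var, mu, var)) < 1e-12
```

In the framework above, the mean and variance vectors would come from the trained language-model encoders rather than being fixed as here.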
It is all about where you start: Text-to-image generation with seed selection
Authors: Dvir Samuel, Rami Ben-Ari, Simon Raviv, Nir Darshan, Gal Chechik
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Text-to-image diffusion models can synthesize a large variety of concepts in new compositions and scenarios. However, they still struggle with generating uncommon concepts, rare unusual combinations, or structured concepts like hand palms. Their limitation is partly due to the long-tail nature of their training data: web-crawled data sets are strongly unbalanced, causing models to under-represent concepts from the tail of the distribution. Here we characterize the effect of unbalanced training data on text-to-image models and offer a remedy. We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space, a technique that we call SeedSelect. SeedSelect is efficient and does not require retraining the diffusion model. We evaluate the benefit of SeedSelect on a series of problems. First, in few-shot semantic data augmentation, where we generate semantically correct images for few-shot and long-tail benchmarks. We show classification improvement on all classes, both from the head and tail of the training data of diffusion models. We further evaluate SeedSelect on correcting images of hands, a well-known pitfall of current diffusion models, and show that it improves hand generation substantially.
Identifying Minimal Changes in the Zone Abstract Domain
Abstract
Verification techniques express program states as logical formulas over program variables. For example, symbolic execution and abstract interpretation encode program states as a set of integer inequalities. However, for real-world programs these formulas tend to become large, which affects the scalability of analyses. To address this problem, researchers developed complementary approaches that either remove redundant inequalities or extract a subset of inequalities sufficient for specific reasoning. For arbitrary integer inequalities, such reduction approaches either have high complexity or over-approximate. However, the efficiency and precision of these approaches can be improved for the restricted type of logical formulas used in relational numerical abstract domains. While previous work investigated custom efficient redundant inequality elimination for Zones states, our work examines custom semantic slicing algorithms that identify a minimal set of changed inequalities in Zones states. The client application of the minimal changes in Zones is an empirical study comparing invariants computed by data-flow analysis using the Zones, Intervals and Predicates numerical domains. In particular, the evaluations compare how our proposed algorithms affect the precision of comparing the Zones vs. Intervals and Zones vs. Predicates abstract domains. The results show our techniques reduce the number of variables by more than 70% and the number of inequalities by 30%, compared to full states. The approach refines the granularity of comparison between domains, reducing the share of incomparable invariants between Zones and Predicates from 52% to 4%, and increasing the share of equal Intervals and Zones invariants from 27% to 71%. The techniques improve comparison efficiency, reducing the total runtime of all subject comparisons for Zones and Predicates from over 4 minutes to a few seconds.
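The redundancy elimination that prior work on Zones builds on can be illustrated with difference-bound matrices (DBMs), the standard representation of Zones states, where each entry encodes an inequality of the form v_i - v_j <= c. The sketch below is a minimal textbook-style illustration under that representation, not the paper's algorithm:

```python
import itertools

INF = float("inf")

def closure(dbm):
    """Floyd-Warshall shortest-path closure of a difference-bound matrix.

    dbm[i][j] = c encodes the Zones inequality  v_i - v_j <= c.
    The closure tightens every bound to the best one implied by the rest.
    """
    n = len(dbm)
    d = [row[:] for row in dbm]
    for k, i, j in itertools.product(range(n), repeat=3):
        d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def redundant(dbm):
    """Stated inequalities already implied by a path through other bounds."""
    cl = closure(dbm)
    n = len(dbm)
    out = set()
    for i in range(n):
        for j in range(n):
            if i != j and dbm[i][j] < INF:
                # Redundant if routing through some third variable k
                # yields a bound at least as tight as the stated one.
                if any(cl[i][k] + cl[k][j] <= dbm[i][j]
                       for k in range(n) if k not in (i, j)):
                    out.add((i, j))
    return out

# x - y <= 1, y - z <= 2, x - z <= 5; the last bound is implied (1 + 2 <= 5).
dbm = [[0, 1, 5], [INF, 0, 2], [INF, INF, 0]]
```

Here `redundant(dbm)` flags only the inequality x - z <= 5, leaving a smaller but semantically equivalent state.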
Neural Implicit Dense Semantic SLAM
Authors: Yasaman Haghighi, Suryansh Kumar, Jean Philippe Thiran, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents an efficient online framework to solve the well-known semantic Visual Simultaneous Localization and Mapping (V-SLAM) problem for indoor scenes, leveraging the advantages of neural implicit scene representation. Existing methods along similar lines, such as NICE-SLAM, have critical practical limitations that hinder their use for such an important indoor scene understanding problem. To this end, we contend for the following propositions for modern semantic V-SLAM, contrary to existing methods assuming RGB-D frames as input: (i) for a rigid scene, robust and accurate camera motion can be computed with a disentangled tracking and 3D mapping pipeline; (ii) using neural fields, a dense and multifaceted scene representation of SDF, semantics, RGB, and depth can be provided memory-efficiently; (iii) rather than using every frame, a set of keyframes suffices to learn an excellent scene representation, thereby improving the pipeline's training time; (iv) multiple local mapping networks can be used to extend the pipeline to large-scale scenes. We show via extensive experiments on several popular benchmark datasets that our approach offers accurate tracking, mapping, and semantic labeling at test time even with noisy and highly sparse depth measurements. Later in the paper, we show that our pipeline easily extends to RGB image input. Overall, the proposed pipeline offers a favorable solution to an important scene understanding task that can assist diverse robot visual perception and related problems.
An Adaptive Channel Reservation MAC Protocol Based on Forwarding Traffic of Key Nodes
Authors: Ze Liu, Bo Li, Mao Yang, ZhongJiang Yan
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Ad Hoc networks with multi-hop topology are widely used in military and civilian applications. One challenge for Ad Hoc networks is to design efficient Media Access Control (MAC) protocols to ensure quality of service (QoS). In Ad Hoc networks, there is a kind of node, called a key node, that undertakes more forwarding traffic than the surrounding nodes. Key nodes often have many neighbor nodes, and their surrounding channel environment and interference are often more complex. Thus, key nodes can hardly get enough channel access opportunities, resulting in poor end-to-end performance. Therefore, we propose an adaptive channel reservation MAC protocol based on the forwarding traffic of key nodes, aimed at alleviating congestion at key nodes. Before sending packets, nodes initiate reservations for future transmission time according to their buffer status and then calculate the Weight of Reservation Ability (WRA). Each node adaptively adjusts its reservation opportunity by comparing its WRA with those of its neighbor nodes, thus improving channel access efficiency and ensuring the transmission opportunities of key nodes. Extensive simulation confirms that our proposed protocol, FTKN-CRM, provides significant improvements in end-to-end performance over the IEEE 802.11ax protocol and other reservation access protocols.
Learning adaptive manipulation of objects with revolute joint: A case study on varied cabinet doors opening
Abstract
This paper introduces a learning-based framework for adaptive robotic manipulation of objects with a revolute joint in unstructured environments. We concentrate our discussion on various cabinet door opening tasks. To improve the performance of deep reinforcement learning in this setting, we analytically provide an efficient sampling scheme that utilizes the constraints of the objects. To open various kinds of doors, we add encoded environment parameters that define the different environments to the input of our policy. To transfer the policy into the real world, we train an adaptation module in simulation and fine-tune it to cut down the impact of policy-unaware environment parameters. We design a series of experiments to validate the efficacy of our framework. Additionally, we verify the model's performance in the real world against a traditional door opening method.
Timely Mobile Routing: An Experimental Study
Authors: Vishakha Ramani, Jiachen Chen, Roy D. Yates
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Time-critical cyber-physical applications demand the timely delivery of information. In this work, we employ a high-speed packet processing testbed to quantitatively analyze a packet forwarding application running on a shared memory multi-processor architecture, where efficient synchronization of concurrent access to a Forwarding Information Base is essential for low-latency and timely delivery of information. While modern packet processing frameworks are optimized for maximum packet throughput, their ability to support timely delivery remains an open question. Here we focus on the age of information performance issues induced by throughput-focused packet processing frameworks. Our results underscore the importance of careful selection of offered load parameters and concurrency constructs in such frameworks.
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening
Authors: Mingsong Li, Yikun Liu, Tao Xiao, Yuwen Huang, Gongping Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Pan-sharpening aims to increase the spatial resolution of a low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image. Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency. First, the universally adopted black-box design limits model interpretability. Second, existing DL-based methods fail to efficiently capture local and global dependencies at the same time, inevitably limiting overall performance. To address these issues, we first formulate the degradation process of the high-resolution multispectral (HrMS) image as a unified variational optimization problem, and alternately solve its data and prior subproblems with a designed iterative proximal gradient descent (PGD) algorithm. Moreover, we customize a Local-Global Transformer (LGT) to simultaneously model local and global dependencies, and further formulate an LGT-based prior module for image denoising. Besides the prior module, we also design a lightweight data module. Finally, by serially integrating the data and prior modules in each iterative stage, we unfold the iterative algorithm into a stage-wise unfolding network, the Local-Global Transformer Enhanced Unfolding Network (LGTEUN), for interpretable MS pan-sharpening. Comprehensive experimental results on three satellite data sets demonstrate the effectiveness and efficiency of LGTEUN compared with state-of-the-art (SOTA) methods. The source code is available at https://github.com/lms-07/LGTEUN.
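The data/prior splitting behind such unfolding designs can be sketched with classical proximal gradient descent. In the toy version below, a soft-threshold (the proximal operator of an L1 prior) stands in for the learned LGT prior module, so this is an illustrative analogue of the alternation, not LGTEUN itself:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the L1 norm (stand-in for a learned prior)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def pgd(y, A, lam=0.1, step=0.1, iters=200):
    """Proximal gradient descent for 0.5*||A x - y||^2 + lam*||x||_1.

    Each iteration alternates a gradient step on the data term with a
    proximal step on the prior term -- the same data/prior alternation
    that deep unfolding networks turn into stage-wise learned modules.
    """
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - step * A.T @ (A @ x - y)   # data subproblem (gradient step)
        x = soft_threshold(x, step * lam)  # prior subproblem (proximal step)
    return x

# With A = I the solution is simply the soft-thresholded observation.
A = np.eye(3)
y = np.array([1.0, -0.5, 0.02])
x = pgd(y, A)
```

In LGTEUN, each such iteration becomes one network stage, with the gradient step realized by the lightweight data module and the proximal step by the LGT-based prior module.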
DataFlower: Exploiting the Data-flow Paradigm for Serverless Workflow Orchestration
Abstract
Serverless computing that runs functions with auto-scaling is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can achieve complex functionality. Prior research adopts the control-flow paradigm to orchestrate a serverless workflow. However, the control-flow paradigm inherently results in long response latency, due to the heavy data persistence overhead, sequential resource usage, and late function triggering. Our investigation shows that the data-flow paradigm has the potential to resolve the above problems, with careful design and optimization. We propose DataFlower, a scheme that achieves the data-flow paradigm for serverless workflows. In DataFlower, a container is abstracted into a function logic unit and a data logic unit. The function logic unit runs the functions, and the data logic unit handles the data transmission asynchronously. Moreover, a host-container collaborative communication mechanism is used to support efficient data transfer. Our experimental results show that, compared to state-of-the-art serverless designs, DataFlower reduces the 99th-percentile latency of the benchmarks by up to 35.4%, and improves the peak throughput by up to 3.8X.
An Adaptive Policy to Employ Sharpness-Aware Minimization
Authors: Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok
Abstract
Sharpness-aware minimization (SAM), which searches for flat minima by min-max optimization, has been shown to be useful in improving model generalization. However, since each SAM update requires computing two gradients, its computational cost and training time are both doubled compared to standard empirical risk minimization (ERM). Recent state-of-the-art methods reduce the fraction of SAM updates, and thus accelerate SAM, by switching between SAM and ERM updates randomly or periodically. In this paper, we design an adaptive policy to employ SAM based on the loss landscape geometry. Two efficient algorithms, AE-SAM and AE-LookSAM, are proposed. We theoretically show that AE-SAM has the same convergence rate as SAM. Experimental results on various datasets and architectures demonstrate the efficiency and effectiveness of the adaptive policy.
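For reference, the two-gradient structure of a SAM update, the source of the doubled cost discussed above, can be sketched on a toy problem; the step sizes, the quadratic loss, and the function names are illustrative assumptions:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step on parameters w.

    SAM needs two gradient evaluations per update:
      1) the gradient at w, to find the worst-case perturbation
         within an L2 ball of radius rho;
      2) the gradient at the perturbed point w + eps, which is then
         applied back at w.
    """
    g = grad_fn(w)                                # first gradient
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction
    g_sam = grad_fn(w + eps)                      # second gradient
    return w - lr * g_sam

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda w: w)
```

An ERM step would stop after the first gradient; schemes like AE-SAM decide per update, from the loss landscape geometry, whether the second gradient is worth computing.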
Effective Data Aggregation in WSN for Enhanced Security and Data Privacy
Authors: B. Murugeshwari, S. Aminta Sabatini, Lovelit Jose, S. Padmapriya
Abstract
The two biggest problems with wireless sensor networks are security and energy usage. Among sensing devices, malicious nodes may be present in large numbers, and researchers have proposed several methods to find these rogue nodes. To prevent attacks on these networks and on data transmission, the data must be secured. Data aggregation helps reduce the number of messages transmitted within the network, which in turn lowers total network energy consumption. Additionally, when decrypting the aggregated data, the base station can distinguish between encrypted and consolidated analyses based on the cryptographic keys. This research examines the effectiveness of data aggregation and provides a method in which an efficient cluster agent is selected based on its location relative to the access point and its available energy. Selecting an effective cluster agent reduces the sensor network's energy consumption, extending the network's lifespan. The cluster agent is in charge of compiling the data of each member node; it validates the data, discards any errors before aggregation, and aggregates only confirmed data. To provide end-to-end anonymity, ElGamal elliptic curve (ECE) encryption is used to secure the client data and relay the encrypted information to the cluster agent. Only the base station (BS) can decrypt the data. Furthermore, an ID-based signature system is utilized to ensure authenticity. This research also presents a technique for recovering lost data: the access point employs a cache-based backup system to search for lost data.
Client Recruitment for Federated Learning in ICU Length of Stay Prediction
Authors: Vincent Scheltjens, Lyse Naomi Wamba Momo, Wouter Verbeke, Bart De Moor
Abstract
Machine and deep learning methods for medical and healthcare applications have shown significant progress and performance improvement in recent years. These methods require vast amounts of training data which are available in the medical sector, albeit decentralized. Medical institutions generate vast amounts of data for which sharing and centralizing remains a challenge as the result of data and privacy regulations. The federated learning technique is well-suited to tackle these challenges. However, federated learning comes with a new set of open problems related to communication overhead, efficient parameter aggregation, client selection strategies and more. In this work, we address the step prior to the initiation of a federated network for model training: client recruitment. By intelligently recruiting clients, communication overhead and the overall cost of training can be reduced without sacrificing predictive performance. Client recruitment aims at pre-excluding potential clients from partaking in the federation based on a set of criteria indicative of their eventual contributions to the federation. In this work, we propose a client recruitment approach using only the output distribution and sample size at the client site. We show how a subset of clients can be recruited without sacrificing model performance whilst, at the same time, significantly improving computation time. By applying the recruitment approach to the training of federated models for accurate patient Length of Stay prediction using data from 189 Intensive Care Units, we show how the models trained in federations made up of recruited clients significantly outperform federated models trained with the standard procedure in terms of predictive power and training time.
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment
Authors: Haoning Wu, Liang Liao, Annan Wang, Chaofeng Chen, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The proliferation of videos collected in in-the-wild natural settings has pushed the development of effective Video Quality Assessment (VQA) methodologies. Contemporary supervised opinion-driven VQA strategies predominantly hinge on training with expensive human annotations for quality scores, which has limited the scale and distribution of VQA datasets and consequently led to unsatisfactory generalization capacity in methods driven by these data. On the other hand, although several handcrafted zero-shot quality indices do not require training from human opinions, they are unable to account for the semantics of videos, rendering them ineffective in comprehending complex authentic distortions (e.g., white balance, exposure) and assessing the quality of semantic content within videos. To address these challenges, we introduce the text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP) to ascertain the affinity between textual prompts and visual features, facilitating a comprehensive examination of semantic quality concerns without reliance on human quality annotations. By amalgamating SAQI with existing low-level metrics, we propose the unified Blind Video Quality Index (BVQI) and its improved version, BVQI-Local, which demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. Moreover, we devise an efficient fine-tuning scheme for BVQI-Local that jointly optimizes text prompts and final fusion weights, resulting in state-of-the-art performance and superior generalization ability in comparison to prevalent opinion-driven VQA methods. We conduct comprehensive analyses to investigate the different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
Quantum Cross Subspace Alignment Codes via the $N$-sum Box Abstraction
Abstract
Cross-subspace alignment (CSA) codes are used in various private information retrieval (PIR) schemes (e.g., with secure storage) and in secure distributed batch matrix multiplication (SDBMM). Using a recently developed $N$-sum box abstraction of a quantum multiple-access channel (QMAC), we translate CSA schemes over classical multiple-access channels into efficient quantum CSA schemes over a QMAC, achieving maximal superdense coding gain. Because of the $N$-sum box abstraction, the underlying problem of coding to exploit quantum entanglements for CSA schemes becomes conceptually equivalent to that of designing a channel matrix for a MIMO MAC subject to given structural constraints imposed by the $N$-sum box abstraction, such that the resulting MIMO MAC is able to implement the functionality of a CSA scheme (encoding/decoding) over-the-air. Applications include Quantum PIR with secure and MDS-coded storage, as well as Quantum SDBMM.
Graph Neural Networks on Factor Graphs for Robust, Fast, and Scalable Linear State Estimation with PMUs
Abstract
As phasor measurement units (PMUs) become more widely used in transmission power systems, a fast state estimation (SE) algorithm that can take advantage of their high sample rates is needed. To accomplish this, we present a method that uses graph neural networks (GNNs) to learn complex bus voltage estimates from PMU voltage and current measurements. We propose an original implementation of GNNs over the power system's factor graph to simplify the integration of various types and quantities of measurements on power system buses and branches. Furthermore, we augment the factor graph to improve the robustness of GNN predictions. This model is highly efficient and scalable, as its computational complexity is linear with respect to the number of nodes in the power system. Training and test examples were generated by randomly sampling sets of power system measurements and annotated with the exact solutions of linear SE with PMUs. The numerical results demonstrate that the GNN model provides an accurate approximation of the SE solutions. Furthermore, errors caused by PMU malfunctions or communication failures that would normally make the SE problem unobservable have a local effect and do not deteriorate the results in the rest of the power system.
Zero Trust Chain: A Design Pattern for Improved Interoperability and Security in Polkadot
Authors: Santiago Márquez Solís
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This research article presents various design patterns for improving interoperability in Polkadot, a blockchain platform. These patterns include chain bridges, interoperability standards, common asset identifiers, governance agreements, oracle chains, and a hypothetical design pattern called Zero Trust Chain. Implementation of these design patterns can help improve security and confidence in transactions between different chains on the Polkadot network, allowing for faster and more efficient communication. The article also emphasizes the importance of interoperability in blockchain technology and highlights Polkadot's flexibility in creating customized specialized chains that can further improve interoperability on the network. Overall, this article highlights how design patterns can improve interoperability in Polkadot, which could lead to greater adoption of blockchain technology in various industries.
FlowTransformer: A Transformer Framework for Flow-based Network Intrusion Detection Systems
Authors: Liam Daly Manocchio, Siamak Layeghy, Wai Weng Lo, Gayan K. Kulatilleke, Mohanad Sarhan, Marius Portmann
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Abstract
This paper presents the FlowTransformer framework, a novel approach for implementing transformer-based Network Intrusion Detection Systems (NIDSs). FlowTransformer leverages the strengths of transformer models in identifying the long-term behaviour and characteristics of networks, which are often overlooked by most existing NIDSs. By capturing these complex patterns in network traffic, FlowTransformer offers a flexible and efficient tool for researchers and practitioners in the cybersecurity community who are seeking to implement NIDSs using transformer-based models. FlowTransformer allows the direct substitution of various transformer components, including the input encoding, transformer, classification head, and the evaluation of these across any flow-based network dataset. To demonstrate the effectiveness and efficiency of the FlowTransformer framework, we utilise it to provide an extensive evaluation of various common transformer architectures, such as GPT 2.0 and BERT, on three commonly used public NIDS benchmark datasets. We provide results for accuracy, model size and speed. A key finding of our evaluation is that the choice of classification head has the most significant impact on the model performance. Surprisingly, Global Average Pooling, which is commonly used in text classification, performs very poorly in the context of NIDS. In addition, we show that model size can be reduced by over 50\%, and inference and training times improved, with no loss of accuracy, by making specific choices of input encoding and classification head instead of other commonly used alternatives.
Orthogonal polynomial bases in the Mixed Virtual Element Method
Abstract
The use of orthonormal polynomial bases has been found to be efficient in preventing ill-conditioning of the system matrix in the primal formulation of Virtual Element Methods (VEM) for high values of the polynomial degree and in the presence of badly-shaped polygons. However, we show that using the natural extension of an orthogonal polynomial basis built for the primal formulation is not sufficient to cure ill-conditioning in the mixed case. Thus, in the present work, we introduce an orthogonal vector-polynomial basis which is built ad hoc for use in the mixed formulation of VEM and which leads to very high-quality solutions in all tested cases. Furthermore, a numerical experiment related to simulations in Discrete Fracture Networks (DFN), which are often characterised by very badly-shaped elements, is proposed to validate our procedures.
Hyperparameter Optimization through Neural Network Partitioning
Authors: Bruno Mlodozeniec, Matthias Reisser, Christos Louizos
Abstract
Well-tuned hyperparameters are crucial for obtaining good generalization behavior in neural networks. They can enforce appropriate inductive biases, regularize the model and improve performance -- especially in the presence of limited data. In this work, we propose a simple and efficient way for optimizing hyperparameters inspired by the marginal likelihood, an optimization objective that requires no validation data. Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions, respectively. Each partition is associated with and optimized only on specific data shards. Combining these partitions into subnetworks allows us to define the ``out-of-training-sample" loss of a subnetwork, i.e., the loss on data shards unseen by the subnetwork, as the objective for hyperparameter optimization. We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run while being significantly computationally cheaper than alternative methods aiming to optimize the marginal likelihood for neural networks. Lastly, we also focus on optimizing hyperparameters in federated learning, where retraining and cross-validation are particularly challenging.
MCPrioQ: A lock-free algorithm for online sparse Markov chains
Abstract
In high-performance systems it is sometimes hard to build very large graphs that are efficient both with respect to memory and compute. This paper proposes a data structure called the Markov-chain priority queue (MCPrioQ), a lock-free sparse Markov chain that enables online and continuous learning with a time complexity of $O(1)$ for updates and $O(\mathrm{CDF}^{-1}(t))$ for inference. MCPrioQ is especially suitable for recommender systems performing lookups of $n$ items in descending probability order. Concurrent updates are achieved using hash tables and atomic instructions, and lookups are achieved through a novel priority queue which allows for approximately correct results even during concurrent updates. The approximately-correct and lock-free properties are maintained by a read-copy-update scheme, but with semantics slightly updated to allow swapping of elements rather than the traditional pop-insert scheme.
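The intended semantics can be illustrated with a deliberately simplified, single-threaded sketch: online transition-count updates plus top-$n$ successor lookup in descending probability order. The lock-free hash-table, atomic-instruction, and read-copy-update machinery that makes MCPrioQ concurrent is omitted here; this only shows what the structure computes.

```python
from collections import defaultdict

class SparseMarkovChain:
    """Single-threaded sketch of the MCPrioQ *semantics*: amortised O(1)
    online updates of transition counts, plus top-n successor lookup in
    descending probability order. The paper's lock-free hash-table and
    priority-queue machinery is not reproduced here."""

    def __init__(self):
        # state -> {next_state -> observed transition count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, next_state):
        self.counts[state][next_state] += 1      # online, incremental

    def top_n(self, state, n):
        succ = self.counts[state]
        total = sum(succ.values())
        ranked = sorted(succ.items(), key=lambda kv: kv[1], reverse=True)
        return [(s, c / total) for s, c in ranked[:n]]

mc = SparseMarkovChain()
for nxt in ["b", "b", "c", "b", "d"]:
    mc.update("a", nxt)
assert mc.top_n("a", 2) == [("b", 0.6), ("c", 0.2)]
```

A real MCPrioQ would keep the successors in a concurrently-updatable priority queue rather than re-sorting on every lookup, which is where the $O(\mathrm{CDF}^{-1}(t))$ inference cost comes from.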
Channel Orthogonalization with Reconfigurable Surfaces
Authors: Juan Vidal Alegria, Fredrik Rusek
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Orthogonal multi-user multiple-input multiple-output (MU-MIMO) channels allow for optimum performance with simplified precoding/equalization, and they achieve maximum multiplexing gain which is shared fairly among users. Reconfigurable intelligent surfaces (RISs) constitute a promising cost-efficient solution for improving the wireless channel, since they consist of passive reflecting elements able to adjust the phases of the incoming waves. However, it is still widely unclear how these surfaces can improve spatial multiplexing. In fact, the common RIS model cannot achieve perfect orthogonalization of MU-MIMO channels with a reasonable number of elements. Furthermore, efficient channel estimation algorithms for RIS, which are key to taking advantage of its benefits, are still a matter of research. We study two types of reconfigurable surfaces (RSs), namely the amplitude-reconfigurable intelligent surface (ARIS) and the fully-reconfigurable intelligent surface (FRIS), with extended capabilities over RIS. We show how these RSs allow for perfect channel orthogonalization, and, by minimizing the applied power, we show that they can potentially be implemented without the need for amplification. We also present an efficient channel estimation method for each of them that allows the base station (BS) to select the desired propagation channel.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Authors: Junge Zhang, Feihu Zhang, Shaochen Kuang, Li Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Labeling LiDAR point clouds for training autonomous driving is extremely expensive and difficult. LiDAR simulation aims at generating realistic LiDAR data with labels for training and verifying self-driving algorithms more efficiently. Recently, Neural Radiance Fields (NeRF) have been proposed for novel view synthesis using implicit reconstruction of 3D scenes. Inspired by this, we present NeRF-LiDAR, a novel LiDAR simulation method that leverages real-world information to generate realistic LiDAR point clouds. Different from existing LiDAR simulators, we use real images and point cloud data collected by self-driving cars to learn the 3D scene representation, point cloud generation and label rendering. We verify the effectiveness of our NeRF-LiDAR by training different 3D segmentation models on the generated LiDAR point clouds. It reveals that the trained models are able to achieve similar accuracy when compared with the same model trained on the real LiDAR data. Besides, the generated data is capable of boosting the accuracy through pre-training, which helps reduce the requirements of real labeled data.
Earning Extra Performance from Restrictive Feedbacks
Abstract
Many machine learning applications encounter a situation where model providers are required to further refine the previously trained model so as to gratify the specific need of local users. This problem reduces to the standard model tuning paradigm if the target data can permissibly be fed to the model. However, this is rather difficult in a wide range of practical cases where target data is not shared with model providers, while commonly some evaluations of the model are accessible. In this paper, we formally set up a challenge named \emph{Earning eXtra PerformancE from restriCTive feEDbacks} (EXPECTED) to describe this form of model tuning problems. Concretely, EXPECTED admits a model provider to access the operational performance of the candidate model multiple times via feedback from a local user (or a group of users). The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedbacks. Unlike existing model tuning methods where the target data is always ready for calculating model gradients, the model providers in EXPECTED only see some feedbacks which could be as simple as scalars, such as inference accuracy or usage rate. To enable tuning in this restrictive circumstance, we propose to characterize the geometry of the model performance with regard to model parameters through exploring the parameters' distribution. In particular, for deep models whose parameters are distributed across multiple layers, a more query-efficient algorithm is further tailor-designed that conducts layerwise tuning with more attention to those layers which pay off better. Our theoretical analyses justify the proposed algorithms from the aspects of both efficacy and efficiency. Extensive experiments on different applications demonstrate that our work forges a sound solution to the EXPECTED problem.
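Tuning from scalar feedback alone can be sketched with a generic zeroth-order loop. This is not the paper's algorithm, only an illustration of the setting: the provider never sees gradients or target data, only a scalar score for each candidate parameter vector, and moves towards perturbations that score better (an evolution-strategies-style update).

```python
import random

def tune_from_feedback(theta, feedback, sigma=0.1, lr=0.5, steps=200, seed=0):
    """Zeroth-order sketch of model tuning from scalar feedback only:
    perturb the parameters symmetrically, query the black-box feedback
    (e.g. an accuracy reported by the local user), and step along
    perturbations that improve it. A generic ES-style loop, not the
    paper's query-efficient layerwise algorithm."""
    rng = random.Random(seed)
    for _ in range(steps):
        eps = [rng.gauss(0.0, sigma) for _ in theta]
        plus = [t + e for t, e in zip(theta, eps)]
        minus = [t - e for t, e in zip(theta, eps)]
        g = feedback(plus) - feedback(minus)   # signed improvement estimate
        theta = [t + lr * g * e for t, e in zip(theta, eps)]
    return theta

# toy "accuracy" feedback, peaked at theta = (1, -2); hypothetical example
acc = lambda th: -((th[0] - 1.0) ** 2 + (th[1] + 2.0) ** 2)
best = tune_from_feedback([0.0, 0.0], acc)
assert acc(best) > acc([0.0, 0.0])
```

Each `feedback` call here corresponds to one round-trip to the local user in the EXPECTED setting, which is why query efficiency (e.g. layerwise tuning) matters in practice.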
Regret Optimal Control for Uncertain Stochastic Systems
Authors: Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate
Abstract
We consider control of uncertain linear time-varying stochastic systems from the perspective of regret minimization. Specifically, we focus on the problem of designing a feedback controller that minimizes the loss relative to a clairvoyant optimal policy that has foreknowledge of the system dynamics and the exogenous disturbances. In this competitive framework, establishing robustness guarantees proves challenging as, differently from the case where the model is known, the benchmark policy is not only inapplicable, but also impossible to compute without knowledge of the system parameters. To overcome this issue, we embrace a scenario optimization approach, and we propose minimizing regret robustly over a finite set of randomly sampled system parameters. We prove that this policy optimization problem can be efficiently solved through semidefinite programming, and that the corresponding solution retains strong probabilistic out-of-sample regret guarantees in face of the uncertain dynamics. Our method naturally extends to include satisfaction of safety constraints with high probability. We validate our theoretical results and showcase the potential of our approach by means of numerical simulations.
Abstract
Path planning is a classic problem for autonomous robots. To ensure safe and efficient point-to-point navigation, an appropriate algorithm should be chosen keeping the robot's dimensions and its classification in mind. Autonomous robots use path-planning algorithms to safely navigate a dynamic, dense, and unknown environment. A few metrics for path-planning algorithms to take into account are safety, efficiency, lowest-cost path generation, and obstacle avoidance. Before path planning can take place, we need a map representation, which can be a discretized or an open configuration space. A discretized configuration space provides node/connectivity information from one point to another, while in an open/free configuration space it is up to the algorithm to create a list of nodes and then find a feasible path. Both types of maps are populated with obstacle positions using perception-based obstacle detection techniques to represent current obstacles from the perspective of the robot. For open configuration spaces, sampling-based planning algorithms are used. This paper aims to explore various types of sampling-based path-planning algorithms such as the Probabilistic RoadMap (PRM) and Rapidly-exploring Random Trees (RRT). These two algorithms also have optimized versions, PRM* and RRT*, and this paper discusses how that optimization is achieved and why it is beneficial.
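The RRT procedure described above is compact enough to sketch directly: sample a point in the open configuration space, extend the nearest tree node one step towards it, reject extensions blocked by obstacles, and stop when the goal is within tolerance. The bounds, step size, and goal bias below are illustrative defaults, not values from the paper.

```python
import math, random

def rrt(start, goal, is_free, step=0.5, iters=2000, goal_tol=0.5,
        goal_bias=0.05, bounds=(0.0, 10.0), seed=1):
    """Minimal RRT sketch for an open 2-D configuration space. The
    algorithm itself builds the node list: each sample pulls the tree
    towards unexplored space, with a small bias towards the goal."""
    random.seed(seed)
    tree = {start: None}                               # node -> parent
    for _ in range(iters):
        if random.random() < goal_bias:
            sample = goal
        else:
            sample = (random.uniform(*bounds), random.uniform(*bounds))
        near = min(tree, key=lambda n: math.dist(n, sample))
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not is_free(new):
            continue                                   # blocked by obstacle
        tree[new] = near
        if math.dist(new, goal) < goal_tol:            # done: walk back up
            path, node = [], new
            while node is not None:
                path.append(node)
                node = tree[node]
            return path[::-1]
    return None

# free-space demo: a path from corner to corner should be found
path = rrt((0.0, 0.0), (9.0, 9.0), is_free=lambda p: True)
assert path is not None and path[0] == (0.0, 0.0)
```

RRT* differs from this sketch by rewiring nearby nodes when a cheaper parent is found, which is what yields asymptotically optimal paths; PRM instead builds a reusable roadmap of sampled nodes before querying it.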
Abstract
This paper presents a computational framework for the concise encoding of an ensemble of persistence diagrams, in the form of weighted Wasserstein barycenters [99], [101] of a dictionary of atom diagrams. We introduce a multi-scale gradient descent approach for the efficient resolution of the corresponding minimization problem, which interleaves the optimization of the barycenter weights with the optimization of the atom diagrams. Our approach leverages the analytic expressions for the gradient of both sub-problems to ensure fast iterations and it additionally exploits shared-memory parallelism. Extensive experiments on public ensembles demonstrate the efficiency of our approach, with Wasserstein dictionary computations on the order of minutes for the largest examples. We show the utility of our contributions in two applications. First, we apply Wasserstein dictionaries to data reduction and reliably compress persistence diagrams by concisely representing them with their weights in the dictionary. Second, we present a dimensionality reduction framework based on a Wasserstein dictionary defined with a small number of atoms (typically three) and encode the dictionary as a low-dimensional simplex embedded in a visual space (typically in 2D). In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used to reproduce our results.
Dense Hybrid Proposal Modulation for Lane Detection
Authors: Yuejian Wu, Linqing Zhao, Jiwen Lu, Haibin Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we present a dense hybrid proposal modulation (DHPM) method for lane detection. Most existing methods perform sparse supervision on a subset of high-scoring proposals, while other proposals fail to obtain effective shape and location guidance, resulting in poor overall quality. To address this, we densely modulate all proposals to generate topologically and spatially high-quality lane predictions with discriminative representations. Specifically, we first ensure that lane proposals are physically meaningful by applying single-lane shape and location constraints. Benefitting from the proposed proposal-to-label matching algorithm, we assign each proposal a target ground truth lane to efficiently learn from spatial layout priors. To enhance generalization and model the inter-proposal relations, we diversify the shape difference of proposals matching the same ground-truth lane. In addition to the shape and location constraints, we design a quality-aware classification loss to adaptively supervise each positive proposal so that the discriminative power can be further boosted. Our DHPM achieves very competitive performance on four popular benchmark datasets. Moreover, we consistently outperform the baseline model on most metrics without introducing new parameters or reducing inference speed.
A novel reduced-order model for advection-dominated problems based on Radon-Cumulative-Distribution Transform
Authors: Tobias Long, Robert Barnett, Richard Jefferson-Loveday, Giovanni Stabile, Matteo Icardi
Abstract
Problems with dominant advection, discontinuities, travelling features, or shape variations are widespread in computational mechanics. However, classical linear model reduction and interpolation methods typically fail to reproduce even relatively small parameter variations, making the reduced models inefficient and inaccurate. In this work, a novel reduced-order modelling approach is proposed based on the Radon-Cumulative-Distribution transform (RCDT). We show that this non-linear transformation can significantly improve the dimensionality of proper orthogonal decomposition (POD) reconstructions and is capable of accurately interpolating some advection-dominated phenomena. The method is tested on various test cases in multiphase fluid dynamics.
Ensuring Reliable Robot Task Performance through Probabilistic Rare-Event Verification and Synthesis
Authors: Guy Scher, Sadra Sadraddini, Ariel Yadin, Hadas Kress-Gazit
Abstract
Providing guarantees on the safe operation of robots against edge cases is challenging, as testing methods such as traditional Monte Carlo sampling require too many samples to provide reasonable statistics. Built upon recent advancements in rare-event sampling, we present a model-based method to verify whether a robotic system satisfies a Signal Temporal Logic (STL) specification in the face of environment variations and sensor/actuator noise. Our method is efficient and applicable to linear, nonlinear, and even black-box systems with arbitrary, but known, uncertainty distributions. For linear systems with Gaussian uncertainties, we exploit a feature to find optimal parameters that minimize the probability of failure. We demonstrate illustrative examples of applying our approach to real-world autonomous robotic systems.
The Power of Typed Affine Decision Structures: A Case Study
Authors: Gerrit Nolte, Maximilian Schlüter, Alnis Murtovi, Bernhard Steffen
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
TADS (typed affine decision structures) are a novel, concise white-box representation of neural networks. In this paper, we apply TADS to the problem of neural network verification, using them to generate either proofs or concise error characterizations for desirable neural network properties. In a case study, we consider the robustness of neural networks to adversarial attacks, i.e., small changes to an input that drastically change a neural network's perception, and show that TADS can be used to provide precise diagnostics on how and where robustness errors occur. We achieve these results by introducing Precondition Projection, a technique that yields a TADS describing network behavior precisely on a given subset of its input space, and by combining it with PCA, a traditional, well-understood dimensionality reduction technique. We show that PCA is easily compatible with TADS. All analyses can be implemented in a straightforward fashion using the rich algebraic properties of TADS, demonstrating the utility of the TADS framework for neural network explainability and verification. While TADS do not yet scale as efficiently as state-of-the-art neural network verifiers, we show that, using PCA-based simplifications, they can still scale to medium-sized problems and yield concise explanations for potential errors that can be used for other purposes such as debugging a network or generating new training samples.
Model Predictive Control of Wind Turbines with Piecewise-Affine Power Coefficient Approximation
Authors: Arnold Sterle, Aaron Grapentin, Christian A. Hans, Jörg Raisch
Abstract
In this paper, an offset-free bilinear model predictive control approach for wind turbines is presented. State-of-the-art controllers employ different control loops for pitch angle and generator torque which switch depending on wind conditions. In contrast, the presented controller is based on one unified control law that works for all wind conditions. The inherent nonlinearity of wind turbines is addressed through a piecewise-affine approximation of the power coefficient, which is modelled in a mixed-integer fashion. The presented controller is compared to a state-of-the-art baseline controller in a numerical case study using OpenFAST. Simulation results show that the presented controller ensures accurate reference power tracking. Additionally, damage equivalent loads are reduced for higher wind speeds.
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
Authors: Johannes Czech, Jannis Blüml, Kristian Kersting
Abstract
While transformers have gained the reputation as the "Swiss army knife of AI", no one has challenged them to master the game of chess, one of the classical AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero does not master the game of chess, mainly because ViTs are too slow. Even making them more efficient using a combination of MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, resulting in a greater boost of up to 180 Elo points over AlphaZero.
An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours
Abstract
Traffic congestion is an unavoidable circumstance in many cities in India and other countries, and it is an issue of major concern. The steep rise in the number of automobiles on the roads, combined with old infrastructure, accidents, pedestrian traffic, and traffic rule violations, all add to challenging traffic conditions. Given these poor traffic conditions, there is a critical need for automatic detection and signaling systems. Various technologies are already used for traffic management and signaling, such as video analysis, infrared sensors, and wireless sensors. The main issue with these methods is that they are very costly and require high maintenance. In this paper, we propose a three-phase system that can guide emergency vehicles and manage traffic based on the degree of congestion. In the first phase, the system processes the captured images and calculates an index value which is used to discover the degree of congestion. The index value of a particular road depends on its width and the length up to which the camera captures images of that road; these parameters (length and width) are taken as input when setting up the system. In the second phase, the system checks whether any emergency vehicles are present in any lane. In the third phase, the whole processing and decision-making is performed at the edge server. The proposed model is robust and takes into consideration adverse weather conditions such as haze, fog, and wind, and it also works efficiently in low-light conditions. The edge server is a strategically placed server that provides low latency and better connectivity. Using edge technology in this traffic management system reduces the strain on cloud servers and makes the system more reliable in real time, because latency and bandwidth requirements are reduced by processing at the intermediate edge server.
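The first phase's index computation can be illustrated with a toy sketch. The abstract only states that the index depends on the road's width and captured length; the normalisation and thresholds below are assumptions for illustration, not the paper's formula.

```python
def congestion_index(vehicle_area, road_length, road_width):
    """Hypothetical congestion index: fraction of the visible road area
    occupied by detected vehicles. The abstract says the index depends
    on the road's width and captured length, but its exact formula is
    not given; this normalisation is an assumption."""
    return min(vehicle_area / (road_length * road_width), 1.0)

def degree_of_congestion(index):
    # Illustrative thresholds, not from the paper.
    if index < 0.3:
        return "low"
    if index < 0.7:
        return "medium"
    return "high"

# a 20 m stretch of a 7 m wide road with 84 m^2 of detected vehicles
idx = congestion_index(vehicle_area=84.0, road_length=20.0, road_width=7.0)
assert abs(idx - 0.6) < 1e-9 and degree_of_congestion(idx) == "medium"
```

In the proposed system such an index would be computed per lane at the edge server, with the length and width supplied once at setup time.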
An Empirical Study of Multimodal Model Merging
Authors: Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Model merging (e.g., via interpolation or task arithmetic) fuses multiple models trained on different tasks to generate a multi-task solution. The technique has been proven successful in previous studies, where the models are trained on similar tasks and with the same initialization. In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities. Furthermore, we conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture to create a parameter-efficient modality-agnostic architecture. Through comprehensive experiments, we systematically investigate the key factors impacting model performance after merging, including initialization, merging mechanisms, and model architectures. Our analysis leads to an effective training recipe for matching the performance of the modality-agnostic baseline (i.e. pre-trained from scratch) via model merging. Our code is available at: https://github.com/ylsung/vl-merging
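Interpolation-based merging, the simplest mechanism mentioned above, reduces to a weighted average of matching parameters. The sketch below uses plain lists of floats as stand-ins for tensors; it assumes only that the models share one architecture (identical parameter names and shapes), as the modality-agnostic setup requires.

```python
def merge_models(state_dicts, weights=None):
    """Merge models by parameter-space interpolation. All models must
    share identical parameter names and shapes; parameters here are
    plain lists of floats for illustration (real code would use
    tensors). Uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        # element-wise weighted sum across the models
        cols = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [sum(w * v for w, v in zip(weights, col))
                        for col in cols]
    return merged

vision = {"proj.w": [1.0, 2.0]}    # e.g. a layer from a vision transformer
language = {"proj.w": [3.0, 6.0]}  # the same layer trained on language
m = merge_models([vision, language])
assert m["proj.w"] == [2.0, 4.0]
```

Task arithmetic, the other mechanism named in the abstract, would instead add and subtract differences from a shared initialization rather than averaging the weights directly.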
Information Redundancy and Biases in Public Document Information Extraction Benchmarks
Authors: Seif Laatiri, Pirashanth Ratnamogan, Joel Tang, Laurent Lam, William Vanhuffel, Fabien Caspani
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Advances in the Visually-rich Document Understanding (VrDU) field and particularly the Key-Information Extraction (KIE) task are marked with the emergence of efficient Transformer-based approaches such as the LayoutLM models. Despite the good performance of KIE models when fine-tuned on public benchmarks, they still struggle to generalize on complex real-life use-cases lacking sufficient document annotations. Our research highlighted that KIE standard benchmarks such as SROIE and FUNSD contain significant similarity between training and testing documents and can be adjusted to better evaluate the generalization of models. In this work, we designed experiments to quantify the information redundancy in public benchmarks, revealing a 75% template replication in the SROIE official test set and 16% in FUNSD. We also proposed resampling strategies to provide benchmarks more representative of the generalization ability of models. We showed that models not suited for document analysis struggle on the adjusted splits, dropping on average 10.5% F1 score on SROIE and 3.5% on FUNSD, compared to multi-modal models dropping only 7.5% F1 on SROIE and 0.5% F1 on FUNSD.
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
Authors: Michał Turski, Tomasz Stanisławek, Karol Kaczmarek, Paweł Dyda, Filip Graliński
Abstract
In recent years, the field of document understanding has progressed considerably. A significant part of this progress has been possible thanks to the use of language models pretrained on large amounts of documents. However, pretraining corpora used in the domain of document understanding are single-domain, monolingual, or non-public. Our goal in this paper is to propose an efficient pipeline for creating a big-scale, diverse, multilingual corpus of PDF files from all over the Internet using Common Crawl, as PDF files are among the most canonical types of documents considered in document understanding. We analysed extensively all of the steps of the pipeline and proposed a solution which is a trade-off between data quality and processing time. We also share the CCpdf corpus, in the form of an index of PDF files along with a script for downloading them, which produces a collection useful for language model pretraining. The dataset and tools published with this paper offer researchers the opportunity to develop even better multilingual language models.
Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation
Abstract
In this paper, we present an algorithmic study on how to surpass competitors in popularity by strategic promotions in social networks. We first propose a novel model, called the PA-IC model, in which we integrate the Preferential Attachment (PA) model for popularity growth with the Independent Cascade (IC) model for influence propagation in social networks. In PA-IC, a popular item and a novice item grab shares of popularity from the natural popularity growth via the PA model, while the novice item tries to gain extra popularity via influence cascade in a social network. The {\em popularity ratio} is defined as the ratio of the popularity measure between the novice item and the popular item. We formulate {\em Popularity Ratio Maximization (PRM)} as the problem of selecting seeds in multiple rounds to maximize the popularity ratio at the end. We analyze the popularity ratio and show that it is monotone but not submodular. To provide an effective solution, we devise a surrogate objective function and show that empirically it is very close to the original objective function while theoretically, it is monotone and submodular. We design two efficient algorithms, one for the overlapping-influence and non-overlapping-seeds (across rounds) setting and the other for the non-overlapping-influence and overlapping-seeds setting, and further discuss how to deal with other models and problem variants. Our empirical evaluation further demonstrates that the proposed PRM-IMM method consistently achieves the best popularity promotion compared to other methods. Our theoretical and empirical analyses shed light on the interplay between influence maximization and preferential attachment in social networks.
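The PA-IC dynamic can be sketched with a toy two-item simulation. The proportional split of natural growth and the per-round cascade gains below are simplifying assumptions made for illustration, not the paper's exact model.

```python
def popularity_ratio(pop0, nov0, cascade_gains):
    """Toy sketch of the PA-IC dynamic for two items: each round, one
    unit of natural popularity is split between the items in proportion
    to their current popularity (preferential attachment), and the
    novice item additionally receives that round's seeded cascade gain.
    Returns the final novice/popular popularity ratio. The split rule
    and gain schedule are simplifying assumptions."""
    pop, nov = pop0, nov0
    for gain in cascade_gains:
        total = pop + nov
        pop += pop / total          # PA share of natural growth
        nov += nov / total + gain   # PA share + influence-cascade gain
    return nov / pop

# without seeding the novice item, PA alone roughly preserves the ratio;
# seeded cascades over 50 rounds raise it
r_no_seed = popularity_ratio(100.0, 10.0, [0.0] * 50)
r_seeded = popularity_ratio(100.0, 10.0, [2.0] * 50)
assert r_seeded > r_no_seed
```

The actual PRM problem is then which nodes to seed in each round, under a budget, so that the cascade gains maximize this final ratio.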
Hierarchical and Decentralised Federated Learning
Authors: Omer Rana, Theodoros Spyridopoulos, Nathaniel Hudson, Matt Baughman, Kyle Chard, Ian Foster, Aftab Khan
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Federated learning (FL) has shown enormous promise as a way of training ML models in distributed environments while reducing communication costs and protecting data privacy. However, the rise of complex cyber-physical systems, such as the Internet-of-Things, presents new challenges that are not met by traditional FL methods. Hierarchical Federated Learning (H-FL) extends the traditional FL process to enable more efficient model aggregation based on application needs or characteristics of the deployment environment (e.g., resource capabilities and/or network connectivity), illustrating the benefits of balancing processing across the cloud-edge continuum. H-FL is likely to be a key enabler for a wide range of applications, such as smart farming and smart energy management, as it can improve performance and reduce costs, whilst also enabling FL workflows to be deployed in environments that are not well-suited to traditional FL. Model aggregation algorithms, software frameworks, and infrastructures will need to be designed and implemented to make such solutions accessible to researchers and engineers across a growing set of domains. H-FL also introduces a number of new challenges. For instance, there are implicit infrastructural challenges, and there is also a trade-off between having generalised models and personalised models. If there exist geographical patterns for data (e.g., soil conditions in a smart farm are likely related to the geography of the region itself), then it is crucial that models used locally can consider their own locality in addition to a globally-learned model. H-FL will be crucial to future FL solutions as it can aggregate and distribute models at multiple levels to optimally serve the trade-off between locality dependence and global anomaly robustness.
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Authors: Michele Cafagna, Lina M. Rojas-Barahona, Kees van Deemter, Albert Gatt
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
When applied to image-to-text models, interpretability methods often provide token-by-token explanations; that is, they compute a visual explanation for each token of the generated sequence. These explanations are expensive to compute and unable to comprehensively explain the model's output. Therefore, these methods often require some form of approximation that eventually leads to misleading explanations. We develop a framework based on SHAP that allows for generating comprehensive, meaningful explanations by leveraging the meaning representation of the output sequence as a whole. Moreover, by exploiting semantic priors in the visual backbone, we extract an arbitrary number of features, allowing the efficient computation of Shapley values on large-scale models while generating highly meaningful visual explanations. We demonstrate that our method generates semantically more expressive explanations than traditional methods at a lower compute cost and that it can be generalized to other explainability methods.
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
Abstract
We study $K$-armed bandit problems where the reward distributions of the arms are all supported on the $[0,1]$ interval. It has been a challenge to design regret-efficient randomized exploration algorithms in this setting. Maillard sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling that achieves a KL-style gap-dependent regret bound. We show that KL-MS enjoys asymptotic optimality when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm, and $T$ is the time horizon length.
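The closed-form action probabilities that make Maillard-style sampling convenient for offline policy evaluation can be sketched as below. This is a hypothetical rendering of the KL-weighted rule suggested by the abstract (probability proportional to $\exp(-N_a \, \mathrm{kl}(\hat\mu_a, \hat\mu_{\max}))$ with the Bernoulli KL divergence), not the paper's exact specification:

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Ber(p) || Ber(q)), clamped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ms_probabilities(means, counts):
    """Closed-form KL-Maillard action probabilities: arm a is played with
    probability proportional to exp(-N_a * kl(mu_a, mu_max))."""
    mu_max = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, mu_max))
               for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]
```

Because every action probability is an explicit function of the empirical means and pull counts, the sampling distribution can be logged and reused for importance-weighted offline evaluation, unlike posterior-sampling schemes whose action probabilities are implicit.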
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs
Authors: George Pu, Anirudh Jain, Jihan Yin, Russell Kaplan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
As foundation models continue to scale exponentially in size, efficient methods of adaptation become increasingly critical. Parameter-efficient fine-tuning (PEFT), a recent class of techniques that requires modifying only a small percentage of the model parameters, is currently the most popular method for adapting large language models (LLMs). Several PEFT techniques have recently been proposed, with varying tradeoffs. We provide a comprehensive and uniform benchmark of various PEFT techniques on a representative LLM, the FLAN-T5 model, and evaluate model performance across different data scales of classification and generation datasets. Based on this, we provide a framework for choosing the optimal fine-tuning technique given the task type and data availability. Contrary to popular belief, we also empirically show that PEFT techniques converge more slowly than full tuning in low-data scenarios, and characterize the amount of data required for PEFT methods to both perform well and converge efficiently. Lastly, we further optimize these PEFT techniques by selectively choosing which parts of the model to train, and find that these techniques can be applied with significantly fewer parameters while maintaining, and even improving, performance.
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Abstract
How to efficiently transform large language models (LLMs) into instruction followers is recently a popular research direction, while training LLM for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter demonstrates the potential to handle visual inputs with LLMs, it still cannot generalize well to open-ended visual instructions and lags behind GPT-4. In this paper, we present LLaMA-Adapter V2, a parameter-efficient visual instruction model. Specifically, we first augment LLaMA-Adapter by unlocking more learnable parameters (e.g., norm, bias and scale), which distribute the instruction-following ability across the entire LLaMA model besides adapters. Secondly, we propose an early fusion strategy to feed visual tokens only into the early LLM layers, contributing to better visual knowledge incorporation. Thirdly, a joint training paradigm of image-text pairs and instruction-following data is introduced by optimizing disjoint groups of learnable parameters. This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset. During inference, we incorporate additional expert models (e.g. captioning/OCR systems) into LLaMA-Adapter to further enhance its image understanding capability without incurring training costs. Compared to the original LLaMA-Adapter, our LLaMA-Adapter V2 can perform open-ended multi-modal instructions by merely introducing 14M parameters over LLaMA. The newly designed framework also exhibits stronger language-only instruction-following capabilities and even excels in chat interactions. Our code and models are available at https://github.com/ZrrSkywalker/LLaMA-Adapter.
Keyword: faster
Robust and Fast Vehicle Detection using Augmented Confidence Map
Authors: Hamam Mokayed, Palaiahnakote Shivakumara, Lama Alkhaled, Rajkumar Saini, Muhammad Zeshan Afzal, Yan Chai Hum, Marcus Liwicki
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Vehicle detection in real-time scenarios is challenging because of time constraints and the presence of multiple types of vehicles with different speeds, shapes, structures, etc. This paper presents a new method that relies on generating a confidence map for robust and faster vehicle detection. To reduce the adverse effect of different speeds, shapes, structures, and the presence of several vehicles in a single image, we introduce the concept of augmentation, which highlights the region of interest containing the vehicles. The augmented map is generated by exploring the combination of multiresolution analysis and maximally stable extremal regions (MR-MSER). The output of MR-MSER is supplied to a fast CNN to generate a confidence map, which results in candidate regions. Furthermore, unlike existing approaches that implement complicated models for vehicle detection, we explore the combination of rough-set and fuzzy-based models for robust vehicle detection. To show the effectiveness of the proposed method, we conduct experiments on our dataset captured by drones and on several vehicle detection benchmark datasets, namely KITTI and UA-DETRAC. The results on our dataset and the benchmark datasets show that the proposed method outperforms existing methods in terms of time efficiency and achieves a good detection rate.
Moccasin: Efficient Tensor Rematerialization for Neural Networks
Abstract
The deployment and training of neural networks on edge computing devices pose many challenges. The low memory nature of edge devices is often one of the biggest limiting factors encountered in the deployment of large neural network models. Tensor rematerialization or recompute is a way to address high memory requirements for neural network training and inference. In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over the works in the recent literature that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies that show that our approach is up to an order of magnitude faster than recent work especially for large-scale graphs.
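Moccasin's constraint program searches over rematerialization schedules; the toy calculation below only illustrates the memory/compute trade-off such a solver navigates, using the classic checkpoint-every-$k$-layers scheme on a linear chain (an illustrative assumption, not the paper's formulation):

```python
def chain_checkpoint_cost(n, k):
    """Memory/compute trade-off on a length-n chain when an activation is
    checkpointed every k layers: the other activations are freed after the
    forward pass and recomputed once during the backward pass."""
    stored = -(-n // k)          # ceil(n / k) checkpoints stay resident
    recomputed = n - stored      # the rest each incur one extra forward pass
    return stored, recomputed
```

Sweeping `k` trades resident memory against recomputation; a scheduler like Moccasin generalizes this search to arbitrary compute graphs under an explicit memory budget, which is where the compact $O(n)$-variable formulation matters.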
Zero Trust Chain A Design Pattern for Improved Interoperability and Security in Polkadot
Authors: Santiago Márquez Solís
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This research article presents various design patterns for improving interoperability in Polkadot, a blockchain platform. These patterns include chain bridges, interoperability standards, common asset identifiers, governance agreements, oracle chains, and a hypothetical design pattern called Zero Trust Chain. Implementation of these design patterns can help improve security and confidence in transactions between different chains on the Polkadot network, allowing for faster and more efficient communication. The article also emphasizes the importance of interoperability in blockchain technology and highlights Polkadot's flexibility in creating customized specialized chains that can further improve interoperability on the network. Overall, this article highlights how design patterns can improve interoperability in Polkadot, which could lead to greater adoption of blockchain technology in various industries.
SFD2: Semantic-guided Feature Detection and Description
Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Visual localization is a fundamental task for various applications including autonomous driving and robotics. Prior methods focus on extracting large amounts of often redundant locally reliable features, resulting in limited efficiency and accuracy, especially in large-scale environments under challenging conditions. Instead, we propose to extract globally reliable features by implicitly embedding high-level semantics into both the detection and description processes. Specifically, our semantic-aware detector is able to detect keypoints from reliable regions (e.g. building, traffic lane) and suppress unreliable areas (e.g. sky, car) implicitly, instead of relying on explicit semantic labels. This boosts the accuracy of keypoint matching by reducing the number of features sensitive to appearance changes and avoiding the need for additional segmentation networks at test time. Moreover, our descriptors are augmented with semantics and have stronger discriminative ability, providing more inliers at test time. In particular, experiments on the long-term large-scale visual localization Aachen Day-Night and RobotCar-Seasons datasets demonstrate that our model outperforms previous local features and achieves accuracy competitive with advanced matchers while being about 2 and 3 times faster when using 2k and 4k keypoints, respectively.
Keyword: mobile
MWaste: A Deep Learning Approach to Manage Household Waste
Abstract
While computer vision methods have been shown to be effective in classifying garbage into recycling categories for waste processing, existing methods are costly, imprecise, and unclear. To tackle this issue, we introduce MWaste, a mobile application that uses computer vision and deep learning techniques to classify waste materials as trash, plastic, paper, metal, glass, or cardboard. Its effectiveness was tested on various neural network architectures and real-world images, achieving an average precision of 92\% on the test set. This app can help combat climate change by enabling efficient waste processing and reducing the generation of greenhouse gases caused by incorrect waste disposal.
Mobile Network Slicing under Demand Uncertainty: A Stochastic Programming Approach
Authors: Anousheh Gholami, Nariman Torkzaban, John S. Baras
Abstract
Network slicing enables the deployment of multiple dedicated virtual sub-networks, i.e. slices on a shared physical infrastructure. Unlike traditional one-size-fits-all resource provisioning schemes, each network slice (NS) in 5G is tailored to the specific service requirements of a group of customers. An end-to-end (E2E) mobile NS orchestration requires the simultaneous provisioning of computing, storage, and networking resources across the core network (CN) and the radio access network (RAN). Constant temporospatial changes in mobile user demand profiles further complicate the E2E NSs resource provisioning beyond the limits of the existing best-effort schemes that are only effective under accurate demand forecasts for all slices. This paper proposes a practical two-time-scale resource provisioning framework for E2E network slicing under demand uncertainty. At each macro-scale instance, we assume that only the spatial probability distribution of the NS demands is available. We formulate the NSs resource allocation problem as a stochastic mixed integer program (SMIP) with the objective of minimizing the total resource cost at the CN and the RAN. At each microscale instance, utilizing the exact slice demand profiles, a linear program is solved to jointly minimize the unsupported traffic and the resource cost at the RAN. We verify the effectiveness of our resource allocation scheme through numerical experiments.
LNMesh: Who Said You need Internet to send Bitcoin? Offline Lightning Network Payments using Community Wireless Mesh Networks
Authors: Ahmet Kurt, Abdulhadi Sahin, Ricardo Harrilal-Parchment, Kemal Akkaya
Abstract
Bitcoin is undoubtedly a great alternative to today's existing digital payment systems. Even though Bitcoin's scalability has been debated for a long time, it is no longer a concern thanks to its layer-2 solution, the Lightning Network (LN). LN has been growing non-stop since its creation and has enabled fast, cheap, anonymous, censorship-resistant Bitcoin transactions. However, LN nodes need an active Internet connection to operate securely, which may not always be possible. For example, in the aftermath of natural disasters or power outages, users may not have Internet access for a while. Thus, in this paper, we propose LNMesh, which enables offline LN payments on top of wireless mesh networks. Users of a neighborhood or a community can establish a wireless mesh network and use it as an infrastructure for offline LN payments when they do not have any Internet connection. As such, we first present proof-of-concept implementations where we successfully perform offline LN payments utilizing Bluetooth Low Energy and WiFi. For larger networks with more users, where users can also move around, channel assignments in the network need to be made strategically; thus, we propose 1) minimum connected dominating set and 2) uniform spanning tree based channel assignment approaches. Finally, to test these approaches, we implemented a simulator in Python with support for the BonnMotion mobility tool, and extensively tested the performance metrics of large-scale realistic offline LN payments on mobile wireless mesh networks. Our simulation results show that success rates of up to 95% are achievable with the proposed channel assignment approaches when channels have enough liquidity.
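A minimal sketch of the first channel assignment idea: a greedy (approximate) minimum connected dominating set over the mesh topology, so that payment channels can be concentrated on a small connected backbone of hub nodes. This assumes a connected graph given as an adjacency dict, and the hub-selection heuristic is our own illustrative choice, not the paper's algorithm:

```python
def greedy_connected_dominating_set(adj):
    """Greedy connected dominating set for a connected graph: pick hub nodes
    so that every node is a hub or a hub's neighbor, and the hubs form a
    connected subgraph. Channels would then be funded along hub links."""
    start = max(adj, key=lambda v: len(adj[v]))   # seed with max-degree node
    cds = {start}
    covered = {start} | set(adj[start])
    while covered != set(adj):
        # grow from the frontier: candidate hubs adjacent to the current CDS
        frontier = {u for v in cds for u in adj[v] if u not in cds}
        best = max(frontier, key=lambda u: len(set(adj[u]) - covered))
        cds.add(best)
        covered |= {best} | set(adj[best])
    return cds
```

Growing only from the frontier keeps the dominating set connected by construction, which is what lets every non-hub node reach the backbone in one wireless hop.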
Caught in the Game: On the History and Evolution of Web Browser Gaming
Abstract
Web browsers have come a long way since their inception, evolving from a simple means of displaying text documents over the network to complex software stacks with advanced graphics and network capabilities. As personal computers grew in popularity, developers jumped at the opportunity to deploy cross-platform games with centralized management and a low barrier to entry. Simply going to the right address is now enough to start a game. From text-based to GPU-powered 3D games, browser gaming has evolved to become a strong alternative to traditional console and mobile-based gaming, targeting both casual and advanced gamers. Browser technology has also evolved to accommodate more demanding applications, sometimes even supplanting functions typically left to the operating system. Today, websites display rich, computationally intensive, hardware-accelerated graphics, allowing developers to build ever-more impressive applications and games. In this paper, we present the evolution of browser gaming and the technologies that enabled it, from the release of the first text-based games in the early 1990s to current open-world and game-engine-powered browser games. We discuss the societal impact of browser gaming and how it has allowed a new target audience to access digital gaming. Finally, we review the potential future evolution of the browser gaming industry.
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
Authors: Johannes Czech, Jannis Blüml, Kristian Kersting
Abstract
While transformers have gained a reputation as the "Swiss army knife of AI", no one has challenged them to master the game of chess, one of the classical AI benchmarks. Simply using vision transformers (ViTs) within AlphaZero does not master the game of chess, mainly because ViTs are too slow. Even making them more efficient using a combination of MobileNet and NextViT does not beat what actually matters: a simple change of the input representation and value loss, which results in a boost of up to 180 Elo points over AlphaZero.
An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours
Abstract
Traffic congestion is an unavoidable circumstance in many cities in India and other countries, and it is an issue of major concern. The steep rise in the number of automobiles on the roads, combined with old infrastructure, accidents, pedestrian traffic, and traffic rule violations, all add to challenging traffic conditions. Given these poor traffic conditions, there is a critical need for automatic detection and signaling systems. Various technologies are already used for traffic management and signaling systems, such as video analysis, infrared sensors, and wireless sensors. The main issue with these methods is that they are very costly and require high maintenance. In this paper, we propose a three-phase system that can guide emergency vehicles and manage traffic based on the degree of congestion. In the first phase, the system processes the captured images and calculates an Index value that is used to determine the degree of congestion. The Index value of a particular road depends on its width and the length up to which the camera captures images of that road. We take input for these parameters (length and width) while setting up the system. In the second phase, the system checks whether any emergency vehicles are present in any lane. In the third phase, the whole processing and decision-making part is performed at the edge server. The proposed model is robust and takes into consideration adverse weather conditions such as haze, fog, and wind. It also works efficiently in low-light conditions. The edge server is strategically placed to provide low latency and better connectivity. Using edge technology in this traffic management system reduces the strain on cloud servers, and the system becomes more reliable in real time because latency and bandwidth requirements are reduced by processing at the intermediate edge server.
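The abstract does not give the Index formula, so the sketch below is a hypothetical stand-in: congestion measured as the fraction of the monitored road area (width times the captured length, the two setup parameters the abstract mentions) occupied by detected vehicle bounding boxes:

```python
def congestion_index(vehicle_boxes, road_width_m, capture_length_m):
    """Hypothetical congestion index: fraction of the monitored road area
    occupied by detected vehicles. vehicle_boxes is a list of (width, height)
    footprints in metres; the result is clamped to [0, 1]."""
    road_area = road_width_m * capture_length_m
    occupied = sum(w * h for (w, h) in vehicle_boxes)
    return min(occupied / road_area, 1.0)
```

Thresholds on such an index could then drive the signalling decision in the first phase, with the emergency-vehicle check and edge-side decision-making layered on top as the abstract describes.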
Keyword: pruning
There is no result
Keyword: voxel
There is no result
Keyword: lidar
HyperMODEST: Self-Supervised 3D Object Detection with Confidence Score Filtering
Authors: Jenny Xu, Steven L. Waslander
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Current LiDAR-based 3D object detectors for autonomous driving are almost entirely trained on human-annotated data collected in specific geographical domains with specific sensor setups, making it difficult to adapt to a different domain. MODEST is the first work to train 3D object detectors without any labels. Our work, HyperMODEST, proposes a universal method implemented on top of MODEST that can largely accelerate the self-training process and does not require tuning on a specific dataset. We filter out intermediate pseudo-labels with low confidence scores before they are used for data augmentation. On the nuScenes dataset, we observe a significant improvement of 1.6% in AP BEV in the 0-80m range at IoU=0.25 and an improvement of 1.7% in AP BEV in the 0-80m range at IoU=0.5, while using only one-fifth of the training time of the original MODEST approach. On the Lyft dataset, we also observe an improvement over the baseline during the first round of iterative self-training. We explore the trade-off between high precision and high recall in the early stage of the self-training process by comparing our proposed method with two other score filtering methods: confidence score filtering for pseudo-labels with and without static label retention. The code and models of this work are available at https://github.com/TRAILab/HyperMODEST.
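The core filtering step, discarding low-confidence pseudo-labels before they feed data augmentation, can be sketched as follows; the tuple layout and the static-label-retention flag are illustrative assumptions based on the abstract's description, not the repository's actual interface:

```python
def filter_pseudo_labels(pseudo_labels, score_threshold, keep_static=False):
    """Keep only pseudo-labels confident enough to use for data augmentation
    in self-training. Each label is (class_name, score, is_static); the
    static-retention variant keeps static objects regardless of score."""
    kept = []
    for cls, score, is_static in pseudo_labels:
        if score >= score_threshold or (keep_static and is_static):
            kept.append((cls, score, is_static))
    return kept
```

Raising the threshold trades recall for precision in the early self-training rounds, which is exactly the trade-off the paper probes by comparing the with- and without-static-retention variants.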
Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration
Authors: Alexander Kyuroson, Niklas Dahlquist, Nikolaos Stathoulopoulos, Vignesh Kottayam Viswanathan, Anton Koval, George Nikolakopoulos
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Algorithms for autonomous navigation in environments without Global Navigation Satellite System (GNSS) coverage mainly rely on onboard perception systems. These systems commonly incorporate sensors like cameras and LiDARs, whose performance may degrade in the presence of aerosol particles. Thus, there is a need to fuse data acquired from these sensors with data from radars, which can penetrate such particles; overall, this will improve the performance of localization and collision avoidance algorithms under such environmental conditions. This paper introduces a multimodal dataset from a harsh and unstructured underground environment with aerosol particles. A detailed description of the onboard sensors and of the environment where the dataset was collected is presented to enable full evaluation of the acquired data. Furthermore, the dataset contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format to facilitate the evaluation of navigation and localization algorithms in such environments. In contrast to existing datasets, the focus of this paper is not only to capture both temporal and spatial data diversity but also to present the impact of harsh conditions on the captured data. Therefore, to validate the dataset, a preliminary comparison of odometry from the onboard LiDARs is presented.
Fusion is Not Enough: Single-Modal Attacks to Compromise Fusion Models in Autonomous Driving
Authors: Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao, Dongfang Liu, Michael Zuzak, Xiangyu Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Abstract
Multi-sensor fusion (MSF) is widely adopted for perception in autonomous vehicles (AVs), particularly for the task of 3D object detection with camera and LiDAR sensors. The rationale behind fusion is to capitalize on the strengths of each modality while mitigating their limitations. The exceptional and leading performance of fusion models has been demonstrated by advanced deep neural network (DNN)-based fusion techniques. Fusion models are also perceived as more robust to attacks compared to single-modal ones due to the redundant information in multiple modalities. In this work, we challenge this perspective with single-modal attacks that target the camera modality, which is considered less significant in fusion but more affordable for attackers. We argue that the weakest link of fusion models depends on their most vulnerable modality, and propose an attack framework that targets advanced camera-LiDAR fusion models with adversarial patches. Our approach employs a two-stage optimization-based strategy that first comprehensively assesses vulnerable image areas under adversarial attacks, and then applies customized attack strategies to different fusion models, generating deployable patches. Evaluations with five state-of-the-art camera-LiDAR fusion models on a real-world dataset show that our attacks successfully compromise all models. Our approach can either reduce the mean average precision (mAP) of detection performance from 0.824 to 0.353 or degrade the detection score of the target object from 0.727 to 0.151 on average, demonstrating the effectiveness and practicality of our proposed attack framework.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Authors: Junge Zhang, Feihu Zhang, Shaochen Kuang, Li Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Labeling LiDAR point clouds for training autonomous driving systems is extremely expensive and difficult. LiDAR simulation aims at generating realistic LiDAR data with labels for training and verifying self-driving algorithms more efficiently. Recently, Neural Radiance Fields (NeRF) have been proposed for novel view synthesis using implicit reconstruction of 3D scenes. Inspired by this, we present NeRF-LiDAR, a novel LiDAR simulation method that leverages real-world information to generate realistic LiDAR point clouds. Different from existing LiDAR simulators, we use real images and point cloud data collected by self-driving cars to learn the 3D scene representation, point cloud generation, and label rendering. We verify the effectiveness of our NeRF-LiDAR by training different 3D segmentation models on the generated LiDAR point clouds. It reveals that the trained models are able to achieve similar accuracy when compared with the same model trained on the real LiDAR data. Besides, the generated data is capable of boosting the accuracy through pre-training, which helps reduce the requirements for real labeled data.
Keyword: diffusion
Learning a Diffusion Prior for NeRFs
Authors: Guandao Yang, Abhijit Kundu, Leonidas J. Guibas, Jonathan T. Barron, Ben Poole
Abstract
Neural Radiance Fields (NeRFs) have emerged as a powerful neural 3D representation for objects and scenes derived from 2D data. Generating NeRFs, however, remains difficult in many scenarios. For instance, training a NeRF with only a small number of views as supervision remains challenging since it is an under-constrained problem. In such settings, it calls for some inductive prior to filter out bad local minima. One way to introduce such inductive priors is to learn a generative model for NeRFs modeling a certain class of scenes. In this paper, we propose to use a diffusion model to generate NeRFs encoded on a regularized grid. We show that our model can sample realistic NeRFs, while at the same time allowing conditional generations, given a certain observation as guidance.
It is all about where you start: Text-to-image generation with seed selection
Authors: Dvir Samuel, Rami Ben-Ari, Simon Raviv, Nir Darshan, Gal Chechik
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Text-to-image diffusion models can synthesize a large variety of concepts in new compositions and scenarios. However, they still struggle with generating uncommon concepts, rare unusual combinations, or structured concepts like hand palms. Their limitation is partly due to the long-tail nature of their training data: web-crawled data sets are strongly unbalanced, causing models to under-represent concepts from the tail of the distribution. Here we characterize the effect of unbalanced training data on text-to-image models and offer a remedy. We show that rare concepts can be correctly generated by carefully selecting suitable generation seeds in the noise space, a technique that we call SeedSelect. SeedSelect is efficient and does not require retraining the diffusion model. We evaluate the benefit of SeedSelect on a series of problems. First, in few-shot semantic data augmentation, where we generate semantically correct images for few-shot and long-tail benchmarks. We show classification improvement on all classes, both from the head and tail of the training data of diffusion models. We further evaluate SeedSelect on correcting images of hands, a well-known pitfall of current diffusion models, and show that it improves hand generation substantially.
SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis
Abstract
Text-conditioned image generation has made significant progress in recent years with generative adversarial networks and more recently, diffusion models. While diffusion models conditioned on text prompts have produced impressive and high-quality images, accurately representing complex text prompts such as the number of instances of a specific object remains challenging. To address this limitation, we propose a novel guidance approach for the sampling process in the diffusion model that leverages bounding box and segmentation map information at inference time without additional training data. Through a novel loss in the sampling process, our approach guides the model with semantic features from CLIP embeddings and enforces geometric constraints, leading to high-resolution images that accurately represent the scene. To obtain bounding box and segmentation map information, we structure the text prompt as a scene graph and enrich the nodes with CLIP embeddings. Our proposed model achieves state-of-the-art performance on two public benchmarks for image generation from scene graphs, surpassing both scene graph to image and text-based diffusion models in various metrics. Our results demonstrate the effectiveness of incorporating bounding box and segmentation map guidance in the diffusion model sampling process for more accurate text-to-image generation.
MUDiff: Unified Diffusion for Complete Molecule Generation
Authors: Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup
Abstract
We present a new model for generating molecular data by combining discrete and continuous diffusion processes. Our model generates a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates. The use of diffusion processes allows for capturing the probabilistic nature of molecular processes and the ability to explore the effect of different factors on molecular structures and properties. Additionally, we propose a novel graph transformer architecture to denoise the diffusion process. The transformer is equivariant to Euclidean transformations, allowing it to learn invariant atom and edge representations while preserving the equivariance of atom coordinates. This transformer can be used to learn molecular representations robust to geometric transformations. We evaluate the performance of our model through experiments and comparisons with existing methods, showing its ability to generate more stable and valid molecules with good properties. Our model is a promising approach for designing molecules with desired properties and can be applied to a wide range of tasks in molecular modeling.
Keyword: dynamic
SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation
Abstract
Inaccurate optical flow estimates in and near occluded regions, and out-of-boundary regions are two of the current significant limitations of optical flow estimation algorithms. Recent state-of-the-art optical flow estimation algorithms are two-frame based methods where optical flow is estimated sequentially for each consecutive image pair in a sequence. While this approach gives good flow estimates, it fails to generalize optical flows in occluded regions mainly due to limited local evidence regarding moving elements in a scene. In this work, we propose a learning-based multi-frame optical flow estimation method that estimates two or more consecutive optical flows in parallel from multi-frame image sequences. Our underlying hypothesis is that by understanding temporal scene dynamics from longer sequences with more than two frames, we can characterize pixel-wise dependencies in a larger spatiotemporal domain, generalize complex motion patterns and thereby improve the accuracy of optical flow estimates in occluded regions. We present learning-based spatiotemporal recurrent transformers for multi-frame based optical flow estimation (SSTMs). Our method utilizes 3D Convolutional Gated Recurrent Units (3D-ConvGRUs) and spatiotemporal transformers to learn recurrent space-time motion dynamics and global dependencies in the scene and provide a generalized optical flow estimation. When compared with recent state-of-the-art two-frame and multi-frame methods on real-world and synthetic datasets, the performance of SSTMs was significantly higher in occluded and out-of-boundary regions. Among all published state-of-the-art multi-frame methods, SSTM achieved state-of-the-art results on the Sintel Final and KITTI2015 benchmark datasets.
One-Step Distributional Reinforcement Learning
Authors: Mastane Achab, Reda Alami, Yasser Abdelaziz Dahou Djilali, Kirill Fedyanin, Eric Moulines
Abstract
Reinforcement learning (RL) allows an agent interacting sequentially with an environment to maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the agent goes beyond the expected value to capture the underlying probability distribution of the return across all time steps. DistrRL algorithms have led to improved empirical performance. Nevertheless, the theory of DistrRL is still not fully understood, especially in the control case. In this paper, we present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework, which encompasses only the randomness induced by the one-step dynamics of the environment. Contrary to DistrRL, we show that our approach comes with a unified theory for both policy evaluation and control. Indeed, we propose two OS-DistrRL algorithms for which we provide an almost-sure convergence analysis. The proposed approach compares favorably with categorical DistrRL on various environments.
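As a concrete illustration of the one-step idea, the return distribution at a state-action pair keeps only the randomness of the immediate reward and next state, while the tail of the return is summarized by a scalar value estimate. A minimal sketch, with toy values and function names that are illustrative and not taken from the paper:

```python
import numpy as np

def one_step_return_samples(transitions, V, gamma=0.99):
    """Empirical one-step return distribution for a fixed (s, a).

    Only the randomness of the immediate reward and next state is kept;
    the remainder of the return is summarized by the scalar value
    estimate V, in the spirit of one-step distributional RL.
    """
    return np.array([r + gamma * V[s_next] for (r, s_next) in transitions])

V = {0: 1.0, 1: 3.0}                       # toy value estimates per state
samples = one_step_return_samples([(1.0, 0), (0.0, 1)], V, gamma=0.5)
# samples: [1.5, 1.5]; their mean recovers the ordinary TD target
```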
MINN: Learning the dynamics of differential-algebraic equations and application to battery modeling
Authors: Yicun Huang, Changfu Zou, Yang Li, Torsten Wik
Abstract
The concept of integrating physics-based and data-driven approaches has become popular for modeling sustainable energy systems. However, the existing literature mainly focuses on the data-driven surrogates generated to replace physics-based models. These models often trade accuracy for speed but lack the generalizability, adaptability, and interpretability inherent in physics-based models, which are often indispensable in the modeling of real-world dynamic systems for optimization and control purposes. In this work, we propose a novel architecture for generating model-integrated neural networks (MINN) to allow integration at the level of learning the physics-based dynamics of the system. The obtained hybrid model solves an unsettled research problem in control-oriented modeling, i.e., how to obtain an optimally simplified model that is simultaneously physically insightful, numerically accurate, and computationally tractable. We apply the proposed neural network architecture to model the electrochemical dynamics of lithium-ion batteries and show that MINN is extremely data-efficient to train while being sufficiently generalizable to previously unseen input data, owing to its underlying physical invariants. The MINN battery model has an accuracy comparable to the first-principles-based model in predicting both the system outputs and any locally distributed electrochemical behaviors but achieves a two-orders-of-magnitude reduction in solution time.
Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors
Authors: Héctor Martínez, Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate a flexible development of architecture-specific micro-kernels that allow us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multi-core processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism versus cache usage.
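The blocking idea behind such a tuned GEMM can be sketched in a few lines: the block sizes play the role of the cache configuration parameters that the analytical model adapts to the operand shapes. This is an illustrative sketch, not the authors' implementation, which operates at the level of architecture-specific micro-kernels:

```python
import numpy as np

def blocked_gemm(A, B, mc=64, nc=64, kc=64):
    """Cache-blocked C = A @ B.

    (mc, nc, kc) stand in for the cache configuration parameters a
    tuned BLAS would choose per operand shape; each innermost update
    works on blocks sized to fit in a given cache level.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for jc in range(0, n, nc):              # panel of B columns
        for pc in range(0, k, kc):          # panel of the shared dimension
            for ic in range(0, m, mc):      # block of A rows (macro-kernel)
                C[ic:ic+mc, jc:jc+nc] += (
                    A[ic:ic+mc, pc:pc+kc] @ B[pc:pc+kc, jc:jc+nc]
                )
    return C
```

Varying `mc`, `nc`, and `kc` for tall-skinny versus square operands is a miniature version of the shape-adaptive tuning the paper describes.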
Deep state-space modeling for explainable representation, analysis, and generation of professional human poses
Authors: Brenda Elizabeth Olivas-Padilla, Sotiris Manitsaris
Abstract
The analysis of human movements has been extensively studied due to its wide variety of practical applications. Nevertheless, the state-of-the-art still faces scientific challenges while modeling human movements. Firstly, new models that account for the stochasticity of human movement and the physical structure of the human body are required to accurately predict the evolution of full-body motion descriptors over time. Secondly, the explainability of existing deep learning algorithms regarding their body posture predictions while generating human movements still needs to be improved as they lack comprehensible representations of human movement. This paper addresses these challenges by introducing three novel approaches for creating explainable representations of human movement. In this work, full-body movement is formulated as a state-space model of a dynamic system whose parameters are estimated using deep learning and statistical algorithms. The representations adhere to the structure of the Gesture Operational Model (GOM), which describes movement through its spatial and temporal assumptions. Two approaches correspond to deep state-space models that apply nonlinear network parameterization to provide interpretable posture predictions. The third method trains GOM representations using one-shot training with Kalman Filters. This training strategy enables users to model single movements and estimate their mathematical representation using procedures that require less computational power than deep learning algorithms. Ultimately, two applications of the generated representations are presented. The first is for the accurate generation of human movements, and the second is for body dexterity analysis of professional movements, where dynamic associations between body joints and meaningful motion descriptors are identified.
pyBibX -- A Python Library for Bibliometric and Scientometric Analysis Powered with Artificial Intelligence Tools
Authors: Valdecy Pereira, Marcio Pereira Basilio, Carlos Henrique Tarjano Santos
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI)
Abstract
Bibliometric and scientometric analyses offer invaluable perspectives on the complex research terrain and collaborative dynamics spanning diverse academic disciplines. This paper presents pyBibX, a Python library devised to conduct comprehensive bibliometric and scientometric analyses on raw data files sourced from Scopus, Web of Science, and PubMed, seamlessly integrating state-of-the-art AI capabilities into its core functionality. The library executes a comprehensive EDA, presenting outcomes via visually appealing graphical illustrations. Network capabilities have been deftly integrated, encompassing Citation, Collaboration, and Similarity Analysis. Furthermore, the library incorporates AI capabilities, including embedding vectors, topic modeling, text summarization, and other general natural language processing tasks, employing models such as Sentence-BERT, BERTopic, BERT, chatGPT, and PEGASUS. As a demonstration, we have analyzed 184 documents associated with multiple-criteria decision analysis published between 1984 and 2023. The EDA emphasized a growing fascination with decision-making and fuzzy logic methodologies. Next, network analysis further accentuated the significance of central authors and intra-continental collaboration, identifying Canada and China as crucial collaboration hubs. Finally, AI analysis distinguished two primary topics and chatGPT's preeminence in text summarization. It also proved to be an indispensable instrument for interpreting results, as our library enables researchers to pose inquiries to chatGPT regarding bibliometric outcomes. Even so, data homogeneity remains a daunting challenge due to database inconsistencies. pyBibX is the first application integrating cutting-edge AI capabilities for analyzing scientific publications, enabling researchers to examine and interpret these outcomes more effectively.
Ensemble Modeling with Contrastive Knowledge Distillation for Sequential Recommendation
Authors: Hanwen Du, Huanhuan Yuan, Pengpeng Zhao, Fuzhen Zhuang, Guanfeng Liu, Lei Zhao, Yanchi Liu, Victor S. Sheng
Abstract
Sequential recommendation aims to capture users' dynamic interests and predict the next item of users' preference. Most sequential recommendation methods use a deep neural network as a sequence encoder to generate user and item representations. Existing works mainly center upon designing a stronger sequence encoder. However, few attempts have been made at training an ensemble of networks as sequence encoders, which is more powerful than a single network because an ensemble of parallel networks can yield diverse prediction results and hence better accuracy. In this paper, we present Ensemble Modeling with contrastive Knowledge Distillation for sequential recommendation (EMKD). Our framework adopts multiple parallel networks as an ensemble of sequence encoders and recommends items based on the output distributions of all these networks. To facilitate knowledge transfer between parallel networks, we propose a novel contrastive knowledge distillation approach, which performs knowledge transfer at the representation level via Intra-network Contrastive Learning (ICL) and Cross-network Contrastive Learning (CCL), as well as Knowledge Distillation (KD) at the logits level via minimizing the Kullback-Leibler divergence between the output distributions of the teacher network and the student network. To leverage contextual information, we train the primary masked item prediction task alongside the auxiliary attribute prediction task as a multi-task learning scheme. Extensive experiments on public benchmark datasets show that EMKD achieves a significant improvement compared with the state-of-the-art methods. Besides, we demonstrate that our ensemble method is a generalized approach that can also improve the performance of other sequential recommenders. Our code is available at this link: https://github.com/hw-du/EMKD.
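The logits-level distillation term described above can be sketched generically as a temperature-softened Kullback-Leibler divergence between the teacher and student output distributions. The function names and temperature value below are assumptions for illustration, not details taken from EMKD:

```python
import numpy as np

def softmax(z, tau=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, tau=2.0):
    """Mean KL(teacher || student) over softened output distributions,
    i.e., the generic logits-level distillation objective."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())
```

The loss is zero when the two networks agree and strictly positive otherwise, which is what drives the mutual knowledge transfer between the parallel encoders.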
NeuralKG-ind: A Python Library for Inductive Knowledge Graph Representation Learning
Abstract
Owing to the dynamic characteristics of knowledge graphs, many inductive knowledge graph representation learning (KGRL) works have been proposed in recent years, focusing on enabling prediction over new entities. NeuralKG-ind is the first library for inductive KGRL and an important update to the NeuralKG library. It includes standardized processes, rich existing methods, decoupled modules, and comprehensive evaluation metrics. With NeuralKG-ind, it is easy for researchers and engineers to reproduce, redevelop, and compare inductive KGRL methods. The library, experimental methodologies, and model re-implementation results of NeuralKG-ind are all publicly released at https://github.com/zjukg/NeuralKG/tree/ind .
Metric Temporal Equilibrium Logic over Timed Traces
Authors: Arvid Becker, Pedro Cabalar, Martín Diéguez, Torsten Schaub, Anna Schuhmann
Abstract
In temporal extensions of Answer Set Programming (ASP) based on linear time, the behavior of dynamic systems is captured by sequences of states. While this representation reflects their relative order, it abstracts away the specific times associated with each state. However, timing constraints are important in many applications, for instance, when planning and scheduling go hand in hand. We address this by developing a metric extension of linear-time temporal equilibrium logic, in which temporal operators are constrained by intervals over natural numbers. The resulting Metric Equilibrium Logic provides the foundation of an ASP-based approach for specifying qualitative and quantitative dynamic constraints. To this end, we define a translation of metric formulas into monadic first-order formulas and give a correspondence between their models in Metric Equilibrium Logic and Monadic Quantified Equilibrium Logic, respectively. Interestingly, our translation provides a blueprint for implementation in terms of ASP modulo difference constraints.
Regret Optimal Control for Uncertain Stochastic Systems
Authors: Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate
Abstract
We consider control of uncertain linear time-varying stochastic systems from the perspective of regret minimization. Specifically, we focus on the problem of designing a feedback controller that minimizes the loss relative to a clairvoyant optimal policy that has foreknowledge of the system dynamics and the exogenous disturbances. In this competitive framework, establishing robustness guarantees proves challenging as, differently from the case where the model is known, the benchmark policy is not only inapplicable, but also impossible to compute without knowledge of the system parameters. To overcome this issue, we embrace a scenario optimization approach, and we propose minimizing regret robustly over a finite set of randomly sampled system parameters. We prove that this policy optimization problem can be efficiently solved through semidefinite programming, and that the corresponding solution retains strong probabilistic out-of-sample regret guarantees in the face of the uncertain dynamics. Our method naturally extends to include satisfaction of safety constraints with high probability. We validate our theoretical results and showcase the potential of our approach by means of numerical simulations.
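The scenario idea can be illustrated in miniature: sample a finite set of system parameters, then pick the decision minimizing the worst-case cost over those samples. The sketch below replaces the paper's semidefinite program with a one-dimensional grid search over a quadratic loss, purely for illustration; all names and values are assumptions:

```python
import numpy as np

def scenario_minimax(cost, candidates, scenarios):
    """Scenario approach sketch: return the candidate decision that
    minimizes the worst-case cost over the sampled scenarios."""
    worst = [max(cost(u, th) for th in scenarios) for u in candidates]
    return candidates[int(np.argmin(worst))]

rng = np.random.default_rng(0)
thetas = rng.normal(1.0, 0.2, size=20)     # sampled uncertain parameter
gains = np.linspace(0.0, 2.0, 201)         # candidate decisions
cost = lambda k, th: (th - k) ** 2         # regret-like quadratic loss
k_star = scenario_minimax(cost, gains, thetas)
```

For this quadratic loss the minimax decision is the midrange of the sampled parameters, so the sampled solution hedges against the worst realization rather than the average one.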
IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Previous methods solve feature matching and pose estimation using a two-stage process by first finding matches and then estimating the pose. As they ignore the geometric relationships between the two tasks, they focus on either improving the quality of matches or filtering potential outliers, leading to limited efficiency or accuracy. In contrast, we propose an iterative matching and pose estimation framework (IMP) leveraging the geometric connections between the two tasks: a few good matches are enough for a roughly accurate pose estimation; a roughly accurate pose can be used to guide the matching by providing geometric constraints. To this end, we implement a geometry-aware recurrent attention-based module which jointly outputs sparse matches and camera poses. Specifically, for each iteration, we first implicitly embed geometric information into the module via a pose-consistency loss, allowing it to predict geometry-aware matches progressively. Second, we introduce an efficient variant of IMP, called EIMP, to dynamically discard keypoints without potential matches, avoiding redundant updating and significantly reducing the quadratic time complexity of attention computation in transformers. Experiments on YFCC100m, ScanNet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.
Abstract
Path planning is a classic problem for autonomous robots. To ensure safe and efficient point-to-point navigation, an appropriate algorithm should be chosen keeping the robot's dimensions and its classification in mind. Autonomous robots use path-planning algorithms to safely navigate dynamic, dense, and unknown environments. A few metrics to take into account for path-planning algorithms are safety, efficiency, lowest-cost path generation, and obstacle avoidance. Before path planning can take place, we need a map representation, which can be a discretized or an open configuration space. A discretized configuration space provides node/connectivity information from one point to another, while in an open/free configuration space it is up to the algorithm to create a list of nodes and then find a feasible path. Both types of maps are populated with obstacle positions using perception-based obstacle detection techniques to represent the current obstacles from the perspective of the robot. For open configuration spaces, sampling-based planning algorithms are used. This paper aims to explore various types of sampling-based path-planning algorithms, such as Probabilistic RoadMaps (PRM) and Rapidly-exploring Random Trees (RRT). These two algorithms also have optimized versions, PRM* and RRT*, and this paper discusses how that optimization is achieved and why it is beneficial.
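The sample-and-extend loop at the heart of RRT can be sketched minimally in an open two-dimensional configuration space. Obstacle checking is omitted, and the step size, goal bias, and workspace bounds are illustrative choices, not values from the survey:

```python
import random
import math

def rrt(start, goal, step=0.5, iters=2000, goal_tol=0.5, seed=0):
    """Minimal RRT in an obstacle-free 2D configuration space:
    sample a point, extend the nearest tree node toward it by at most
    `step`, and stop when a node lands within `goal_tol` of the goal."""
    random.seed(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        # goal-biased sampling: head straight for the goal 10% of the time
        q = goal if random.random() < 0.1 else (
            random.uniform(0, 10), random.uniform(0, 10))
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], q))
        near = nodes[i]
        d = math.dist(near, q)
        new = q if d <= step else (
            near[0] + step * (q[0] - near[0]) / d,
            near[1] + step * (q[1] - near[1]) / d)
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:   # reached: walk back to start
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

The optimized RRT* variant additionally rewires nearby nodes whenever a cheaper parent is found, which is what yields asymptotically optimal paths.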
MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition
Authors: Shengchao Chen, Ting Shu, Huan Zhao, Yuan Yan Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Weather recognition is an essential support for many practical life applications, including traffic safety, environment, and meteorology. However, many existing related works cannot comprehensively describe weather conditions due to their complex co-occurrence dependencies. This paper proposes a novel multi-label weather recognition model considering these dependencies. The proposed model, called MASK-Convolutional Neural Network-Transformer (MASK-CT), is based on the Transformer, the convolutional process, and the MASK mechanism. The model employs multiple convolutional layers to extract features from weather images and a Transformer encoder to calculate the probability of each weather condition based on the extracted features. To improve the generalization ability of MASK-CT, a MASK mechanism is used during the training phase. The effect of the MASK mechanism is explored and discussed. The MASK mechanism randomly withholds some information from one-pair training instances (one image and its corresponding label). There are two types of MASK methods: MASK-I is designed and deployed on the image before feeding it into the weather feature extractor, and MASK-II is applied to the image label. The Transformer encoder is then utilized on the randomly masked image features and labels. The experimental results from various real-world weather recognition datasets demonstrate that the proposed MASK-CT model outperforms state-of-the-art methods. Furthermore, the high-speed dynamic real-time weather recognition capability of MASK-CT is evaluated.
Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models
Abstract
Neural ranking models (NRMs) have attracted considerable attention in information retrieval. Unfortunately, NRMs may inherit the adversarial vulnerabilities of general neural networks, which might be leveraged by black-hat search engine optimization practitioners. Recently, adversarial attacks against NRMs have been explored in the paired attack setting, generating an adversarial perturbation to a target document for a specific query. In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic. We define both static and dynamic settings for the task and focus on decision-based black-box attacks. We propose a novel framework to improve topic-oriented attack performance based on a surrogate ranking model. The attack problem is formalized as a Markov decision process (MDP) and addressed using reinforcement learning. Specifically, a topic-oriented reward function guides the policy to find a successful adversarial example that can be promoted in rankings for as many queries as possible in a group. Experimental results demonstrate that the proposed framework can significantly outperform existing attack strategies, and we conclude by reiterating that there exist potential risks for applying NRMs in the real world.
A novel reduced-order model for advection-dominated problems based on Radon-Cumulative-Distribution Transform
Authors: Tobias Long, Robert Barnett, Richard Jefferson-Loveday, Giovanni Stabile, Matteo Icardi
Abstract
Problems with dominant advection, discontinuities, travelling features, or shape variations are widespread in computational mechanics. However, classical linear model reduction and interpolation methods typically fail to reproduce even relatively small parameter variations, making the reduced models inefficient and inaccurate. In this work, a novel reduced-order modelling approach is proposed based on the Radon-Cumulative-Distribution transform (RCDT). We show that this non-linear transformation can significantly improve the dimensionality reduction achieved by proper orthogonal decomposition (POD) and is capable of accurately interpolating some advection-dominated phenomena. The method is tested on various test cases in multiphase fluid dynamics.
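The one-dimensional cumulative-distribution part of the transform can be sketched directly: a nonnegative, normalized signal is represented by its inverse CDF, and linear interpolation in that space translates features instead of blending them. The Radon projection step needed for multidimensional fields is omitted here, and all names are illustrative:

```python
import numpy as np

def cdt(signal, x, n_quantiles=200):
    """1D cumulative-distribution transform: represent a nonnegative
    signal by its inverse CDF sampled at uniform quantiles."""
    dx = np.diff(x)
    # trapezoidal cumulative mass, normalized to a CDF
    mass = np.concatenate(
        [[0.0], np.cumsum(0.5 * (signal[1:] + signal[:-1]) * dx)])
    cdf = mass / mass[-1]
    q = np.linspace(0.0, 1.0, n_quantiles)
    return np.interp(q, cdf, x)            # inverse CDF: x as a function of mass
```

Averaging the transforms of two translated bumps yields the transform of a bump halfway between them, not a two-bump mixture, which is why this representation suits advection-dominated fields where POD in physical space struggles.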
A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks
Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson
Abstract
Unlike conventional grid- and mesh-based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE-based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.
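The local-in-time idea can be illustrated with a linear-in-parameters ansatz in place of a neural network: the PDE then yields an explicit ODE on the coefficients, which any standard time stepper can follow. The basis, grid, and the heat equation itself are illustrative choices, not the paper's setup:

```python
import numpy as np

# With the ansatz u(x, t) = sum_k theta_k(t) * sin(k x), the heat
# equation u_t = u_xx becomes an ODE on the parameters:
#   Phi @ theta_dot = Phi_xx @ theta  =>  theta_dot = pinv(Phi) @ Phi_xx @ theta
x = np.linspace(0.0, np.pi, 64)
ks = np.arange(1, 6)
Phi = np.sin(np.outer(x, ks))             # basis values at collocation points
Phi_xx = -(ks ** 2) * Phi                 # exact second derivatives of the basis
M = np.linalg.pinv(Phi) @ Phi_xx          # parameter dynamics matrix

theta = np.zeros(5)
theta[0] = 1.0                            # initial condition u0(x) = sin(x)
dt, T = 1e-3, 0.1
for _ in range(int(T / dt)):              # explicit Euler on the parameter ODE
    theta = theta + dt * (M @ theta)
# exact heat-equation solution: theta_1(T) = exp(-T); other modes stay 0
```

For a genuine neural network the map from parameter velocity to solution velocity is the (time-varying) network Jacobian, and the cubic cost of the resulting least-squares solves is exactly the scaling issue the paper addresses.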
Maximizing Reachability Probabilities in Rectangular Automata with Random Clocks
Authors: Joanna Delicaris, Stefan Schupp, Erika Ábrahám, Anne Remke
Subjects: Formal Languages and Automata Theory (cs.FL)
Abstract
This paper proposes an algorithm to maximize reachability probabilities for rectangular automata with random clocks via a history-dependent prophetic scheduler. This model class incorporates time-induced nondeterminism on discrete behavior and nondeterminism in the dynamic behavior. After computing reachable state sets via a forward flowpipe construction, we use backward refinement to compute maximum reachability probabilities. The feasibility of the presented approach is illustrated on a scalable model.
Keyword: efficient
MINN: Learning the dynamics of differential-algebraic equations and application to battery modeling
Model Explainability in Physiological and Healthcare-based Neural Networks
MWaste: A Deep Learning Approach to Manage Household Waste
SRCNet: Seminal Representation Collaborative Network for Marine Oil Spill Segmentation
Read My Mind: A Multi-Modal Dataset for Human Belief Prediction
Suspicious Vehicle Detection Using Licence Plate Detection And Facial Feature Recognition
An Efficient Ensemble Explainable AI (XAI) Approach for Morphed Face Detection
Visual Referential Games Further the Emergence of Disentangled Representations
Multivariate Representation Learning for Information Retrieval
It is all about where you start: Text-to-image generation with seed selection
Identifying Minimal Changes in the Zone Abstract Domain
Neural Implicit Dense Semantic SLAM
An Adaptive Channel Reservation MAC Protocol Based on Forwarding Traffic of Key Nodes
Learning adaptive manipulation of objects with revolute joint: A case study on varied cabinet doors opening
Timely Mobile Routing: An Experimental Study
Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening
DataFlower: Exploiting the Data-flow Paradigm for Serverless Workflow Orchestration
An Adaptive Policy to Employ Sharpness-Aware Minimization
Effective Data Aggregation in WSN for Enhanced Security and Data Privacy
Client Recruitment for Federated Learning in ICU Length of Stay Prediction
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment
Quantum Cross Subspace Alignment Codes via the $N$-sum Box Abstraction
Graph Neural Networks on Factor Graphs for Robust, Fast, and Scalable Linear State Estimation with PMUs
Zero Trust Chain A Design Pattern for Improved Interoperability and Security in Polkadot
FlowTransformer: A Transformer Framework for Flow-based Network Intrusion Detection Systems
Orthogonal polynomial bases in the Mixed Virtual Element Method
Hyperparameter Optimization through Neural Network Partitioning
MCPrioQ: A lock-free algorithm for online sparse markov-chains
Channel Orthogonalization with Reconfigurable Surfaces
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Earning Extra Performance from Restrictive Feedbacks
Regret Optimal Control for Uncertain Stochastic Systems
Sampling-based Path Planning Algorithms: A Survey
Wasserstein Dictionaries of Persistence Diagrams
Dense Hybrid Proposal Modulation for Lane Detection
A novel reduced-order model for advection-dominated problems based on Radon-Cumulative-Distribution Transform
Ensuring Reliable Robot Task Performance through Probabilistic Rare-Event Verification and Synthesis
The Power of Typed Affine Decision Structures: A Case Study
Model Predictive Control of Wind Turbines with Piecewise-Affine Power Coefficient Approximation
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours
An Empirical Study of Multimodal Model Merging
Information Redundancy and Biases in Public Document Information Extraction Benchmarks
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation
Hierarchical and Decentralised Federated Learning
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Keyword: faster
Robust and Fast Vehicle Detection using Augmented Confidence Map
Moccasin: Efficient Tensor Rematerialization for Neural Networks
Zero Trust Chain A Design Pattern for Improved Interoperability and Security in Polkadot
SFD2: Semantic-guided Feature Detection and Description
Keyword: mobile
MWaste: A Deep Learning Approach to Manage Household Waste
Mobile Network Slicing under Demand Uncertainty: A Stochastic Programming Approach
LNMesh: Who Said You need Internet to send Bitcoin? Offline Lightning Network Payments using Community Wireless Mesh Networks
Caught in the Game: On the History and Evolution of Web Browser Gaming
Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers
An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours
Keyword: pruning
There is no result
Keyword: voxel
There is no result
Keyword: lidar
HyperMODEST: Self-Supervised 3D Object Detection with Confidence Score Filtering
Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration
Fusion is Not Enough: Single-Modal Attacks to Compromise Fusion Models in Autonomous Driving
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Keyword: diffusion
Learning a Diffusion Prior for NeRFs
It is all about where you start: Text-to-image generation with seed selection
SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis
MUDiff: Unified Diffusion for Complete Molecule Generation
Keyword: dynamic
SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation
One-Step Distributional Reinforcement Learning
MINN: Learning the dynamics of differential-algebraic equations and application to battery modeling
Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors
Deep state-space modeling for explainable representation, analysis, and generation of professional human poses
pyBibX -- A Python Library for Bibliometric and Scientometric Analysis Powered with Artificial Intelligence Tools
Ensemble Modeling with Contrastive Knowledge Distillation for Sequential Recommendation
NeuralKG-ind: A Python Library for Inductive Knowledge Graph Representation Learning
Metric Temporal Equilibrium Logic over Timed Traces
Regret Optimal Control for Uncertain Stochastic Systems
IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
Sampling-based Path Planning Algorithms: A Survey
MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition
Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models
A novel reduced-order model for advection-dominated problems based on Radon-Cumulative-Distribution Transform
A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks
Maximizing Reachability Probabilities in Rectangular Automata with Random Clocks