Abstract
Decades of research indicate that emotion recognition is more effective when drawing information from multiple modalities. But what if some modalities are sometimes missing? To address this problem, we propose a novel Transformer-based architecture for recognizing valence and arousal in a time-continuous manner even with missing input modalities. We use a coupling of cross-attention and self-attention mechanisms to emphasize relationships between modalities over time and to enhance learning from weakly salient inputs. Experimental results on the Ulm-TSST dataset show that our model improves the concordance correlation coefficient by 37% when predicting arousal values and by 30% when predicting valence values, compared to a late-fusion baseline approach.
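As a rough illustration of the fusion scheme described above, the following PyTorch sketch couples cross-attention between two temporal modality streams with self-attention over time, and falls back gracefully when one modality is absent. All module names, dimensions, and the fallback policy are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossSelfFusion(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 2)  # valence, arousal per time step

    def forward(self, audio, video):
        # If a modality is missing, fall back to the one that is present.
        if video is None:
            fused = audio
        elif audio is None:
            fused = video
        else:
            # Cross-attention: audio queries attend to video keys/values.
            fused, _ = self.cross(audio, video, video)
        # Self-attention over time to model temporal relationships.
        fused, _ = self.self_attn(fused, fused, fused)
        return self.head(fused)  # (batch, time, 2)

x_a = torch.randn(8, 100, 64)   # 100 time steps of audio features
x_v = torch.randn(8, 100, 64)   # aligned video features
model = CrossSelfFusion()
print(model(x_a, x_v).shape)    # torch.Size([8, 100, 2])
print(model(x_a, None).shape)   # still works with video missing
```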
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Authors: Lincong Feng, Muyu Wang, Maoyu Wang, Kuo Xu, Xiaoli Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generative models for 3D object synthesis have seen significant advancements with the incorporation of prior knowledge distilled from 2D diffusion models. Nevertheless, challenges persist in the form of multi-view geometric inconsistencies and slow generation speeds within existing 3D synthesis frameworks. This can be attributed to two factors: first, the lack of rich geometric prior knowledge in optimization, and second, the entanglement between geometry and texture in conventional 3D generation methods. In response, we introduce MetaDreamer, a two-stage optimization approach that leverages rich 2D and 3D prior knowledge. In the first stage, our emphasis is on optimizing the geometric representation to ensure multi-view consistency and accuracy of 3D objects. In the second stage, we concentrate on fine-tuning the geometry and optimizing the texture, thereby achieving a more refined 3D object. By leveraging 2D and 3D prior knowledge in the two stages, respectively, we effectively mitigate the interdependence between geometry and texture. MetaDreamer establishes clear optimization objectives for each stage, resulting in significant time savings in the 3D generation process. Ultimately, MetaDreamer can generate high-quality 3D objects from textual prompts within 20 minutes, and to the best of our knowledge, it is the most efficient text-to-3D generation method. Furthermore, we introduce image control into the process, enhancing the controllability of 3D generation. Extensive empirical evidence confirms that our method is not only highly efficient but also achieves a quality level at the forefront of current state-of-the-art 3D generation techniques.
Hypergraph-based Multi-robot Motion Planning with Topological Guidance
Authors: Courtney McBeth, James Motes, Marco Morales, Nancy M. Amato
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
We present a multi-robot motion planning algorithm that, in congested settings with narrow passages in the environment, efficiently finds paths for robot teams up to ten times larger than those handled by existing methods. Narrow passages are a known source of difficulty for sampling-based motion planning algorithms. This problem is exacerbated for multi-robot systems, where the planner must also avoid inter-robot collisions within these congested spaces, requiring coordination. Topological guidance, which leverages information about the robot's environment, has been shown to improve performance for mobile robot motion planning in single-robot scenarios with narrow passages. Additionally, our prior work has explored using topological guidance in multi-robot settings where a high degree of coordination is required of the full robot group. This high level of coordination, however, is not always necessary and results in excessive computational overhead for large groups. Here, we propose a novel multi-robot motion planning method that leverages topological guidance to inform the planner when coordination between robots is necessary, leading to a significant improvement in scalability.
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems
Abstract
Breadth-first probabilistic traversals (BPTs) are used in many network science and graph machine learning applications. In this paper, we are motivated by the application of BPTs in stochastic diffusion-based graph problems such as influence maximization. These applications heavily rely on BPTs to implement a Monte-Carlo sampling step for their approximations. Given the large sampling complexity, the stochasticity of the diffusion process, and the inherent irregularity of real-world graph topologies, efficiently parallelizing these BPTs remains significantly challenging. We present a new algorithm to fuse a massive number of concurrently executing BPTs with random starts on the input graph. Our algorithm is designed to fuse BPTs by combining separate traversals into a unified frontier on distributed multi-GPU systems. To show the general applicability of the fused BPT technique, we have incorporated it into two state-of-the-art influence maximization parallel implementations (gIM and Ripples). Our experiments on up to 4K nodes of the OLCF Frontier supercomputer ($32,768$ GPUs and $196$K CPU cores) show strong scaling behavior, and that fused BPTs can improve the performance of these implementations by up to 34$\times$ (for gIM) and ~360$\times$ (for Ripples).
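To make the fusion idea concrete, here is a pure-Python toy that runs many probabilistic BFS traversals as one fused traversal: each frontier vertex carries a bitmask of the traversals that have reached it, so the unified frontier is expanded once per vertex instead of once per traversal. The graph, edge probability, and bitmask encoding are illustrative assumptions; the paper's implementation targets distributed multi-GPU systems.

```python
import random

def fused_bpt(adj, sources, p=0.3, seed=0):
    """Run len(sources) probabilistic BFS traversals as one fused traversal."""
    rng = random.Random(seed)
    visited = {}                              # vertex -> bitmask of traversals
    frontier = {s: 0 for s in sources}
    for t, s in enumerate(sources):
        frontier[s] |= 1 << t
    while frontier:
        for v, mask in frontier.items():      # mark the unified frontier
            visited[v] = visited.get(v, 0) | mask
        nxt = {}
        for v, mask in frontier.items():      # expand each vertex once
            for u in adj.get(v, []):
                new = 0
                for t in range(len(sources)): # one coin flip per traversal
                    if mask >> t & 1 and rng.random() < p:
                        new |= 1 << t
                new &= ~visited.get(u, 0)     # drop already-visited traversals
                if new:
                    nxt[u] = nxt.get(u, 0) | new
        frontier = nxt
    return visited

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 3], 3: [1, 2]}
print(fused_bpt(adj, sources=[0, 1, 2]))      # per-vertex traversal bitmasks
```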
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication
Abstract
From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) indexing into a look-up table (LUT). Stella Nera is the first Maddness accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators implemented in the same technology. The hash function is a decision tree, which allows for an efficient hardware implementation as the multiply-accumulate operations are replaced by decision tree passes and LUT lookups. The entire Maddness MatMul can be broken down into parts that allow an effective implementation with small computing units and memories, allowing it to reach extreme efficiency while remaining generically applicable for MatMul tasks. In a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency of 161 TOp/s/W@0.55V with a Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9.
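The LUT idea can be sketched in NumPy as follows: rows of A are encoded per subspace by their nearest prototype, prototype-times-B dot products are tabulated once, and the product is then approximated by table lookups and adds. Note this sketch uses plain nearest-prototype (product quantization) encoding; Maddness itself replaces it with learned decision-tree hash functions, which is what makes the hardware multiplier-free.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M, C, K = 256, 32, 16, 4, 16           # C subspaces, K prototypes each
A, B = rng.standard_normal((N, D)), rng.standard_normal((D, M))
S = D // C                                   # subspace width

# "Train" prototypes per subspace (random rows of A as a crude stand-in).
protos = np.stack([A[rng.choice(N, K, replace=False), c*S:(c+1)*S]
                   for c in range(C)])       # (C, K, S)

# Encode: nearest prototype index per (row, subspace).
codes = np.empty((N, C), dtype=np.int64)
for c in range(C):
    d = ((A[:, None, c*S:(c+1)*S] - protos[c][None]) ** 2).sum(-1)
    codes[:, c] = d.argmin(1)

# LUT: dot product of every prototype with the matching slice of B.
lut = np.einsum('cks,csm->ckm', protos, B.reshape(C, S, M))  # (C, K, M)

# Approximate product: per row, sum C table lookups instead of D multiplies.
approx = sum(lut[c, codes[:, c]] for c in range(C))
exact = A @ B
print(np.corrcoef(approx.ravel(), exact.ravel())[0, 1])  # strongly correlated
```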
Layer-to-Layer Melt Pool Control in Laser Powder Bed Fusion
Authors: Dominic Liao-McPherson, Efe C. Balta, Mohamadreza Afrasiabi, Alisa Rupenyan, Markus Bambach, John Lygeros
Abstract
Additive manufacturing processes are flexible and efficient technologies for producing complex geometries. However, ensuring reliability and repeatability is challenging due to the complex physics and various sources of uncertainty in the process. In this work, we investigate closed-loop control of the melt pool dimensions in a laser powder bed fusion (LPBF) process. We propose a trajectory optimization-based layer-to-layer controller that adjusts the laser power input to the next layer to track a desired melt pool depth, and we validate our controller by placing it in closed-loop with a high-fidelity multi-layer smoothed-particle hydrodynamics simulator of a 2D LPBF process. Detailed numerical case studies demonstrate successful regulation of the melt pool depth on brick and overhang geometries, provide first-of-its-kind results on the effectiveness of layer-to-layer input optimization for the LPBF process, and give detailed insight into the physics of the controlled process. Computational complexity and process performance results illustrate the method's effectiveness and provide an outlook for its implementation on real systems.
Asymptotically Fair Participation in Machine Learning Models: an Optimal Control Perspective
Abstract
The performance of state-of-the-art machine learning models often deteriorates when testing on demographics that are under-represented in the training dataset. This problem has predominantly been studied in a supervised learning setting where the data distribution is static. However, real-world applications often involve distribution shifts caused by the deployed models themselves. For instance, the performance disparity against minority users can lead to a high customer churn rate, so the data provided by active users are skewed by the lack of minority users. This feedback effect further exacerbates the disparity among different demographic groups in future steps. To address this issue, we propose asymptotically fair participation as a condition to maintain long-term model performance over all demographic groups. In this work, we aim to achieve asymptotically fair participation via an optimal control formulation. Moreover, we design a surrogate retention system, based on existing literature on evolutionary population dynamics, to approximate the dynamics of distribution shifts on active user counts; from this, the objective of achieving asymptotically fair participation is formulated as an optimal control problem with the model parameters as control variables. We apply an efficient implementation of Pontryagin's maximum principle to estimate the optimal control solution. To evaluate the effectiveness of the proposed method, we design a generic simulation environment that simulates the population dynamics of the feedback effect between user retention and model performance. When we deploy the resulting models to the simulation environment, the optimal control solution accounts for long-term planning and leads to superior performance compared with existing baseline methods.
Segment Anything in Defect Detection
Authors: Bozhen Hu, Bin Gao, Cheng Tan, Tongle Wu, Stan Z. Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Defect detection plays a crucial role in infrared non-destructive testing systems, offering non-contact, safe, and efficient inspection capabilities. However, challenges such as low resolution, high noise, and uneven heating in infrared thermal images hinder comprehensive and accurate defect detection. In this study, we propose DefectSAM, a novel approach for segmenting defects on highly noisy thermal images based on the widely adopted model, Segment Anything (SAM)\cite{kirillov2023segany}. Harnessing the power of a meticulously curated dataset generated through labor-intensive lab experiments and valuable prompts from experienced experts, DefectSAM surpasses existing state-of-the-art segmentation algorithms and achieves significant improvements in defect detection rates. Notably, DefectSAM excels in detecting weaker and smaller defects on complex and irregular surfaces, reducing the occurrence of missed detections and providing more accurate defect size estimations. Experimental studies conducted on various materials have validated the effectiveness of our solutions in defect detection, which hold significant potential to expedite the evolution of defect detection tools, enabling enhanced inspection capabilities and accuracy in defect identification.
FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems
Abstract
Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to a semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows handling irregularities in the input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.
Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving
Abstract
Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera, and reliable 3D information from the radar can potentially achieve an efficient, cheap, and portable solution for 3D object perception tasks. It can also be robust to different lighting or all-weather driving scenarios due to the capability of mmWave radars. In this paper, we introduce the CRUW3D dataset, including 66K synchronized and well-calibrated camera, radar, and LiDAR frames in various driving scenarios. Unlike other large-scale autonomous driving datasets, our radar data is in the format of radio frequency (RF) tensors that contain not only 3D location information but also spatio-temporal semantic information. This kind of radar format can enable machine learning models to generate more reliable object perception results after interacting and fusing the information or features between the camera and radar.
Multiscale Hodge Scattering Networks for Data Analysis
Authors: Naoki Saito, Stefan C. Schonsheck, Eugene Shvarts
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Abstract
We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs use a layered structure analogous to a convolutional neural network (CNN) to cascade the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation that is akin to local pooling in CNNs, and which may be performed either locally or per-scale. These pooling operations are harder to define in both traditional scattering networks based on Morlet wavelets, and geometric scattering networks based on Diffusion Wavelets. As a result, we are able to extract a rich set of descriptive yet robust features that can be used along with very simple machine learning methods (i.e., logistic regression or support vector machines) to achieve high-accuracy classification systems with far fewer parameters to train than most modern graph neural networks. Finally, we demonstrate the usefulness of our MHSNs in three distinct types of problems: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.
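The cascading-of-moments step is simple to sketch: given dictionary coefficients of a signal at several scales, the features are moments of the coefficients' moduli, which are invariant to simplex reordering. The coefficient arrays below are random stand-ins for actual $\kappa$-GHWT/$\kappa$-HGLET expansions.

```python
import numpy as np

def hodge_scatter_features(coeffs_per_scale, q_moments=(1, 2, 3, 4)):
    """coeffs_per_scale: list of arrays, one per scale, each (n_simplices,)."""
    feats = []
    for c in coeffs_per_scale:
        m = np.abs(c)
        # Moments of the modulus are invariant to simplex reordering.
        feats.extend(np.mean(m ** q) for q in q_moments)
    return np.asarray(feats)

rng = np.random.default_rng(1)
scales = [rng.standard_normal(200) for _ in range(5)]  # 5 scales, 200 edges
print(hodge_scatter_features(scales).shape)            # (20,) = 5 scales x 4 moments
```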
Telescope: Telemetry at Terabyte Scale
Authors: Alan Nair, Sandeep Kumar, Aravinda Prasad, Andy Rudoff, Sreenivas Subramoney
Subjects: Operating Systems (cs.OS); Hardware Architecture (cs.AR); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale. We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications. Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques.
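A toy sketch of the underlying observation: accesses to leaf pages also touch every higher level of the page-table tree, so counting accesses at an upper level cheaply localises hot address-space sub-regions at coarse granularity. The trace, region size, and hot/cold split below are synthetic assumptions.

```python
import random
from collections import Counter

random.seed(0)
PAGES_PER_REGION = 512        # one upper-level page-table entry spans many pages
N_REGIONS = 1000
hot = random.sample(range(N_REGIONS), 10)

def next_page():
    if random.random() < 0.9:  # 90% of accesses hit the few hot regions
        r = random.choice(hot)
    else:
        r = random.randrange(N_REGIONS)
    return r * PAGES_PER_REGION + random.randrange(PAGES_PER_REGION)

trace = [next_page() for _ in range(100_000)]

# Profile only the upper level: one counter per region instead of per page.
heat = Counter(page // PAGES_PER_REGION for page in trace)
found = {r for r, _ in heat.most_common(10)}
print(found == set(hot))       # hot sub-regions recovered at coarse granularity
```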
Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures
Abstract
Hyperdimensional computing (HDC) is an emerging computing paradigm with significant promise for efficient and robust learning. In HDC, objects are encoded with high-dimensional vector symbolic sequences called hypervectors. The quality of hypervectors, defined by their distribution and independence, directly impacts the performance of HDC systems. Despite a large body of work on the processing parts of HDC systems, little to no attention has been paid to data encoding and the quality of hypervectors. Most prior studies have generated hypervectors using built-in random functions, such as MATLAB's or Python's random function. This work introduces an optimization technique for generating hypervectors by employing quasi-random sequences. These sequences have recently demonstrated their effectiveness in achieving accurate and low-discrepancy data encoding in stochastic computing systems. The study outlines the optimization steps for utilizing Sobol sequences to produce high-quality hypervectors in HDC systems. An optimization algorithm is proposed to select the most suitable Sobol sequences for generating minimally correlated hypervectors, particularly in applications related to symbol-oriented architectures. The performance of the proposed technique is evaluated in comparison to two traditional approaches of generating hypervectors based on linear-feedback shift registers and MATLAB's random function. The evaluation is conducted for two applications: (i) language classification and (ii) headline classification. Our experimental results demonstrate accuracy improvements of up to 10.79%, depending on the vector size. Additionally, the proposed encoding hardware exhibits reduced energy consumption and a superior area-delay product.
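Generating hypervectors from Sobol sequences is easy to sketch with SciPy's quasi-Monte Carlo module: each Sobol dimension yields one low-discrepancy sequence that is thresholded into a bipolar hypervector. The simple 0.5 threshold and the unoptimised choice of Sobol dimensions are assumptions; the paper's contribution is precisely the selection of sequences that minimise correlation.

```python
import numpy as np
from scipy.stats import qmc

dim, n_symbols = 4096, 8                  # hypervector length, number of symbols
sobol = qmc.Sobol(d=n_symbols, scramble=True, seed=0)
seq = sobol.random(dim)                   # (dim, n_symbols), low-discrepancy in [0, 1)
hvs = np.where(seq.T > 0.5, 1, -1)        # one bipolar hypervector per symbol

corr = hvs @ hvs.T / dim                  # off-diagonal near 0: near-orthogonal
print(np.round(corr, 2))
```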
Scalable Algorithms for Laplacian Pseudo-inverse Computation
Abstract
The pseudo-inverse of a graph Laplacian matrix, denoted as $L^\dagger$, finds extensive application in various graph analysis tasks. Notable examples include the calculation of electrical closeness centrality, determination of Kemeny's constant, and evaluation of resistance distance. However, existing algorithms for computing $L^\dagger$ are often computationally expensive when dealing with large graphs. To overcome this challenge, we propose novel solutions for approximating $L^\dagger$ by establishing a connection with the inverse of a Laplacian submatrix $L_v$. This submatrix is obtained by removing the $v$-th row and column from the original Laplacian matrix $L$. The key advantage of this connection is that $L_v^{-1}$ exhibits various interesting combinatorial interpretations. We present two innovative interpretations of $L_v^{-1}$ based on spanning trees and loop-erased random walks, which allow us to develop efficient sampling algorithms. Building upon these new theoretical insights, we propose two novel algorithms for efficiently approximating both electrical closeness centrality and Kemeny's constant. We extensively evaluate the performance of our algorithms on five real-life datasets. The results demonstrate that our novel approaches significantly outperform the state-of-the-art methods by several orders of magnitude in terms of both running time and estimation errors for these two graph analysis tasks. To further illustrate the effectiveness of electrical closeness centrality and Kemeny's constant, we present two case studies that showcase the practical applications of these metrics.
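The submatrix connection admits a compact NumPy check: padding $L_v^{-1}$ back to full size and projecting orthogonally to the all-ones vector recovers $L^\dagger$ exactly, which is the identity the paper's sampling algorithms approximate. The small example graph is arbitrary.

```python
import numpy as np

def pinv_via_submatrix(L, v):
    n = L.shape[0]
    keep = [i for i in range(n) if i != v]
    Minv = np.linalg.inv(L[np.ix_(keep, keep)])  # L_v^{-1}
    M = np.zeros_like(L, dtype=float)
    M[np.ix_(keep, keep)] = Minv                 # pad row/column v with zeros
    P = np.eye(n) - np.ones((n, n)) / n          # projector orthogonal to all-ones
    return P @ M @ P

# Laplacian of a small connected graph (a 4-cycle with a chord).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
print(np.allclose(pinv_via_submatrix(L, v=0), np.linalg.pinv(L)))  # True
```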
Leveraging Function Space Aggregation for Federated Learning at Scale
Abstract
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
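One common reading of Fisher-based aggregation, sketched below in NumPy, weights each client's parameters coordinate-wise by its diagonal Fisher estimate, so that parameters a client's learned function is sensitive to dominate the merge. This is a hedged illustration of the general idea, not the authors' exact FedFish procedure.

```python
import numpy as np

def fisher_merge(params, fishers, eps=1e-8):
    """params, fishers: lists of equally-shaped arrays, one pair per client."""
    num = sum(f * p for f, p in zip(fishers, params))
    den = sum(fishers) + eps
    return num / den

rng = np.random.default_rng(0)
client_params = [rng.standard_normal(5) for _ in range(3)]
# Diagonal Fisher estimates, e.g. mean squared per-example gradients.
client_fisher = [rng.random(5) for _ in range(3)]
print(fisher_merge(client_params, client_fisher))
```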
Hierarchical Pruning of Deep Ensembles with Focal Diversity
Authors: Yanzhao Wu, Ka-Ho Chow, Wenqi Wei, Ling Liu
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural network ensembles combine the wisdom of multiple deep neural networks to improve generalizability and robustness over individual networks. Studying deep ensemble techniques has gained increasing popularity in the deep learning community. Some mission-critical applications utilize a large number of deep neural networks to form deep ensembles to achieve desired accuracy and resilience, which introduces high time and space costs for ensemble execution. However, it remains a critical challenge to determine whether a small subset of the entire deep ensemble can achieve the same or better generalizability, and to effectively identify these small deep ensembles to improve the space and time efficiency of ensemble execution. This paper presents a novel deep ensemble pruning approach, which can efficiently identify smaller deep ensembles that provide higher ensemble accuracy than the entire deep ensemble of a large number of member networks. Our hierarchical ensemble pruning approach (HQ) leverages three novel ensemble pruning techniques. First, we show that focal diversity metrics can accurately capture the complementary capacity of the member networks of an ensemble, which can guide ensemble pruning. Second, we design a focal-diversity-based hierarchical pruning approach, which iteratively finds high-quality deep ensembles with low cost and high accuracy. Third, we develop a focal diversity consensus method that integrates multiple focal diversity metrics to refine ensemble pruning results, so that smaller deep ensembles can be effectively identified that offer high accuracy, high robustness and high efficiency. Evaluated on popular benchmark datasets, the proposed hierarchical ensemble pruning approach effectively identifies high-quality deep ensembles with better generalizability while being more time- and space-efficient in ensemble decision making.
Learning transformer-based heterogeneously salient graph representation for multimodal fusion classification of hyperspectral image and LiDAR data
Authors: Jiaqi Yang, Bo Du, Liangpei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Data collected by different modalities can provide a wealth of complementary information, such as hyperspectral image (HSI) offering rich spectral-spatial properties, synthetic aperture radar (SAR) providing structural information about the Earth's surface, and light detection and ranging (LiDAR) covering altitude information about ground elevation. Therefore, a natural idea is to combine multimodal images for refined and accurate land-cover interpretation. Although many efforts have been made toward multi-source remote sensing image classification, three issues remain: 1) indiscriminate feature representation that does not sufficiently consider modal heterogeneity, 2) abundant features and complex computations associated with modeling long-range dependencies, and 3) overfitting caused by sparsely labeled samples. To overcome the above barriers, a transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. Then, a self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling. Finally, a mean forward is put forward to avoid overfitting. Based on the above structures, the proposed model is able to bridge modal gaps and obtain differentiated graph representations at competitive time cost, even with a small fraction of training samples. Experiments and analyses on three benchmark datasets against various state-of-the-art (SOTA) methods demonstrate the performance of the proposed approach.
Scalable Edge Clustering of Dynamic Graphs via Weighted Line Graphs
Authors: Michael Ostroski, Geoffrey Sanders, Trevor Steil, Roger Pearce
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)
Abstract
Timestamped relational datasets consisting of records between pairs of entities are ubiquitous in data and network science. For applications like peer-to-peer communication, email, social network interactions, and computer network security, it makes sense to organize these records into groups based on how and when they are occurring. Weighted line graphs offer a natural way to model how records are related in such datasets, but for large real-world graph topologies the complexity of building and utilizing the line graph is prohibitive. We present an algorithm to cluster the edges of a dynamic graph via the associated line graph without forming it explicitly. We outline a novel hierarchical dynamic graph edge clustering approach that efficiently breaks massive relational datasets into small sets of edges containing events at various timescales. This is in stark contrast to traditional graph clustering algorithms that prioritize highly connected community structures. Our approach relies on constructing a sufficient subgraph of a weighted line graph and applying hierarchical agglomerative clustering. This work draws particular inspiration from HDBSCAN. We present a parallel algorithm and show that it is able to break billion-scale dynamic graphs into small sets that correlate in topology and time. The entire clustering process for a graph with $O(10 \text{ billion})$ edges takes just a few minutes of run time on 256 nodes of a distributed compute environment. We discuss how the output of the edge clustering is useful for a multitude of data visualization and powerful machine learning tasks, both involving the original massive dynamic graph data and/or the non-relational metadata. Finally, we demonstrate its use on a real-world large-scale directed dynamic graph and describe how it can be extended to dynamic hypergraphs and graphs with unstructured data living on vertices and edges.
A2XP: Towards Private Domain Generalization
Authors: Geunhyeok Yu, Hyoseok Hwang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep Neural Networks (DNNs) have become pivotal in various fields, especially in computer vision, outperforming previous methodologies. A critical challenge in their deployment is the bias inherent in data across different domains, such as image style and environmental conditions, leading to domain gaps. This necessitates techniques for learning general representations from biased training data, known as domain generalization. This paper presents Attend to eXpert Prompts (A2XP), a novel approach for domain generalization that preserves the privacy and integrity of the network architecture. A2XP consists of two phases: Expert Adaptation and Domain Generalization. In the first phase, prompts for each source domain are optimized to guide the model towards the optimal direction. In the second phase, two embedder networks are trained to effectively amalgamate these expert prompts, aiming for an optimal output. Our extensive experiments demonstrate that A2XP achieves state-of-the-art results over existing non-private domain generalization methods. The experimental results validate that the proposed approach not only tackles the domain generalization challenge in DNNs but also offers a privacy-preserving, efficient solution to the broader field of computer vision.
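The second phase, as described, can be sketched as attention over a bank of expert prompts: two embedders map the input and the prompts into a shared key space, and the softmax-weighted mixture of prompts is the output. Shapes and module choices below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptMixer(nn.Module):
    def __init__(self, d_in=512, d_key=128, n_experts=3, d_prompt=768):
        super().__init__()
        self.query_embed = nn.Linear(d_in, d_key)    # embeds the target input
        self.key_embed = nn.Linear(d_prompt, d_key)  # embeds each expert prompt
        self.experts = nn.Parameter(torch.randn(n_experts, d_prompt))

    def forward(self, x):
        q = self.query_embed(x)                  # (batch, d_key)
        k = self.key_embed(self.experts)         # (n_experts, d_key)
        w = torch.softmax(q @ k.T, dim=-1)       # attention over experts
        return w @ self.experts                  # amalgamated prompt per input

mixer = PromptMixer()
print(mixer(torch.randn(4, 512)).shape)          # torch.Size([4, 768])
```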
From Concept to Field Tests: Accelerated Development of Multi-AUV Missions Using a High-Fidelity Faster-than-Real-Time Simulator
Authors: Timothy R. Player, Arjo Chakravarty, Mabel M. Zhang, Ben Yair Raanan, Brian Kieft, Yanwu Zhang, Brett Hobson
Abstract
We designed and validated a novel simulator for efficient development of multi-robot marine missions. To accelerate development of cooperative behaviors, the simulator models the robots' operating conditions with moderately high fidelity and runs significantly faster than real time, including acoustic communications, dynamic environmental data, and high-resolution bathymetry in large worlds. The simulator's ability to exceed a real-time factor (RTF) of 100 has been stress-tested with a robust continuous integration suite and was used to develop a multi-robot field experiment.
Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV
Authors: Chi Zhang, Paul Scheffler, Thomas Benz, Matteo Perotti, Luca Benini
Abstract
Sparse matrix vector multiplication (SpMV) is central to numerous data-intensive applications, but requires streaming indirect memory accesses that severely degrade both processing and memory throughput in state-of-the-art architectures. Near-memory hardware units, decoupling indirect streams from processing elements, partially alleviate the bottleneck, but rely on low DRAM access granularity, which is highly inefficient for modern DRAM standards like HBM and LPDDR. To fully address the end-to-end challenge, we propose a low-overhead data coalescer combined with a near-memory indirect streaming unit for AXI-Pack, an extension to the widespread AXI4 protocol packing narrow irregular stream elements onto wide memory buses. Our combined solution leverages the memory-level parallelism and coalescence of streaming indirect accesses in irregular applications like SpMV to maximize the performance and bandwidth efficiency attained on wide memory interfaces. Our solution delivers an average speedup of 8x in effective indirect access, often reaching the full memory bandwidth. As a result, we achieve an average end-to-end speedup on SpMV of 3x. Moreover, our approach demonstrates remarkable on-chip efficiency, requiring merely 27kB of on-chip storage and a very compact implementation area of 0.2-0.3mm^2 in a 12nm node.
Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking
Abstract
Multi-Object Tracking (MOT) remains a vital component of intelligent video analysis, which aims to locate targets and maintain a consistent identity for each target throughout a video sequence. Existing works usually learn a discriminative feature representation, such as motion and appearance, to associate detections across frames, which is easily affected by mutual occlusion and background clutter in practice. In this paper, we propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets, so as to achieve robust data association in the tracking process. For detections that have not been associated, we design a novel single-shot feature learning module to extract discriminative features of each detection, which can efficiently associate targets between adjacent frames. For tracklets that have been lost for several frames, we design a novel multi-shot feature learning module to extract discriminative features of each tracklet, which can accurately re-find these lost targets after a long period. Once equipped with a simple data association logic, the resulting VisualTracker can perform robust MOT based on the single-shot and multi-shot feature representations. Extensive experimental results demonstrate that our method achieves significant improvements on the MOT17 and MOT20 datasets while reaching state-of-the-art performance on the DanceTrack dataset.
Optimized Deep Learning Models for AUV Seabed Image Analysis
Authors: Rajesh Sharma R, Akey Sungheetha, Chinnaiyan R
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Using autonomous underwater vehicles (AUVs) has completely changed how we gather data from the ocean floor. AUV innovation has advanced significantly, especially in the analysis of images, due to the increasing need for accurate and efficient seafloor mapping. This paper provides a detailed summary and comparison of the most recent advancements in AUV seafloor image processing. We delve into the realm of undersea technology, covering everything from computer and algorithmic advancements to advances in sensors and cameras. After reading this paper through to the end, you will have a solid understanding of the most up-to-date techniques and tools for using AUVs to process seabed photos and how they could further our comprehension of the ocean floor.
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
Abstract
Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.
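The micro-batch construction step lends itself to a short dynamic program: after sorting samples by length, choose split points that minimise total padded tokens subject to a per-micro-batch token budget. This sketch uses padded-token count as the cost; DynaPipe's actual cost model also captures pipeline execution time, and the budget below is an illustrative assumption.

```python
def split_microbatches(lengths, max_tokens):
    xs = sorted(lengths)
    n = len(xs)
    INF = float('inf')
    cost = [INF] * (n + 1)   # cost[i]: best padded-token count for xs[:i]
    back = [0] * (n + 1)
    cost[0] = 0
    for i in range(1, n + 1):
        for j in range(i):   # micro-batch xs[j:i], padded to length xs[i-1]
            padded = (i - j) * xs[i - 1]
            if padded > max_tokens:
                continue
            if cost[j] + padded < cost[i]:
                cost[i], back[i] = cost[j] + padded, j
    # Reconstruct the chosen split points.
    batches, i = [], n
    while i > 0:
        batches.append(xs[back[i]:i])
        i = back[i]
    return batches[::-1], cost[n]

lengths = [12, 90, 15, 88, 100, 14, 11, 95]
print(split_microbatches(lengths, max_tokens=512))
```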
Simultaneous Synthesis and Verification of Neural Control Barrier Functions through Branch-and-Bound Verification-in-the-loop Training
Authors: Xinyu Wang, Luzia Knoedler, Frederik Baymler Mathiesen, Javier Alonso-Mora
Abstract
Control Barrier Functions (CBFs) that provide formal safety guarantees have been widely used for safety-critical systems. However, it is non-trivial to design a CBF. Utilizing neural networks as CBFs has shown great success, but it necessitates their certification as CBFs. In this work, we leverage bound propagation techniques and the Branch-and-Bound scheme to efficiently verify that a neural network satisfies the conditions to be a CBF over the continuous state space. To accelerate training, we further present a framework that embeds the verification scheme into the training loop to synthesize and verify a neural CBF simultaneously. In particular, we employ the verification scheme to identify partitions of the state space that are not guaranteed to satisfy the CBF conditions and expand the training dataset by incorporating additional data from these partitions. The neural network is then optimized using the augmented dataset to meet the CBF conditions. We show that for a non-linear control-affine system, our framework can efficiently certify a neural network as a CBF and render a larger safe set than state-of-the-art neural CBF works. We further employ our learned neural CBF to derive a safe controller to illustrate the practical use of our framework.
DeepClean: Machine Unlearning on the Cheap by Resetting Privacy Sensitive Weights using the Fisher Diagonal
Authors: Jiaeli Shi, Najah Ghalyan, Kostis Gourgoulias, John Buford, Sean Moran
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Machine learning models trained on sensitive or private data can inadvertently memorize and leak that information. Machine unlearning seeks to retroactively remove such details from model weights to protect privacy. We contribute a lightweight unlearning algorithm that leverages the Fisher Information Matrix (FIM) for selective forgetting. Prior work in this area requires full retraining or large matrix inversions, which are computationally expensive. Our key insight is that the diagonal elements of the FIM, which measure the sensitivity of log-likelihood to changes in weights, contain sufficient information for effective forgetting. Specifically, we compute the FIM diagonal over two subsets -- the data to retain and forget -- for all trainable weights. This diagonal representation approximates the complete FIM while dramatically reducing computation. We then use it to selectively update weights to maximize forgetting of the sensitive subset while minimizing impact on the retained subset. Experiments show that our algorithm can successfully forget any randomly selected subsets of training data across neural network architectures. By leveraging the FIM diagonal, our approach provides an interpretable, lightweight, and efficient solution for machine unlearning with practical privacy benefits.
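The core recipe can be sketched in PyTorch: estimate the FIM diagonal as the mean squared gradient over the retain and forget sets, then perturb the weights whose forget-to-retain sensitivity ratio is largest. The selection ratio and noise scale below are illustrative assumptions, not the paper's exact settings.

```python
import torch

def fim_diagonal(model, loss_fn, loader):
    """Mean squared gradient per parameter over a data loader of (x, y) batches."""
    fims = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for f, p in zip(fims, model.parameters()):
            f += p.grad.detach() ** 2
    return [f / len(loader) for f in fims]

def deep_clean(model, fim_forget, fim_retain, ratio=0.05, sigma=0.1):
    with torch.no_grad():
        for p, ff, fr in zip(model.parameters(), fim_forget, fim_retain):
            # Weights far more informative about forget data than retain data.
            score = ff / (fr + 1e-12)
            k = max(1, int(ratio * p.numel()))
            thresh = score.flatten().topk(k).values.min()
            mask = score >= thresh
            p[mask] += sigma * torch.randn_like(p)[mask]
```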
Accurate and Fast Fischer-Tropsch Reaction Microkinetics using PINNs
Abstract
Microkinetics allows detailed modelling of chemical transformations occurring in many industrially relevant reactions. The traditional way of solving the microkinetics model for Fischer-Tropsch synthesis (FTS) becomes inefficient for more advanced real-time applications. In this work, we address these challenges by using physics-informed neural networks (PINNs) for modelling FTS microkinetics. We propose a computationally efficient and accurate method, enabling the ultra-fast solution of the existing microkinetics models under realistic process conditions. The proposed PINN model computes the fraction of vacant catalytic sites, a key quantity in FTS microkinetics, with a median relative error (MRE) of 0.03%, and the FTS product formation rates with an MRE of 0.1%. Compared to conventional equation solvers, the model achieves up to a 10^6-fold speed-up when running on GPUs, thus being fast enough for multi-scale and multi-physics reactor modelling and enabling its application in real-time process control and optimization.
Memory Management Strategies for an Internet of Things System
Authors: Ana-Maria Comeagă, Iuliana Marin
Subjects: Software Engineering (cs.SE); Operating Systems (cs.OS)
Abstract
The rise of the Internet has brought about significant changes in our lives, and the rapid expansion of the Internet of Things (IoT) is poised to have an even more substantial impact by connecting a wide range of devices across various application domains. IoT devices, especially low-end ones, are constrained by limited memory and processing capabilities, necessitating efficient memory management within IoT operating systems. This paper delves into the importance of memory management in IoT systems, with a primary focus on the design and configuration of such systems, as well as the scalability and performance of scene management. Effective memory management is critical for optimizing resource usage, responsiveness, and adaptability as the IoT ecosystem continues to grow. The study offers insights into memory allocation, scene execution, memory reduction, and system scalability within the context of an IoT system, ultimately highlighting the vital role that memory management plays in facilitating a seamless and efficient IoT experience.
Towards General Loop Invariant Generation via Coordinating Symbolic Execution and Large Language Models
Abstract
Loop invariants, essential for program verification, are challenging to auto-generate, especially for programs incorporating complex memory manipulations. Existing approaches for generating loop invariants rely on fixed sets or templates, hampering adaptability to real-world programs. Recent efforts have explored machine learning for loop invariant generation, but the lack of labeled data and the need for efficient generation remain problematic. We consider that the advent of the large language model (LLM) presents a promising solution: an LLM can analyze the separation logic assertions produced by symbolic execution to infer loop invariants. To overcome the data scarcity issue, we propose a self-supervised learning paradigm to fine-tune the LLM, using the split-and-reassembly of predicates to create an auxiliary task and generate rich synthetic data for offline training. Meanwhile, the proposed interactive system between the LLM and traditional verification tools provides an efficient online querying process for unseen programs. Our framework readily extends to new data structures or multi-loop programs, since it only needs the definitions of the different separation logic predicates, aiming to bridge the gap between existing capabilities and the requirements of loop invariant generation in practical scenarios. Experiments across diverse memory-manipulating programs demonstrate the performance of our proposed method compared to the baselines with respect to efficiency and effectiveness.
ReuseSense: With Great Reuse Comes Greater Efficiency; Effectively Employing Computation Reuse on General-Purpose CPUs
Authors: Nitesh Narayana GS, Marc Ordoñez, Lokananda Hari, Franyell Silfa, Antonio González
Abstract
Deep Neural Networks (DNNs) are the de facto algorithm for tackling cognitive tasks in real-world applications such as speech recognition and natural language processing. DNN inference comprises numerous dot product operations between inputs and weights that require numerous multiplications and memory accesses, which hinder performance and energy consumption when evaluated on modern CPUs. In this work, we leverage the high degree of similarity between consecutive inputs in different DNN layers to improve the performance and energy efficiency of DNN inference on CPUs. To this end, we propose ReuseSense, a new hardware scheme that includes ReuseSensor, an engine that efficiently generates the compute and load instructions needed to evaluate a DNN layer when similar inputs are sensed. By intelligently reusing previously computed product values, ReuseSense allows bypassing computations when encountering input values identical to previous ones. Additionally, it efficiently avoids redundant loads by skipping the weight loads associated with the bypassed dot product computations. Our experiments show that ReuseSense achieves an 8x speedup in performance and a 74% reduction in total energy consumption on average across several DNNs over the baseline.
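In software terms, the reuse opportunity looks like the following NumPy sketch: when an input element is unchanged from the previous input vector, its products with the static weights are unchanged too, so the corresponding multiplies (and weight loads) can be skipped. Caching per-element products, as done here for clarity, is a simplification of the hardware scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))                 # static layer weights
x_prev = rng.integers(0, 8, 128).astype(float)     # previous input vector
x_curr = x_prev.copy()
changed = rng.choice(128, 16, replace=False)       # only a few inputs change
x_curr[changed] = rng.integers(0, 8, 16)

same = x_curr == x_prev
y_prev = W * x_prev                                # per-element products, cached
y_curr = y_prev.copy()
y_curr[:, ~same] = W[:, ~same] * x_curr[~same]     # recompute only changed inputs
assert np.allclose(y_curr.sum(1), W @ x_curr)      # dot products still exact
print(f"multiplies skipped: {same.mean():.0%}")
```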
Fast Estimations of Hitting Time of Elitist Evolutionary Algorithms from Fitness Levels
Authors: Jun He, Siang Yew Chong, Xin Yao
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
The fitness level method is an easy-to-use tool for estimating the hitting time of elitist EAs. Recently, general linear lower and upper bounds from fitness levels have been constructed. However, the construction of these bounds requires recursive computation, which makes them difficult to use in practice. We address this shortcoming with a new directed graph (digraph) method that does not require recursive computation and significantly simplifies the calculation of the coefficients in linear bounds. In this method, an EA is modeled as a Markov chain on a digraph. Lower and upper bounds are calculated directly from conditional transition probabilities on the digraph. This digraph method provides straightforward and explicit expressions for lower and upper time bounds for elitist EAs. In particular, it can be used to derive tight lower bounds on fitness landscapes both without and with shortcuts. This is demonstrated through four examples: the (1+1) EA on OneMax, FullyDeceptive, TwoMax1 and Deceptive. Our work extends the fitness level method from simple fitness functions without shortcuts to more realistic functions with shortcuts.
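For a concrete instance of fitness-level reasoning, the following script computes the classical level-based upper bound for the (1+1) EA on OneMax, $E[T] \le \sum_i 1/p_i$ with $p_i \ge (n-i)\frac{1}{n}(1-\frac{1}{n})^{n-1}$, and compares it against a direct simulation. This illustrates the fitness level method itself, not the paper's new digraph construction.

```python
import random

def level_upper_bound(n):
    # From level i (i ones), gaining a one happens with probability at least
    # (n - i) * (1/n) * (1 - 1/n)^(n-1): flip one missing bit, keep the rest.
    q = (1 - 1 / n) ** (n - 1)
    return sum(1 / ((n - i) * (1 / n) * q) for i in range(n))

def simulate(n, runs=200, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        x = [rng.random() < 0.5 for _ in range(n)]
        t = 0
        while not all(x):
            t += 1
            y = [b ^ (rng.random() < 1 / n) for b in x]  # flip each bit w.p. 1/n
            if sum(y) >= sum(x):                         # elitist acceptance
                x = y
        total += t
    return total / runs

n = 30
print(level_upper_bound(n), simulate(n))  # the bound upper-bounds the average
```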
Segment Anything Model with Uncertainty Rectification for Auto-Prompting Medical Image Segmentation
Abstract
The introduction of the Segment Anything Model (SAM) has marked a significant advancement in prompt-driven image segmentation. However, SAM's application to medical image segmentation requires manual prompting of target structures to obtain acceptable performance, which is still labor-intensive. Despite attempts to make SAM fully automatic via auto-prompting, it still exhibits subpar performance and lacks reliability in the field of medical imaging. In this paper, we propose UR-SAM, an uncertainty-rectified SAM framework to enhance the robustness and reliability of auto-prompting medical image segmentation. Our method incorporates a prompt augmentation module to estimate the distribution of predictions and generate uncertainty maps, and an uncertainty-based rectification module to further enhance the performance of SAM. Extensive experiments on two public 3D medical datasets covering the segmentation of 35 organs demonstrate that, without supplementary training or fine-tuning, our method improves segmentation performance by up to 10.7% and 13.8% in Dice similarity coefficient, demonstrating efficiency and broad capabilities for medical image segmentation without manual prompting.
Efficient Profit Maximization in Reliability Concerned Static Vehicular Cloud System
Abstract
Modern electric vehicular units (VUs) are equipped with a variety of increasingly potent computing, communication, and storage resources; this tremendous computation power can be used to enhance the computing capacity of regular cloud systems, forming what is termed a vehicular cloud. Unlike traditional cloud computing resources, vehicular cloud resources move around and participate in the vehicular cloud only for sporadic durations at parking places, shopping malls, etc. This introduces a dynamic nature to vehicular resource participation in the vehicular cloud. Since user-submitted tasks are allocated to these vehicular units for execution, and the units' stays are dynamic, the system must ensure the reliability of task execution by allocating multiple redundant vehicular units to each task. In this work, we maximize the profit of the vehicular cloud while ensuring the reliability of task execution, where user tasks arrive online with different revenues, execution times, and deadlines. We propose an efficient approach to solve this problem by considering (a) task classification based on the deadline and laxity of the task, (b) ordering of tasks for task admission based on the expected profit of the task, (c) classification of vehicular units based on expected residency time, together with reliability-concerned redundant allocation of tasks to vehicular units using this classification, and (d) handling the dynamic scenario of a vehicular unit leaving the cloud system by copying the maximum possible fraction of the task's executed virtual machine to a substitute unit. We compared our proposed profit maximization approach with the state-of-the-art approach and show that it outperforms the state of the art with an extra 10% to 20% profit margin.
LUNA-CIM: Lookup Table based Programmable Neural Processing in Memory
Abstract
This paper presents a novel approach for performing computations using Look-Up Tables (LUTs) tailored specifically for Compute-in-Memory applications. The aim is to address the scalability challenges associated with LUT-based computation by reducing storage requirements and energy consumption while capitalizing on the faster and more energy-efficient nature of look-up methods compared to conventional mathematical computations. The proposed method leverages a divide and conquer (D&C) strategy to enhance the scalability of LUT-based computation. By breaking down high-precision multiplications into lower-precision operations, the technique achieves significantly lower area overheads, up to approximately 3.7 times less than conventional LUT-based approaches, without compromising accuracy. To validate the effectiveness of the proposed method, extensive simulations using TSMC 65 nm technology were conducted. The experimental analysis reveals that the proposed approach accounts for less than 0.1% of the total energy consumption, with only a 32% increase in area overhead. These results demonstrate considerable improvements achieved in energy efficiency and area utilization through the novel low-energy, low-area-overhead LUT-based computation in an SRAM array.
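The divide-and-conquer decomposition is easy to see on integers: an 8-bit by 8-bit multiply can be served by a single 256-entry 4-bit-by-4-bit LUT plus shifts and adds, instead of a 2^16-entry table. The bit widths below are illustrative; the paper applies the same split inside an SRAM compute-in-memory array.

```python
lut4 = [[a * b for b in range(16)] for a in range(16)]  # 256-entry 4x4-bit LUT

def mul8_via_lut4(a, b):
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    # (ah*16 + al)(bh*16 + bl) recombined with shifts and three extra adds.
    return (lut4[ah][bh] << 8) + ((lut4[ah][bl] + lut4[al][bh]) << 4) + lut4[al][bl]

assert all(mul8_via_lut4(a, b) == a * b
           for a in range(256) for b in range(256))
print(mul8_via_lut4(200, 123), 200 * 123)  # identical results, no multiplier
```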
Countering Misinformation via Emotional Response Generation
Authors: Daniel Russo, Shane Peter Kaszefski-Yaschuk, Jacopo Staiano, Marco Guerini
Abstract
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy. Previous research has shown how social correction can be an effective way to curb misinformation, by engaging directly in a constructive dialogue with users who spread -- often in good faith -- misleading messages. Although professional fact-checkers are crucial to debunking viral claims, they usually do not engage in conversations on social media. Hence, significant effort has been made to automate the use of fact-checker material in social correction; however, no previous work has tried to integrate it with the style and pragmatics that are commonly employed in social media communication. To fill this gap, we present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs (linked to debunking articles), accounting for both SMP-style and basic emotions, two factors which have a significant role in misinformation credibility and spreading. To collect this dataset we used a technique based on an author-reviewer pipeline, which efficiently combines LLMs and human annotators to obtain high-quality data. We also provide comprehensive experiments showing how models trained on our proposed dataset have significant improvements in terms of output quality and generalization capabilities.
$P_c\varepsilon\kappa_{\text{max}}$-Means++: Adapt-$P$ Driven by Energy and Distance Quality Probabilities Based on $\kappa$-Means++ for the Stable Election Protocol (SEP)
Authors: Husam Suleiman
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The adaptive probability $P_{\text{adp}}$ formalized in Adapt-$P$ is developed based on the remaining number of SNs $\zeta$ and the optimal clustering $\kappa_{\text{max}}$, yet $P_{\text{adp}}$ does not implement the probabilistic ratios of energy and distance factors in the network. Furthermore, Adapt-$P$ does not localize cluster-heads properly in the first round because of its reliance on the distance computations defined in LEACH, which might result in an uneven distribution of cluster-heads in the WSN area and hence, at some rounds, inefficient consumption of energy. This paper utilizes $k$-means++ and Adapt-$P$ to propose the $P_c\kappa_{\text{max}}$-means++ clustering algorithm, which better manages the distribution of cluster-heads and produces enhanced performance. The algorithm employs an optimized cluster-head election probability $P_c$ developed based on the energy-based $P_{\eta(j,i)}$ and distance-based $P_{\psi(j,i)}$ quality probabilities along with the adaptive probability $P_{\text{adp}}$, utilizing the energy $\varepsilon$ and distance optimality $d_{\text{opt}}$ factors. Furthermore, the algorithm utilizes the optimal clustering $\kappa_{\text{max}}$ derived in Adapt-$P$ to perform adaptive clustering through $\kappa_{\text{max}}$-means++. The proposed $P_c\kappa_{\text{max}}$-means++ is compared with the energy-based $P_\eta\varepsilon\kappa_{\text{max}}$-means++ and distance-based $P_\psi d_{\text{opt}}\kappa_{\text{max}}$-means++ algorithms, and shows optimized performance in terms of residual energy and the stability period of the network.
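The $k$-means++ seeding that spreads first-round cluster-heads can be sketched in a few lines of NumPy: each new head is drawn with probability proportional to the squared distance from the nearest head already chosen. The field size and node count are illustrative, and the energy- and distance-based election probabilities from the paper are not modelled here.

```python
import numpy as np

def kmeanspp_seeds(points, k, rng):
    """Classic k-means++ D^2 seeding: spread k seeds over the point set."""
    seeds = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen seed.
        d2 = np.min([np.square(points - s).sum(1) for s in seeds], axis=0)
        seeds.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.array(seeds)

rng = np.random.default_rng(0)
nodes = rng.uniform(0, 100, size=(200, 2))  # sensor positions, 100x100 field
print(kmeanspp_seeds(nodes, k=5, rng=rng))  # well-spread initial cluster-heads
```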
Hashing it Out: Predicting Unhealthy Conversations on Twitter
Authors: Steven Leung, Filippos Papapolyzos
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Personal attacks in the context of social media conversations often lead to fast-paced derailment, leading to even more harmful exchanges being made. State-of-the-art systems for the detection of such conversational derailment often make use of deep learning approaches for prediction purposes. In this paper, we show that an attention-based BERT architecture, pre-trained on a large Twitter corpus and fine-tuned on our task, is efficient and effective in making such predictions. This model shows clear performance advantages over the existing LSTM model we use as a baseline. Additionally, we show that this impressive performance can be attained through fine-tuning on a relatively small, novel dataset, particularly after mitigating overfitting issues through synthetic oversampling techniques. By introducing the first transformer-based model for forecasting conversational events on Twitter, this work lays the foundation for a practical tool to encourage better interactions on one of the most ubiquitous social media platforms.
Sparsity-Parameterised Dynamic Edge Colouring
Authors: Aleksander B.J. Christiansen, Eva Rotenberg, Juliette Vlieghe
Abstract
We study the edge-colouring problem, and give efficient algorithms where the number of colours is parameterised by the graph's arboricity, $\alpha$. In a dynamic graph, subject to insertions and deletions, we give a deterministic algorithm that updates a proper $\Delta + O(\alpha)$ edge colouring in $\operatorname{poly}(\log n)$ amortised time. Our algorithm is fully adaptive to the current value of the maximum degree and arboricity. In this fully-dynamic setting, the state-of-the-art edge-colouring algorithms are either a randomised algorithm using $(1 + \varepsilon)\Delta$ colours in $\operatorname{poly}(\log n, \varepsilon^{-1})$ time per update, or the naive greedy algorithm which maintains a deterministic $2\Delta - 1$ edge colouring with $\log(\Delta)$ update time. Compared to the $(1+\varepsilon)\Delta$ algorithm, our algorithm is deterministic and asymptotically faster, and when $\alpha$ is sufficiently small compared to $\Delta$, it even uses fewer colours. In particular, ours is the first $\Delta+O(1)$ edge-colouring algorithm for dynamic forests, and dynamic planar graphs, with polylogarithmic update time. Additionally, in the static setting, we show that we can find a proper edge colouring with $\max\{\deg(u), \deg(v)\} + 2\alpha$ colours in $O(m\log n)$ time. This time bound matches that of the greedy algorithm that computes a $2\Delta-1$ colouring of the graph's edges, and improves the number of colours when $\alpha$ is sufficiently small compared to $\Delta$.
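For reference, the naive greedy baseline mentioned above is easy to state: colour each edge with the smallest colour free at both endpoints, using at most $2\Delta-1$ colours. The sketch below implements only this baseline; the arboricity-parameterised algorithms require considerably more machinery:

```python
from collections import defaultdict

def greedy_edge_colouring(edges):
    """Naive greedy proper edge colouring: each edge takes the smallest
    colour unused at both endpoints, so at most deg(u)+deg(v)-1 <= 2*Delta-1
    colours are used."""
    used = defaultdict(set)           # vertex -> colours on incident edges
    colour = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        colour[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return colour

print(greedy_edge_colouring([(0, 1), (1, 2), (2, 0), (0, 3)]))
```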
Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
Authors: Shafagh Keyvanian, Michelle J. Johnson, Nadia Figueroa
Abstract
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging, as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia at different disability levels, as seen in stroke patients.
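A minimal sketch of the core fitting step, assuming joint-angle vectors as features and illustrative hyper-parameters (the paper's efficient tuning scheme is not reproduced here):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Assumption: rows are upper-limb joint configurations (e.g. 7 joint angles
# in radians) from motion-capture exploration; random data stands in for it.
rng = np.random.default_rng(0)
motions = rng.uniform(-1.0, 1.0, size=(5000, 7))

# Fit a one-class SVM; points inside the learned boundary are feasible poses.
rom = OneClassSVM(kernel="rbf", nu=0.05, gamma=2.0).fit(motions)

pose = np.zeros((1, 7))
print("inside learned RoM:", rom.predict(pose)[0] == 1)  # +1 = feasible
```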
Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems
Abstract
Artificial intelligence and deep learning are currently reshaping numerical simulation frameworks by introducing new modeling capabilities. These frameworks are extensively investigated in the context of model correction and parameterization where they demonstrate great potential and often outperform traditional physical models. Most of these efforts in defining hybrid dynamical systems follow offline learning strategies in which the neural parameterization (called here the sub-model) is trained to output an ideal correction. Yet, these hybrid models can face hard limitations when defining what a relevant sub-model response should be, i.e., one that would translate into good forecasting performance. End-to-end learning schemes, also referred to as online learning, could address such a shortcoming by allowing the deep learning sub-models to train on historical data. However, defining end-to-end training schemes for the calibration of neural sub-models in hybrid systems requires working with an optimization problem that involves the solver of the physical equations. Online learning methodologies thus require the numerical model to be differentiable, which is not the case for most modeling systems. To overcome this difficulty and bypass the differentiability challenge of physical models, we present an efficient and practical online learning approach for hybrid systems. The method, called EGA for Euler Gradient Approximation, assumes an additive neural correction to the physical model and an explicit Euler approximation of the gradients. We demonstrate that EGA converges to the exact gradients in the limit of infinitely small time steps. Numerical experiments are performed on various case studies, including prototypical ocean-atmosphere dynamics. Results show significant improvements over offline learning, highlighting the potential of end-to-end online learning for hybrid modeling.
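A toy sketch of the EGA idea under strong simplifications: the physical solver is a black box excluded from autograd, the correction is additive, and one explicit Euler term carries the gradient. The dynamics, network, and data below are all stand-ins, not the paper's experiments:

```python
import torch

def physical_solver_step(x, dt):
    with torch.no_grad():               # non-differentiable legacy model
        return x + dt * (-x)            # toy dynamics dx/dt = -x

net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
dt = 0.1

x = torch.randn(64, 1)
target = x * torch.exp(torch.tensor(-1.5 * dt))   # "historical" next states

# EGA forecast: black-box physics plus a differentiable Euler correction term.
forecast = physical_solver_step(x, dt) + dt * net(x)
loss = torch.mean((forecast - target) ** 2)
loss.backward()                          # gradients reach net via the Euler term
opt.step()
```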
Optimal Path Planning for Aerial Load Transportation in Complex Environments using PSO-Improved Artificial Potential Fields
Authors: Ali Akbar Rezaei Lori
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
In this article, we investigate optimal path planning for aerial load transportation in complex, dynamic, and static environments using Particle Swarm Optimization (PSO). A hierarchical optimal control system is designed for a quadrotor equipped with a cable-suspended payload, employing Euler-Lagrange equations of motion. To navigate through obstacles, an improved artificial potential field combined with the PSO algorithm is used to determine the shortest path for a virtual point acting as a leader. This leader guides the system toward the target point while avoiding collisions with both fixed and moving obstacles. The attractive and repulsive coefficients of the potential field are fine-tuned using various PSO methods to achieve the best trajectory and minimize travel time. The identified point serves as the desired location for quadrotor position control, based on a sliding mode strategy. Finally, we present numerical results to demonstrate the successful transportation of the payload by the system.
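To make the potential-field step concrete, here is a hedged sketch of a virtual leader descending an attractive/repulsive field; the gains k_att, k_rep and radius d0 are the quantities PSO would tune, and all values below are illustrative:

```python
import numpy as np

def apf_force(p, goal, obstacles, k_att=1.0, k_rep=50.0, d0=2.0):
    """Classical artificial-potential-field force: linear attraction to the
    goal plus short-range repulsion inside radius d0 around each obstacle."""
    f = k_att * (goal - p)
    for o in obstacles:
        d = np.linalg.norm(p - o)
        if d < d0:
            f += k_rep * (1 / d - 1 / d0) / d**3 * (p - o)
    return f

p, goal = np.array([0.0, 0.0]), np.array([10.0, 10.0])
obstacles = [np.array([5.0, 5.0])]
for _ in range(500):                     # gradient-descent motion of the leader
    p = p + 0.01 * apf_force(p, goal, obstacles)
print("leader position:", p)
```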
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Abstract
A versatile medical image segmentation model applicable to imaging data collected with diverse equipment and protocols can facilitate model deployment and maintenance. However, building such a model typically requires a large, diverse, and fully annotated dataset, which is rarely available due to the labor-intensive and costly data curation. In this study, we develop a cost-efficient method by harnessing readily available data with partially or even sparsely annotated segmentation labels. We devise strategies for model self-disambiguation, prior knowledge incorporation, and imbalance mitigation to address challenges associated with inconsistently labeled data from various sources, including label ambiguity and imbalances across modalities, datasets, and segmentation labels. Experimental results on a multi-modal dataset compiled from eight different sources for abdominal organ segmentation have demonstrated our method's effectiveness and superior performance over alternative state-of-the-art methods, highlighting its potential for optimizing the use of existing annotated data and reducing the annotation efforts for new data to further enhance model capability.
PEFT-MedAware: Large Language Model for Medical Awareness
Authors: Keivalya Pandya
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Chat models are capable of answering a wide range of questions; however, the accuracy of their responses is highly uncertain. In this research, we propose a specialized PEFT-MedAware model where we utilize parameter-efficient fine-tuning (PEFT) to enhance the Falcon-1b large language model on specialized MedQuAD data consisting of 16,407 medical QA pairs, leveraging only 0.44% of its trainable parameters to enhance computational efficiency. The paper adopts data preprocessing and PEFT to optimize model performance, complemented by a BitsAndBytesConfig for efficient transformer training. The resulting model outperformed other LLMs on domain-specific medical question-answering tasks while utilizing limited computational resources, making it suitable for deployment in resource-constrained environments. We propose further improvements through expanded datasets, larger models, and feedback mechanisms for sustained medical relevancy. Our work highlights the efficiency gains and specialized capabilities of PEFT in medical AI, outpacing standard models in precision without extensive resource demands. The proposed model and data are released for research purposes only.
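A plausible reconstruction of this setup with Hugging Face transformers and peft, assuming LoRA as the PEFT method and the tiiuae/falcon-rw-1b checkpoint (the abstract names only PEFT, Falcon-1b, and a BitsAndBytesConfig; the hyper-parameters below are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit loading requires a CUDA GPU with the bitsandbytes package installed.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-rw-1b",
                                             quantization_config=bnb)

# LoRA adapters on the attention projections; target module name assumed.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["query_key_value"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # should report well under 1% trainable
```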
Keyword: faster
Smart Traffic Management of Vehicles using Faster R-CNN based Deep Learning Method
Authors: Arindam Chaudhuri
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract
With the constant growth of civilization and the modernization of cities across the world over the past few centuries, smart traffic management of vehicles has become one of the most sought-after problems in the research community. It is a challenging problem in the computer vision and artificial intelligence domains. Smart traffic management basically involves segmentation of vehicles, estimation of traffic density, and tracking of vehicles. Vehicle segmentation from traffic videos enables niche applications such as speed monitoring and traffic estimation. The problem becomes more intractable when occlusions, cluttered backgrounds, and density variations in traffic are present. Motivated by this, in this research work we investigate a Faster R-CNN based deep learning method for the segmentation of vehicles. The problem is addressed in four steps: minimization with an adaptive background model, Faster R-CNN based subnet operation, Faster R-CNN initial refinement, and result optimization with extended topological active nets. The computational framework uses ideas from adaptive background modeling and also addresses shadow- and illumination-related issues. Higher segmentation accuracy is achieved through topological active net deformable models; the topological and extended topological active nets realize the stated deformations, and mesh deformation is achieved by energy minimization. The segmentation accuracy is further improved with a modified version of the extended topological active net. The experimental results demonstrate the superiority of this computational framework.
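A minimal sketch of the detection front end using an off-the-shelf torchvision Faster R-CNN; the adaptive background model and topological active-net refinement stages of the paper are not shown:

```python
import torch
import torchvision

# Off-the-shelf Faster R-CNN applied to one (random stand-in) traffic frame.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)            # placeholder for a video frame
with torch.no_grad():
    out = model([frame])[0]
keep = out["scores"] > 0.8                 # confident detections only
print(out["boxes"][keep], out["labels"][keep])
```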
From Concept to Field Tests: Accelerated Development of Multi-AUV Missions Using a High-Fidelity Faster-than-Real-Time Simulator
Authors: Timothy R. Player, Arjo Chakravarty, Mabel M. Zhang, Ben Yair Raanan, Brian Kieft, Yanwu Zhang, Brett Hobson
Abstract
We designed and validated a novel simulator for efficient development of multi-robot marine missions. To accelerate development of cooperative behaviors, the simulator models the robots' operating conditions with moderately high fidelity, including acoustic communications, dynamic environmental data, and high-resolution bathymetry in large worlds, and runs significantly faster than real time. The simulator's ability to exceed a real-time factor (RTF) of 100 has been stress-tested with a robust continuous integration suite and was used to develop a multi-robot field experiment.
Parsing Millions of URLs per Second
Authors: Yagiz Nizipli, Daniel Lemire
Subjects: Programming Languages (cs.PL); Data Structures and Algorithms (cs.DS)
Abstract
URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast, standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following the WHATWG standard (e.g., Servo's rust-url) and up to eight times fewer instructions than the popular curl parser. The Node.js environment adopted our C++ library. In our tests on realistic data, a recent Node.js version (20.0) with our parser is four to five times faster than the last version with the legacy URL parser.
LUNA-CIM: Lookup Table based Programmable Neural Processing in Memory
Abstract
This paper presents a novel approach for performing computations using Look-Up Tables (LUTs) tailored specifically for Compute-in-Memory applications. The aim is to address the scalability challenges associated with LUT-based computation by reducing storage requirements and energy consumption while capitalizing on the faster and more energy-efficient nature of look-up methods compared to conventional mathematical computations. The proposed method leverages a divide and conquer (D&C) strategy to enhance the scalability of LUT-based computation. By breaking down high-precision multiplications into lower-precision operations, the technique achieves significantly lower area overheads, up to approximately 3.7 times less than conventional LUT-based approaches, without compromising accuracy. To validate the effectiveness of the proposed method, extensive simulations using TSMC 65 nm technology were conducted. The experimental analysis reveals that the proposed approach accounts for less than 0.1\% of the total energy consumption, with only a 32\% increase in area overhead. These results demonstrate considerable improvements achieved in energy efficiency and area utilization through the novel low-energy, low-area-overhead LUT-based computation in an SRAM array.
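To see why divide and conquer shrinks LUT storage, consider the (assumed) example of 8-bit operands split into 4-bit halves: a single 16x16 product table plus shifts and adds reproduces the exact 8-bit product, replacing a 256x256 table:

```python
# Divide-and-conquer LUT multiplication: one 16x16 table of 4-bit products
# replaces a 256x256 table for 8-bit operands, at the cost of four lookups
# and shifts. This mirrors the scalability idea described in the abstract.
LUT = [[x * y for y in range(16)] for x in range(16)]

def lut_mul8(a, b):
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    return (LUT[ah][bh] << 8) + ((LUT[ah][bl] + LUT[al][bh]) << 4) + LUT[al][bl]

assert all(lut_mul8(a, b) == a * b for a in range(256) for b in range(256))
print("4 lookups of a 256-entry table reproduce exact 8-bit multiplication")
```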
Sparsity-Parameterised Dynamic Edge Colouring
Authors: Aleksander B.J. Christiansen, Eva Rotenberg, Juliette Vlieghe
Abstract
We study the edge-colouring problem, and give efficient algorithms where the number of colours is parameterised by the graph's arboricity, $\alpha$. In a dynamic graph, subject to insertions and deletions, we give a deterministic algorithm that updates a proper $\Delta + O(\alpha)$ edge~colouring in $\operatorname{poly}(\log n)$ amortised time. Our algorithm is fully adaptive to the current value of the maximum degree and arboricity. In this fully-dynamic setting, the state-of-the-art edge-colouring algorithms are either a randomised algorithm using $(1 + \varepsilon)\Delta$ colours in $\operatorname{poly}(\log n, \epsilon^{-1})$ time per update, or the naive greedy algorithm which is a deterministic $2\Delta -1$ edge colouring with $\log(\Delta)$ update time. Compared to the $(1+\varepsilon)\Delta$ algorithm, our algorithm is deterministic and asymptotically faster, and when $\alpha$ is sufficiently small compared to $\Delta$, it even uses fewer colours. In particular, ours is the first $\Delta+O(1)$ edge-colouring algorithm for dynamic forests, and dynamic planar graphs, with polylogarithmic update time. Additionally, in the static setting, we show that we can find a proper edge colouring with $\max{deg(u), deg(v)} + 2\alpha$ colours in $O(m\log n)$ time. This time bound matches that of the greedy algorithm that computes a $2\Delta-1$ colouring of the graph's edges, and improves the number of colours when $\alpha$ is sufficiently small compared to $\Delta$.
Optimal Embedding Dimension for Sparse Subspace Embeddings
Authors: Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong, Mark Rudelson
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Abstract
A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq \mathbb{R}^n$, $P\big(\forall x\in W:\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq (1+\epsilon)\|x\|\big)\geq 1-\delta$. It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and for any $\theta > 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$ is an OSE with $\epsilon = O_\theta(1)$. However, such an optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, having $s\ll m$ non-zeros per column, with applications to problems such as least squares regression and low-rank approximation. We show that, given any $\theta > 0$, an $m\times n$ random matrix $S$ with $m\geq (1+\theta)d$ consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $\epsilon = O_{\theta}(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on $m=O(d\log(d))$ shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to construct even sparser non-oblivious embeddings, leading to the first subspace embedding with low distortion $\epsilon=o(1)$ and optimal embedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrix multiplication time.
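A quick numerical illustration of the construction at toy scale (the constants here are arbitrary, not the paper's): sample $s$ nonzero $\pm 1/\sqrt{s}$ entries per column and check that the norm of a vector in a random subspace is roughly preserved:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, theta = 2000, 50, 0.5
m = int((1 + theta) * d)                   # embedding dimension m = (1+theta)d
s = max(1, int(np.log(d) ** 4 // 50))      # a few non-zeros per column (toy scale)

S = np.zeros((m, n))
for j in range(n):                         # s random +-1/sqrt(s) entries per column
    rows = rng.choice(m, size=s, replace=False)
    S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)

U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # basis of a random subspace W
x = U @ rng.standard_normal(d)
print(np.linalg.norm(S @ x) / np.linalg.norm(x))  # should be close to 1
```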
Keyword: mobile
Hypergraph-based Multi-robot Motion Planning with Topological Guidance
Authors: Courtney McBeth, James Motes, Marco Morales, Nancy M. Amato
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
We present a multi-robot motion planning algorithm that efficiently finds paths for robot teams up to ten times larger than existing methods in congested settings with narrow passages in the environment. Narrow passages represent a source of difficulty for sampling-based motion planning algorithms. This problem is exacerbated for multi-robot systems where the planner must also avoid inter-robot collisions within these congested spaces, requiring coordination. Topological guidance, which leverages information about the robot's environment, has been shown to improve performance for mobile robot motion planning in single robot scenarios with narrow passages. Additionally, our prior work has explored using topological guidance in multi-robot settings where a high degree of coordination is required of the full robot group. This high level of coordination, however, is not always necessary and results in excessive computational overhead for large groups. Here, we propose a novel multi-robot motion planning method that leverages topological guidance to inform the planner when coordination between robots is necessary, leading to a significant improvement in scalability.
Social Isolation and Serious Mental Illness: The Role of Context-Aware Mobile Interventions
Authors: Subigya Nepal, Arvind Pillai, Emma M. Parrish, Jason Holden, Colin Depp, Andrew T. Campbell, Eric Granholm
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Abstract
Social isolation is a common problem faced by individuals with serious mental illness (SMI), and current intervention approaches have limited effectiveness. This paper presents a blended intervention approach, called mobile Social Interaction Therapy by Exposure (mSITE), to address social isolation in individuals with serious mental illness. The approach combines brief in-person cognitive-behavioral therapy (CBT) with context-triggered mobile CBT interventions that are personalized using mobile sensing data. Our approach targets social behavior and is the first context-aware intervention for improving social outcomes in serious mental illness.
A Large-Scale Study on the Prevalence and Usage of TEE-based Features on Android
Abstract
In the realm of mobile security, where OS-based protections have proven insufficient against robust attackers, Trusted Execution Environments (TEEs) have emerged as a hardware-based security technology. Despite the industry's persistence in advancing TEE technology, the impact on end users and developers remains largely unexplored. This study addresses this gap by conducting a large-scale analysis of TEE utilization in Android applications, focusing on the key areas of cryptography, digital rights management, biometric authentication, and secure dialogs. To facilitate our extensive analysis, we introduce Mobsec Analytika, a framework tailored for large-scale app examinations, which we make available to the research community. Through the analysis of 170,550 popular Android apps, our analysis illuminates the implementation of TEE-related features and their contextual usage. Our findings reveal that TEE features are predominantly utilized indirectly through third-party libraries, with only 6.7% of apps directly invoking the APIs. Moreover, the study reveals the underutilization of the recent TEE-based UI feature Protected Confirmation.
Keyword: pruning
Hierarchical Pruning of Deep Ensembles with Focal Diversity
Authors: Yanzhao Wu, Ka-Ho Chow, Wenqi Wei, Ling Liu
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural network ensembles combine the wisdom of multiple deep neural networks to improve the generalizability and robustness over individual networks. Studying deep ensemble techniques has gained increasing popularity in the deep learning community. Some mission-critical applications utilize a large number of deep neural networks to form deep ensembles to achieve desired accuracy and resilience, which introduces high time and space costs for ensemble execution. However, it remains a critical challenge whether a small subset of the entire deep ensemble can achieve the same or better generalizability, and how to effectively identify these small deep ensembles to improve the space and time efficiency of ensemble execution. This paper presents a novel deep ensemble pruning approach, which can efficiently identify smaller deep ensembles that provide higher ensemble accuracy than the entire deep ensemble of a large number of member networks. Our hierarchical ensemble pruning approach (HQ) leverages three novel ensemble pruning techniques. First, we show that focal diversity metrics can accurately capture the complementary capacity of the member networks of an ensemble, which can guide ensemble pruning. Second, we design a focal-diversity-based hierarchical pruning approach, which iteratively finds high-quality deep ensembles with low cost and high accuracy. Third, we develop a focal diversity consensus method that integrates multiple focal diversity metrics to refine ensemble pruning results, so that smaller deep ensembles can be effectively identified to offer high accuracy, high robustness and high efficiency. Evaluated using popular benchmark datasets, we demonstrate that the proposed hierarchical ensemble pruning approach can effectively identify high-quality deep ensembles with better generalizability while being more time- and space-efficient in ensemble decision making.
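As a toy stand-in for the pruning objective (the focal diversity metrics and hierarchical search are the paper's contribution and are not reproduced here), one can score small sub-ensembles by majority-vote accuracy and keep the best:

```python
import numpy as np
from itertools import combinations

# Six imperfect member networks simulated as noisy copies of the labels.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=500)
preds = np.array([np.where(rng.random(500) < 0.8, labels, 1 - labels)
                  for _ in range(6)])

def vote_acc(members):
    votes = preds[list(members)].mean(axis=0) >= 0.5   # majority vote
    return (votes == labels).mean()

# Brute-force scoring of all 2- and 3-member sub-ensembles (toy scale only).
best = max((c for k in (2, 3) for c in combinations(range(6), k)), key=vote_acc)
print("pruned ensemble:", best, "accuracy:", round(vote_acc(best), 3))
```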
Using Cooperative Game Theory to Prune Neural Networks
Authors: Mauricio Diaz-Ortiz Jr, Benjamin Kempinski, Daphne Cornelisse, Yoram Bachrach, Tal Kachman
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Abstract
We show how solution concepts from cooperative game theory can be used to tackle the problem of pruning neural networks. The ever-growing size of deep neural networks (DNNs) increases their performance, but also their computational requirements. We introduce a method called Game Theory Assisted Pruning (GTAP), which reduces the neural network's size while preserving its predictive accuracy. GTAP is based on eliminating neurons in the network based on an estimation of their joint impact on the prediction quality through game theoretic solutions. Specifically, we use a power index akin to the Shapley value or Banzhaf index, tailored using a procedure similar to Dropout (commonly used to tackle overfitting problems in machine learning). Empirical evaluation of both feedforward networks and convolutional neural networks shows that this method outperforms existing approaches in the achieved tradeoff between the number of parameters and model accuracy.
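A hedged sketch of the power-index idea behind GTAP: estimate each neuron's marginal contribution over Dropout-like random coalitions (a Banzhaf/Shapley-style estimate), then keep the high-value neurons. The evaluation function below is a synthetic stand-in for measuring the network's accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, trials = 8, 200

def accuracy_with(mask):
    # Stand-in for evaluating the network with only `mask` neurons active;
    # here neurons 0-3 matter and 4-7 are redundant by construction.
    return mask[:4].mean() * 0.9 + 0.1 * rng.random()

value = np.zeros(n_neurons)
for _ in range(trials):
    mask = rng.random(n_neurons) < 0.5           # Dropout-like coalition sample
    for i in range(n_neurons):
        with_i, without_i = mask.copy(), mask.copy()
        with_i[i], without_i[i] = True, False
        value[i] += accuracy_with(with_i) - accuracy_with(without_i)

print("keep neurons:", np.argsort(value)[-4:])    # prune the rest
```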
Archtree: on-the-fly tree-structured exploration for latency-aware pruning of deep neural networks
Authors: Rémi Ouazan Reboul, Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks (DNNs) have become ubiquitous in addressing a number of problems, particularly in computer vision. However, DNN inference is computationally intensive, which can be prohibitive e.g. when considering edge devices. To solve this problem, a popular solution is DNN pruning, and more so structured pruning, where coherent computational blocks (e.g. channels for convolutional networks) are removed: as an exhaustive search of the space of pruned sub-models is intractable in practice, channels are typically removed iteratively based on an importance estimation heuristic. Recently, promising latency-aware pruning methods were proposed, where channels are removed until the network reaches a target budget of wall-clock latency pre-emptively estimated on specific hardware. In this paper, we present Archtree, a novel method for latency-driven structured pruning of DNNs. Archtree explores multiple candidate pruned sub-models in parallel in a tree-like fashion, allowing for a better exploration of the search space. Furthermore, it involves on-the-fly latency estimation on the target hardware, accounting for closer latencies as compared to the specified budget. Empirical results on several DNN architectures and target hardware show that Archtree better preserves the original model accuracy while better fitting the latency budget as compared to existing state-of-the-art methods.
Keyword: diffusion
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Authors: Lincong Feng, Muyu Wang, Maoyu Wang, Kuo Xu, Xiaoli Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generative models for 3D object synthesis have seen significant advancements with the incorporation of prior knowledge distilled from 2D diffusion models. Nevertheless, challenges persist in the form of multi-view geometric inconsistencies and slow generation speeds within the existing 3D synthesis frameworks. This can be attributed to two factors: firstly, the deficiency of abundant geometric a priori knowledge in optimization, and secondly, the entanglement issue between geometry and texture in conventional 3D generation methods. In response, we introduce MetaDreamer, a two-stage optimization approach that leverages rich 2D and 3D prior knowledge. In the first stage, our emphasis is on optimizing the geometric representation to ensure multi-view consistency and accuracy of 3D objects. In the second stage, we concentrate on fine-tuning the geometry and optimizing the texture, thereby achieving a more refined 3D object. Through leveraging 2D and 3D prior knowledge in two stages, respectively, we effectively mitigate the interdependence between geometry and texture. MetaDreamer establishes clear optimization objectives for each stage, resulting in significant time savings in the 3D generation process. Ultimately, MetaDreamer can generate high-quality 3D objects based on textual prompts within 20 minutes, and to the best of our knowledge, it is the most efficient text-to-3D generation method. Furthermore, we introduce image control into the process, enhancing the controllability of 3D generation. Extensive empirical evidence confirms that our method is not only highly efficient but also achieves a quality level that is at the forefront of current state-of-the-art 3D generation techniques.
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems
Abstract
Probabilistic breadth-first traversals (BPTs) are used in many network science and graph machine learning applications. In this paper, we are motivated by the application of BPTs in stochastic diffusion-based graph problems such as influence maximization. These applications heavily rely on BPTs to implement a Monte-Carlo sampling step for their approximations. Given the large sampling complexity, stochasticity of the diffusion process, and the inherent irregularity in real-world graph topologies, efficiently parallelizing these BPTs remains significantly challenging. Here, we present a new algorithm to fuse a massive number of concurrently executing BPTs with random starts on the input graph. Our algorithm is designed to fuse BPTs by combining separate traversals into a unified frontier on distributed multi-GPU systems. To show the general applicability of the fused BPT technique, we have incorporated it into two state-of-the-art influence maximization parallel implementations (gIM and Ripples). Our experiments on up to 4K nodes of the OLCF Frontier supercomputer ($32,768$ GPUs and $196$K CPU cores) show strong scaling behavior, and that fused BPTs can improve the performance of these implementations up to 34$\times$ (for gIM) and ~360$\times$ (for Ripples).
Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers
Authors: Staphord Bengesi, Hoda El-Sayed, Md Kamruzzaman Sarker, Yao Houkpati, John Irungu, Timothy Oladunni
Abstract
The launch of ChatGPT has garnered global attention, marking a significant milestone in the field of Generative Artificial Intelligence. While Generative AI has existed for the past decade, the introduction of ChatGPT has ignited a new wave of research and innovation in the AI domain. This surge in interest has led to the development and release of numerous cutting-edge tools, such as Bard, Stable Diffusion, DALL-E, Make-A-Video, Runway ML, and Jukebox, among others. These tools exhibit remarkable capabilities, encompassing tasks such as text generation, music composition, image creation, video production, code generation, and even scientific work. They are built upon various state-of-the-art models, including Stable Diffusion, transformer models like GPT-3 (recently GPT-4), variational autoencoders, and generative adversarial networks. This advancement in Generative AI presents a wealth of exciting opportunities and, simultaneously, unprecedented challenges. Throughout this paper, we have explored these state-of-the-art models, the diverse array of tasks they can accomplish, the challenges they pose, and the promising future of Generative Artificial Intelligence.
Multiscale Hodge Scattering Networks for Data Analysis
Authors: Naoki Saito, Stefan C. Schonsheck, Eugene Shvarts
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Abstract
We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs use a layered structure analogous to a convolutional neural network (CNN) to cascade the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation that is akin to local pooling in CNNs, and which may be performed either locally or per-scale. These pooling operations are harder to define in both traditional scattering networks based on Morlet wavelets, and geometric scattering networks based on Diffusion Wavelets. As a result, we are able to extract a rich set of descriptive yet robust features that can be used along with very simple machine learning methods (i.e., logistic regression or support vector machines) to achieve high-accuracy classification systems with far fewer parameters to train than most modern graph neural networks. Finally, we demonstrate the usefulness of our MHSNs in three distinct types of problems: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.
Abstract
Current subject-driven image generation methods encounter significant challenges in person-centric image generation. The reason is that they learn the semantic scene and person generation by fine-tuning a common pre-trained diffusion model, which involves an irreconcilable training imbalance. Precisely, to generate realistic persons, they need to sufficiently tune the pre-trained model, which inevitably causes the model to forget the rich semantic scene prior and makes scene generation over-fit to the training data. Moreover, even with sufficient fine-tuning, these methods still cannot generate high-fidelity persons, since joint learning of scene and person generation also leads to quality compromise. In this paper, we propose Face-diffuser, an effective collaborative generation pipeline to eliminate the above training imbalance and quality compromise. Specifically, we first develop two specialized pre-trained diffusion models, i.e., a Text-driven Diffusion Model (TDM) and a Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively. The sampling process is divided into three sequential stages, i.e., semantic scene construction, subject-scene fusion, and subject enhancement. The first and last stages are performed by TDM and SDM, respectively. The subject-scene fusion stage is a collaboration achieved through a novel and highly effective mechanism, Saliency-adaptive Noise Fusion (SNF). Specifically, it is based on our key observation that there exists a robust link between classifier-free guidance responses and the saliency of generated images. In each time step, SNF leverages the unique strengths of each model and allows for the spatial blending of predicted noises from both models automatically in a saliency-aware manner. Extensive experiments confirm the impressive effectiveness and robustness of the Face-diffuser.
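An illustrative reading of the SNF blending step, assuming the saliency mask is derived from the magnitude of the classifier-free guidance response (the paper's exact mask construction may differ):

```python
import torch

def snf_blend(noise_tdm, noise_sdm, guidance_response):
    """Blend the two models' predicted noises with a per-pixel saliency mask
    computed from the classifier-free guidance response (assumed proxy)."""
    saliency = guidance_response.abs().mean(dim=1, keepdim=True)
    m = (saliency - saliency.amin()) / (saliency.amax() - saliency.amin() + 1e-8)
    return m * noise_sdm + (1 - m) * noise_tdm   # SDM where saliency is high

eps_tdm = torch.randn(1, 4, 64, 64)   # scene-focused noise prediction
eps_sdm = torch.randn(1, 4, 64, 64)   # subject-focused noise prediction
cfg = torch.randn(1, 4, 64, 64)       # cond - uncond response (stand-in)
print(snf_blend(eps_tdm, eps_sdm, cfg).shape)
```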
Enhancing Object Coherence in Layout-to-Image Synthesis
Authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Layout-to-image synthesis is an emerging technique in conditional image generation. It aims to generate complex scenes where users require fine control over the layout of the objects in a scene. However, it remains challenging to control object coherence, including semantic coherence (e.g., whether the cat looks at the flowers or not) and physical coherence (e.g., the hand and the racket should not be misaligned). In this paper, we propose a novel diffusion model with effective global semantic fusion (GSF) and self-similarity feature enhancement modules to guide object coherence for this task. For semantic coherence, we argue that the image caption contains rich information for defining the semantic relationships among the objects in an image. Instead of simply employing cross-attention between captions and generated images, which addresses the highly relevant layout restriction and semantic coherence separately and thus leads to the unsatisfying results shown in our experiments, we develop GSF to fuse the supervision from the layout restriction and the semantic coherence requirement and exploit it to guide the image synthesis process. Moreover, to improve physical coherence, we develop a Self-similarity Coherence Attention (SCA) module to explicitly integrate local contextual physical coherence into each pixel's generation process. Specifically, we adopt a self-similarity map to encode the coherence restrictions and employ it to extract coherent features from the text embedding. Through visualization of our self-similarity map, we explore the essence of SCA, revealing that its effectiveness lies not only in capturing reliable physical coherence patterns but also in enhancing complex texture generation. Extensive experiments demonstrate the superiority of our proposed method in both image generation quality and controllability.
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Authors: Sai Saketh Rambhatla, Ishan Misra
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, making the generative model directly applicable to discriminative tasks. Using SelfEval, we repurpose standard datasets created for evaluating multimodal text-image discriminative models to evaluate generative models in a fine-grained manner: assessing their performance on attribute binding, color recognition, counting, shape recognition, and spatial understanding. To the best of our knowledge, SelfEval is the first automated metric to show a high degree of agreement with the gold-standard human evaluations for measuring text-faithfulness across multiple models and benchmarks. Moreover, SelfEval enables us to evaluate generative models on challenging tasks such as Winoground image-score, where they demonstrate competitive performance to discriminative models. We also show severe drawbacks of standard automated metrics such as CLIP-score to measure text faithfulness on benchmarks such as DrawBench, and how SelfEval sidesteps these issues. We hope SelfEval enables easy and reliable automated evaluation for diffusion models.
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Authors: Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
Abstract
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
Keyword: adaptive
Smart Traffic Management of Vehicles using Faster R-CNN based Deep Learning Method
Authors: Arindam Chaudhuri
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract
With the constant growth of civilization and the modernization of cities across the world over the past few centuries, smart traffic management of vehicles has become one of the most sought-after problems in the research community. It is a challenging problem in the computer vision and artificial intelligence domains. Smart traffic management basically involves segmentation of vehicles, estimation of traffic density, and tracking of vehicles. Vehicle segmentation from traffic videos enables niche applications such as speed monitoring and traffic estimation. The problem becomes more intractable when occlusions, cluttered backgrounds, and density variations in traffic are present. Motivated by this, in this research work we investigate a Faster R-CNN based deep learning method for the segmentation of vehicles. The problem is addressed in four steps: minimization with an adaptive background model, Faster R-CNN based subnet operation, Faster R-CNN initial refinement, and result optimization with extended topological active nets. The computational framework uses ideas from adaptive background modeling and also addresses shadow- and illumination-related issues. Higher segmentation accuracy is achieved through topological active net deformable models; the topological and extended topological active nets realize the stated deformations, and mesh deformation is achieved by energy minimization. The segmentation accuracy is further improved with a modified version of the extended topological active net. The experimental results demonstrate the superiority of this computational framework.
You Cannot Escape Me: Detecting Evasions of SIEM Rules in Enterprise Networks
Authors: Rafael Uetz, Marco Herzog, Louis Hackländer, Simon Schwarz, Martin Henze
Abstract
Cyberattacks have grown into a major risk for organizations, with common consequences being data theft, sabotage, and extortion. Since preventive measures do not suffice to repel attacks, timely detection of successful intruders is crucial to stop them from reaching their final goals. For this purpose, many organizations utilize Security Information and Event Management (SIEM) systems to centrally collect security-related events and scan them for attack indicators using expert-written detection rules. However, as we show by analyzing a set of widespread SIEM detection rules, adversaries can evade almost half of them easily, allowing them to perform common malicious actions within an enterprise network without being detected. To remedy these critical detection blind spots, we propose the idea of adaptive misuse detection, which utilizes machine learning to compare incoming events to SIEM rules on the one hand and known-benign events on the other hand to discover successful evasions. Based on this idea, we present AMIDES, an open-source proof-of-concept adaptive misuse detection system. Using four weeks of SIEM events from a large enterprise network and more than 500 hand-crafted evasions, we show that AMIDES successfully detects a majority of these evasions without any false alerts. In addition, AMIDES eases alert analysis by assessing which rules were evaded. Its computational efficiency qualifies AMIDES for real-world operation and hence enables organizations to significantly reduce detection blind spots with moderate effort.
Secure network coding with adaptive and active attack
Abstract
Ning Cai and the author jointly studied secure network codes under adaptive and active attacks, which were rarely studied until these seminal papers. This paper reviews the results for secure network codes under adaptive and active attacks. We focus on two typical network models, a one-hop relay network and a unicast relay network.
Abstract
Current subject-driven image generation methods encounter significant challenges in person-centric image generation. The reason is that they learn the semantic scene and person generation by fine-tuning a common pre-trained diffusion model, which involves an irreconcilable training imbalance. Precisely, to generate realistic persons, they need to sufficiently tune the pre-trained model, which inevitably causes the model to forget the rich semantic scene prior and makes scene generation over-fit to the training data. Moreover, even with sufficient fine-tuning, these methods still cannot generate high-fidelity persons, since joint learning of scene and person generation also leads to quality compromise. In this paper, we propose Face-diffuser, an effective collaborative generation pipeline to eliminate the above training imbalance and quality compromise. Specifically, we first develop two specialized pre-trained diffusion models, i.e., a Text-driven Diffusion Model (TDM) and a Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively. The sampling process is divided into three sequential stages, i.e., semantic scene construction, subject-scene fusion, and subject enhancement. The first and last stages are performed by TDM and SDM, respectively. The subject-scene fusion stage is a collaboration achieved through a novel and highly effective mechanism, Saliency-adaptive Noise Fusion (SNF). Specifically, it is based on our key observation that there exists a robust link between classifier-free guidance responses and the saliency of generated images. In each time step, SNF leverages the unique strengths of each model and allows for the spatial blending of predicted noises from both models automatically in a saliency-aware manner. Extensive experiments confirm the impressive effectiveness and robustness of the Face-diffuser.
Decentralized Energy Marketplace via NFTs and AI-based Agents
Abstract
The paper introduces an advanced Decentralized Energy Marketplace (DEM) integrating blockchain technology and artificial intelligence to manage energy exchanges among smart homes with energy storage systems. The proposed framework uses Non-Fungible Tokens (NFTs) to represent unique energy profiles in a transparent and secure trading environment. Leveraging Federated Deep Reinforcement Learning (FDRL), the system promotes collaborative and adaptive energy management strategies, maintaining user privacy. A notable innovation is the use of smart contracts, ensuring high efficiency and integrity in energy transactions. Extensive evaluations demonstrate the system's scalability and the effectiveness of the FDRL method in optimizing energy distribution. This research significantly contributes to developing sophisticated decentralized smart grid infrastructures. Our approach broadens potential blockchain and AI applications in sustainable energy systems and addresses incentive alignment and transparency challenges in traditional energy trading mechanisms. The implementation of this paper is publicly accessible at \url{https://github.com/RasoulNik/DEM}.
DUA-DA: Distillation-based Unbiased Alignment for Domain Adaptive Object Detection
Abstract
Though feature-alignment-based Domain Adaptive Object Detection (DAOD) methods have achieved remarkable progress, they ignore the source bias issue, i.e., the aligned features are more favorable towards the source domain, leading to sub-optimal adaptation. Furthermore, the presence of domain shift between the source and target domains exacerbates the problem of inconsistent classification and localization in general detection pipelines. To overcome these challenges, we propose a novel Distillation-based Unbiased Alignment (DUA) framework for DAOD, which can distill the source features towards a more balanced position via a pre-trained teacher model during the training process, alleviating the problem of source bias effectively. In addition, we design a Target-Relevant Object Localization Network (TROLN), which can mine target-related knowledge to produce two classification-free metrics (IoU and centerness). Accordingly, we implement a Domain-aware Consistency Enhancing (DCE) strategy that utilizes these two metrics to further refine classification confidences, achieving a harmonization between classification and localization in cross-domain scenarios. Extensive experiments have been conducted to manifest the effectiveness of this method, which consistently improves the strong baseline by large margins, outperforming existing alignment-based works.
Designing and Evaluating an Adaptive Virtual Reality System using EEG Frequencies to Balance Internal and External Attention States
Authors: Francesco Chiossi, Changkun Ou, Carolina Gerhardt, Felix Putze, Sven Mayer
Abstract
Virtual reality finds various applications in productivity, entertainment, and training scenarios requiring working memory and attentional resources. Working memory relies on prioritizing relevant information and suppressing irrelevant information through internal attention, which is fundamental for successful task performance and training. Today, virtual reality systems do not account for the impact of working memory load, resulting in over- or under-stimulation. In this work, we designed an adaptive system based on EEG correlates of external and internal attention to support working memory task performance. Here, participants engaged in a visual working memory N-Back task, and we adapted the visual complexity of distracting surrounding elements. Our study first demonstrated the feasibility of EEG frontal theta and parietal alpha frequency bands for dynamic visual complexity adjustments. Second, our adaptive system showed improved task performance and diminished perceived workload compared to a reverse adaptation. Our results show the effectiveness of the proposed adaptive system, allowing for the optimization of distracting elements in high-demanding conditions. Adaptive systems based on alpha and theta frequency bands allow for the regulation of attentional and executive resources to keep users engaged in a task without resulting in cognitive overload.
Mixed Reality UI Adaptations with Inaccurate and Incomplete Objectives
Authors: Christoph Albert Johns, João Marcelo Evangelista Belo
Abstract
This position paper outlines a new approach to adapting 3D user interface (UI) layouts given the complex nature of end-user preferences. Current optimization techniques, which mainly rely on weighted sum methods, can be inflexible and result in unsatisfactory adaptations. We propose using multi-objective optimization and interactive preference elicitation to provide semi-automated, flexible, and effective adaptations of 3D UIs. Our approach is demonstrated using an example of single-element 3D layout adaptation with ergonomic objectives. Future work is needed to address questions around the presentation and selection of optimal solutions, the impact on cognitive load, and the integration of preference learning. We conclude that, to make adaptive 3D UIs truly effective, we must acknowledge the limitations of our optimization objectives and techniques and emphasize the importance of user control.
$P_c\varepsilon\kappa_{\max}$-Means++: Adapt-$P$ Driven by Energy and Distance Quality Probabilities Based on $\kappa$-Means++ for the Stable Election Protocol (SEP)
Authors: Husam Suleiman
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The adaptive probability $P_{\text{adp}}$ formalized in Adapt-$P$ is developed based on the remaining number of sensor nodes (SNs) $\zeta$ and the optimal clustering $\kappa_{\max}$, yet $P_{\text{adp}}$ does not incorporate the probabilistic ratios of the energy and distance factors in the network. Furthermore, Adapt-$P$ does not localize cluster-heads properly in the first round because of its reliance on the distance computations defined in LEACH, which might result in an uneven distribution of cluster-heads in the WSN area and hence, in some rounds, inefficient energy consumption. This paper utilizes $k$-means++ and Adapt-$P$ to propose the $P_c\kappa_{\max}$-means++ clustering algorithm, which better manages the distribution of cluster-heads and produces enhanced performance. The algorithm employs an optimized cluster-head election probability $P_c$ developed from the energy-based $P_{\eta(j,i)}$ and distance-based $P_{\psi(j,i)}$ quality probabilities along with the adaptive probability $P_{\text{adp}}$, utilizing the energy $\varepsilon$ and distance optimality $d_{\text{opt}}$ factors. Furthermore, the algorithm utilizes the optimal clustering $\kappa_{\max}$ derived in Adapt-$P$ to perform adaptive clustering through $\kappa_{\max}$-means++. The proposed $P_c\kappa_{\max}$-means++ is compared with the energy-based $P_\eta\varepsilon\kappa_{\max}$-means++ and distance-based $P_\psi d_{\text{opt}}\kappa_{\max}$-means++ algorithms, and shows improved performance in terms of residual energy and stability period of the network.
TacFR-Gripper: A Reconfigurable Fin Ray-Based Compliant Robotic Gripper with Tactile Skin for In-Hand Manipulation
Abstract
This paper introduces the TacFR-Gripper, a reconfigurable Fin Ray-based soft and compliant robotic gripper equipped with tactile skin, which can be used for dexterous in-hand manipulation tasks. This gripper can adaptively grasp objects of diverse shapes and stiffness levels. An array of Force Sensitive Resistor (FSR) sensors is embedded within the robotic finger to serve as the tactile skin, enabling the robot to perceive contact information during manipulation. We provide theoretical analysis for gripper design, including kinematic analysis, workspace analysis, and finite element analysis to identify the relationship between the gripper's load and its deformation. Moreover, we implemented a Graph Neural Network (GNN)-based tactile perception approach to enable reliable grasping without accidental slip or excessive force. Three physical experiments were conducted to quantify the performance of the TacFR-Gripper. These experiments aimed to i) assess the grasp success rate across various everyday objects through different configurations, ii) verify the effectiveness of tactile skin with the GNN algorithm in grasping, iii) evaluate the gripper's in-hand manipulation capabilities for object pose control. The experimental results indicate that the TacFR-Gripper can grasp a wide range of complex-shaped objects with a high success rate and deliver dexterous in-hand manipulation. Additionally, the integration of tactile skin with the GNN algorithm enhances grasp stability by incorporating tactile feedback during manipulations. For more details of this project, please view our website: https://sites.google.com/view/tacfr-gripper/homepage.
A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest
Abstract
Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries. Our two-step approach starts with training a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents through a chain-of-thought reasoning process. Subsequently, we blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise and conversational capabilities. We also develop a new evaluation benchmark comprising four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvement over a generally aligned LLM and surpasses domain-adapted models directly fine-tuned on domain corpora. In particular, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, thereby providing a pathway towards self-improvement of LLMs through model-synthesized training data.
Sparsity-Parameterised Dynamic Edge Colouring
Authors: Aleksander B.J. Christiansen, Eva Rotenberg, Juliette Vlieghe
Abstract
We study the edge-colouring problem, and give efficient algorithms where the number of colours is parameterised by the graph's arboricity, $\alpha$. In a dynamic graph, subject to insertions and deletions, we give a deterministic algorithm that updates a proper $\Delta + O(\alpha)$ edge colouring in $\operatorname{poly}(\log n)$ amortised time. Our algorithm is fully adaptive to the current value of the maximum degree and arboricity. In this fully-dynamic setting, the state-of-the-art edge-colouring algorithms are either a randomised algorithm using $(1 + \varepsilon)\Delta$ colours in $\operatorname{poly}(\log n, \varepsilon^{-1})$ time per update, or the naive greedy algorithm which maintains a deterministic $2\Delta - 1$ edge colouring with $\log(\Delta)$ update time. Compared to the $(1+\varepsilon)\Delta$ algorithm, our algorithm is deterministic and asymptotically faster, and when $\alpha$ is sufficiently small compared to $\Delta$, it even uses fewer colours. In particular, ours is the first $\Delta+O(1)$ edge-colouring algorithm for dynamic forests, and dynamic planar graphs, with polylogarithmic update time. Additionally, in the static setting, we show that we can find a proper edge colouring with $\max\{\deg(u), \deg(v)\} + 2\alpha$ colours in $O(m\log n)$ time. This time bound matches that of the greedy algorithm that computes a $2\Delta-1$ colouring of the graph's edges, and improves the number of colours when $\alpha$ is sufficiently small compared to $\Delta$.
Keyword: quantization
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Abstract
Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training & inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but unfortunately suffers larger performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) a rugged and magnified loss landscape under coarse-grained quantization granularity for post-LayerNorm activations. Then, I&S-ViT addresses these issues by introducing: (1) a novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and accurate distribution approximation; (2) a three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing ViT PTQ methods, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
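The abstract does not spell out SULQ, so the following Python sketch is one plausible reading, not the paper's quantizer: shift post-Softmax activations away from zero, move to the log2 domain (where their heavily skewed distribution flattens out), quantize uniformly there, and invert. The fixed `shift` value is a stand-in for whatever calibration the real method performs.

```python
import numpy as np

def sulq_quantize(x, bits=4, shift=0.03):
    """Hedged sketch of a shift-uniform-log2 quantizer for post-Softmax
    activations x in (0, 1]: uniform quantization on -log2(x + shift)."""
    y = -np.log2(x + shift)               # log-domain representation
    levels = 2 ** bits - 1
    scale = (y.max() - y.min()) / levels
    q = np.round((y - y.min()) / scale)   # uniform grid in the log domain
    y_hat = q * scale + y.min()
    return np.power(2.0, -y_hat) - shift  # back to the activation domain

probs = np.random.dirichlet(np.ones(197))  # fake post-Softmax attention row
print(np.abs(sulq_quantize(probs) - probs).max())
```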
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication
Abstract
From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) indexing into a look-up table (LUT). Stella Nera is the first Maddness accelerator; it achieves 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators implemented in the same technology. The hash function is a decision tree, which allows for an efficient hardware implementation, as the multiply-accumulate operations are replaced by decision tree passes and LUT lookups. The entire Maddness MatMul can be broken down into parts that admit an effective implementation with small computing units and memories, allowing it to reach extreme efficiency while remaining generically applicable to MatMul tasks. Implemented in a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency of 161 TOp/s/W@0.55V with a Top-1 accuracy on CIFAR-10 of more than 92.5% using ResNet9.
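To make the lookup idea concrete, here is a hedged NumPy sketch of product-quantization-based approximate MatMul. One deliberate simplification: real Maddness encodes with learned decision trees (the hash function above), whereas this sketch uses nearest-prototype encoding with randomly sampled prototypes, which is slower and less accurate but shows the same lookup-and-accumulate structure.

```python
import numpy as np

def pq_lut_matmul(A, B, n_subspaces=4, n_protos=16, seed=0):
    """Approximate A @ B: snap each sub-row of A to a prototype, precompute
    every prototype-times-B product once, then replace the inner products
    with table lookups and additions."""
    N, D = A.shape
    d = D // n_subspaces
    rng = np.random.default_rng(seed)
    out = np.zeros((N, B.shape[1]))
    for c in range(n_subspaces):
        Ac = A[:, c*d:(c+1)*d]                               # (N, d) slice
        protos = Ac[rng.choice(N, n_protos, replace=False)]  # (K, d)
        dists = ((Ac[:, None, :] - protos[None]) ** 2).sum(-1)
        codes = dists.argmin(1)              # 'hash': nearest prototype id
        lut = protos @ B[c*d:(c+1)*d]        # (K, M), precomputed once
        out += lut[codes]                    # lookups + adds, no MatMul
    return out

A, B = np.random.randn(256, 64), np.random.randn(64, 32)
err = np.linalg.norm(pq_lut_matmul(A, B) - A @ B) / np.linalg.norm(A @ B)
print(f"relative error: {err:.2f}")  # large on random data; Maddness
                                     # trains its encoder on real inputs
```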
Keyword: efficient
Accommodating Missing Modalities in Time-Continuous Multimodal Emotion Recognition
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Hypergraph-based Multi-robot Motion Planning with Topological Guidance
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication
Layer-to-Layer Melt Pool Control in Laser Power Bed Fusion
Asymptotically Fair Participation in Machine Learning Models: an Optimal Control Perspective
Segment Anything in Defect Detection
FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems
Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving
Multiscale Hodge Scattering Networks for Data Analysis
Telescope: Telemetry at Terabyte Scale
Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures
Hypervectors in hyperdimensional computing (HDC) systems are conventionally generated with a pseudo-random source such as MATLAB's or Python's random function. This work introduces an optimization technique for generating hypervectors by employing quasi-random sequences. These sequences have recently demonstrated their effectiveness in achieving accurate and low-discrepancy data encoding in stochastic computing systems. The study outlines the optimization steps for utilizing Sobol sequences to produce high-quality hypervectors in HDC systems. An optimization algorithm is proposed to select the most suitable Sobol sequences for generating minimally correlated hypervectors, particularly in applications related to symbol-oriented architectures. The performance of the proposed technique is evaluated against two traditional approaches to generating hypervectors, based on linear-feedback shift registers and MATLAB's random function. The evaluation is conducted for two applications: (i) language and (ii) headline classification. Our experimental results demonstrate accuracy improvements of up to 10.79%, depending on the vector size. Additionally, the proposed encoding hardware exhibits reduced energy consumption and a superior area-delay product. (A hedged code sketch of Sobol-based hypervector generation follows this title list.)
Scalable Algorithms for Laplacian Pseudo-inverse Computation
Leveraging Function Space Aggregation for Federated Learning at Scale
Hierarchical Pruning of Deep Ensembles with Focal Diversity
Learning transformer-based heterogeneously salient graph representation for multimodal fusion classification of hyperspectral image and LiDAR data
Scalable Edge Clustering of Dynamic Graphs via Weighted Line Graphs
A2XP: Towards Private Domain Generalization
From Concept to Field Tests: Accelerated Development of Multi-AUV Missions Using a High-Fidelity Faster-than-Real-Time Simulator
Near-Memory Parallel Indexing and Coalescing: Enabling Highly Efficient Indirect Access for SpMV
Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking
Optimized Deep Learning Models for AUV Seabed Image Analysis
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
Simultaneous Synthesis and Verification of Neural Control Barrier Functions through Branch-and-Bound Verification-in-the-loop Training
DeepClean: Machine Unlearning on the Cheap by Resetting Privacy Sensitive Weights using the Fisher Diagonal
Accurate and Fast Fischer-Tropsch Reaction Microkinetics using PINNs
Memory Management Strategies for an Internet of Things System
Towards General Loop Invariant Generation via Coordinating Symbolic Execution and Large Language Models
ReuseSense: With Great Reuse Comes Greater Efficiency; Effectively Employing Computation Reuse on General-Purpose CPUs
Fast Estimations of Hitting Time of Elitist Evolutionary Algorithms from Fitness Levels
Segment Anything Model with Uncertainty Rectification for Auto-Prompting Medical Image Segmentation
Efficient Profit Maximization in Reliability Concerned Static Vehicular Cloud System
LUNA-CIM: Lookup Table based Programmable Neural Processing in Memory
Countering Misinformation via Emotional Response Generation
$Pc\varepsilon k_{max}$-Means++: Adapt-$P$ Driven by Energy and Distance Quality Probabilities Based on $k$-Means++ for the Stable Election Protocol (SEP)
Hashing it Out: Predicting Unhealthy Conversations on Twitter
Sparsity-Parameterised Dynamic Edge Colouring
Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems
Optimal Path Planning for Aerial Load Transportation in Complex Environments using PSO-Improved Artificial Potential Fields
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
PEFT-MedAware: Large Language Model for Medical Awareness
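As promised in the Sobol entry above, here is a hedged sketch of Sobol-based hypervector generation using SciPy's `scipy.stats.qmc.Sobol`. The thresholding to bipolar vectors and the correlation check are illustrative choices; the paper's algorithm for selecting minimally correlated Sobol sequences is not reproduced here.

```python
import numpy as np
from scipy.stats import qmc

def sobol_hypervectors(n_symbols, dim=1024, seed=1):
    """Draw quasi-random Sobol points and binarise them into bipolar
    {-1, +1} hypervectors (n_symbols should be a power of two to keep
    the Sobol sequence balanced)."""
    sampler = qmc.Sobol(d=dim, scramble=True, seed=seed)
    points = sampler.random(n_symbols)     # (n_symbols, dim) in [0, 1)
    return np.where(points < 0.5, -1, 1)   # threshold to bipolar

hvs = sobol_hypervectors(8)
# Near-orthogonality check: off-diagonal normalised similarities ~ 0.
sims = hvs @ hvs.T / hvs.shape[1]
print(np.abs(sims - np.eye(len(hvs))).max())
```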
Keyword: faster
Smart Traffic Management of Vehicles using Faster R-CNN based Deep Learning Method
From Concept to Field Tests: Accelerated Development of Multi-AUV Missions Using a High-Fidelity Faster-than-Real-Time Simulator
Parsing Millions of URLs per Second
LUNA-CIM: Lookup Table based Programmable Neural Processing in Memory
Sparsity-Parameterised Dynamic Edge Colouring
Optimal Embedding Dimension for Sparse Subspace Embeddings
Keyword: mobile
Hypergraph-based Multi-robot Motion Planning with Topological Guidance
Social Isolation and Serious Mental Illness: The Role of Context-Aware Mobile Interventions
A Large-Scale Study on the Prevalence and Usage of TEE-based Features on Android
Keyword: pruning
Hierarchical Pruning of Deep Ensembles with Focal Diversity
Using Cooperative Game Theory to Prune Neural Networks
Archtree: on-the-fly tree-structured exploration for latency-aware pruning of deep neural networks
Keyword: diffusion
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems
Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers
Multiscale Hodge Scattering Networks for Data Analysis
High-fidelity Person-centric Subject-to-Image Synthesis
Enhancing Object Coherence in Layout-to-Image Synthesis
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Keyword: adaptive
Smart Traffic Management of Vehicles using Faster R-CNN based Deep Learning Method
You Cannot Escape Me: Detecting Evasions of SIEM Rules in Enterprise Networks
Secure network coding with adaptive and active attack
High-fidelity Person-centric Subject-to-Image Synthesis
Decentralized Energy Marketplace via NFTs and AI-based Agents
DUA-DA: Distillation-based Unbiased Alignment for Domain Adaptive Object Detection
Designing and Evaluating an Adaptive Virtual Reality System using EEG Frequencies to Balance Internal and External Attention States
Mixed Reality UI Adaptations with Inaccurate and Incomplete Objectives
$Pc\varepsilon k_{max}$-Means++: Adapt-$P$ Driven by Energy and Distance Quality Probabilities Based on $k$-Means++ for the Stable Election Protocol (SEP)
TacFR-Gripper: A Reconfigurable Fin Ray-Based Compliant Robotic Gripper with Tactile Skin for In-Hand Manipulation
A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest
Sparsity-Parameterised Dynamic Edge Colouring
Keyword: quantization
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication