New submissions for Fri, 14 Apr 23

Keyword: efficient

RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization

Authors: Yuanhang Shao, Tonmoy Dey, Nikola Vuckovic, Luke Van Popering, Alan Kuhnle
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06048
Pdf link: https://arxiv.org/pdf/2304.06048
Abstract Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating the local search behavior and obtaining comparable results to the local search algorithms. However, the over-smoothing and the information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits the local search behavior while providing practical scalability. Using the RELS-DQN model trained on one application, it can generalize to various applications by providing solution values higher than or equal to both the local search algorithms and the existing DQN models while remaining efficient in runtime and memory.
Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation
Authors: Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta, Homayoun Najjaran
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06055
Pdf link: https://arxiv.org/pdf/2304.06055
Abstract Reinforcement learning demonstrates significant potential in automatically building control policies in numerous domains, but shows low efficiency when applied to robot manipulation tasks due to the curse of dimensionality. To facilitate the learning of such tasks, prior knowledge or heuristics that incorporate inherent simplification can effectively improve the learning performance. This paper aims to define and incorporate the natural symmetry present in physical robotic environments. Then, sample-efficient policies are trained by exploiting the expert demonstrations in symmetrical environments through an amalgamation of reinforcement and behavior cloning, which gives the off-policy learning process a diverse yet compact initiation. Furthermore, it presents a rigorous framework for a recent concept and explores its scope for robot manipulation tasks. The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle, in a simulation experiment study. A PID controller, which tracks the linear joint-space trajectories with hard-coded temporal logic to produce interim midpoints, is used to generate demonstrations in the study. The results of the study present the effect of the number of demonstrations and quantify the magnitude of behavior cloning to exemplify the possible improvement of model-free reinforcement learning in common manipulation tasks. A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.
Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays
Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06059
Pdf link: https://arxiv.org/pdf/2304.06059
Abstract Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.
Energy-guided Entropic Neural Optimal Transport
Authors: Petr Mokrov, Alexander Korotin, Evgeny Burnaev
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.06094
Pdf link: https://arxiv.org/pdf/2304.06094
Abstract Energy-Based Models (EBMs) are known in the Machine Learning community for the decades. Since the seminal works devoted to EBMs dating back to the noughties there have been appearing a lot of efficient methods which solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited by few recent works (excluding WGAN based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present the novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long- run EBMs as a backbone of our Energy-guided Entropic OT method, leaving the application of more sophisticated EBMs for future research.
IoT trust and reputation: a survey and taxonomy
Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06119
Pdf link: https://arxiv.org/pdf/2304.06119
Abstract IoT is one of the fastest-growing technologies and it is estimated that more than a billion devices would be utilized across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed the IoT devices features, such as devices role, device type and its dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. Whilst continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still at its infancy. In this paper, we carry out a comprehensive literature review on state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize the trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises of traditional trust management-based systems and artificial intelligence-based systems, and combine both the classes which encourage the existing schemes to adapt these emerging concepts. This collaboration between the conventional mathematical and the advanced ML models result in design schemes that are more robust and efficient. Then we drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness and efficiency. Finally, built upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate and point out future research directions.
Label-Free Concept Bottleneck Models
Authors: Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06129
Pdf link: https://arxiv.org/pdf/2304.06129
Abstract Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at https://github.com/Trustworthy-ML-Lab/Label-free-CBM.
AGI for Agriculture
Authors: Guoyu Lu, Sheng Li, Gengchen Mai, Jin Sun, Dajiang Zhu, Lilong Chai, Haijian Sun, Xianqiao Wang, Haixing Dai, Ninghao Liu, Rui Xu, Daniel Petti, Changying Li, Tianming Liu, Changying Li
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.06136
Pdf link: https://arxiv.org/pdf/2304.06136
Abstract Artificial General Intelligence (AGI) is poised to revolutionize a variety of sectors, including healthcare, finance, transportation, and education. Within healthcare, AGI is being utilized to analyze clinical medical notes, recognize patterns in patient data, and aid in patient management. Agriculture is another critical sector that impacts the lives of individuals worldwide. It serves as a foundation for providing food, fiber, and fuel, yet faces several challenges, such as climate change, soil degradation, water scarcity, and food security. AGI has the potential to tackle these issues by enhancing crop yields, reducing waste, and promoting sustainable farming practices. It can also help farmers make informed decisions by leveraging real-time data, leading to more efficient and effective farm management. This paper delves into the potential future applications of AGI in agriculture, such as agriculture image processing, natural language processing (NLP), robotics, knowledge graphs, and infrastructure, and their impact on precision livestock and precision crops. By leveraging the power of AGI, these emerging technologies can provide farmers with actionable insights, allowing for optimized decision-making and increased productivity. The transformative potential of AGI in agriculture is vast, and this paper aims to highlight its potential to revolutionize the industry.
Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction
Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.06178
Pdf link: https://arxiv.org/pdf/2304.06178
Abstract Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.
SePEnTra: A secure and privacy-preserving energy trading mechanisms in transactive energy market
Authors: Rumpa Dasgupta, Amin Sakzad, Carsten Rudolph, Rafael Dowsley
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.06179
Pdf link: https://arxiv.org/pdf/2304.06179
Abstract In this paper, we design and present a novel model called SePEnTra to ensure the security and privacy of energy data while sharing with other entities during energy trading to determine optimal price signals. Furthermore, the market operator can use this data to detect malicious activities of users in the later stage without violating privacy (e.g., deviation of actual energy generation/consumption from forecast beyond a threshold). We use two cryptographic primitives, additive secret sharing and Pedersen commitment, in SePEnTra. The performance of our model is evaluated theoretically and numerically. We compare the performance of SePEnTra with the same Transactive energy market (TEM) framework without security mechanisms. The result shows that even though using advanced cryptographic primitives in a large market framework, SePEnTra has very low computational complexity and communication overhead. Moreover, it is storage efficient for all parties.
SURFSUP: Learning Fluid Simulation for Novel Surfaces
Authors: Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, Richard Zemel
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2304.06197
Pdf link: https://arxiv.org/pdf/2304.06197
Abstract Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators, however most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects to manipulate fluid flow.
Space-Time Tradeoffs for Conjunctive Queries with Access Patterns
Authors: Hangdong Zhao, Shaleen Deep, Paraschos Koutris
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2304.06221
Pdf link: https://arxiv.org/pdf/2304.06221
Abstract In this paper, we investigate space-time tradeoffs for answering conjunctive queries with access patterns (CQAPs). The goal is to create a space-efficient data structure in an initial preprocessing phase and use it for answering (multiple) queries in an online phase. Previous work has developed data structures that trades off space usage for answering time for queries of practical interest, such as the path and triangle query. However, these approaches lack a comprehensive framework and are not generalizable. Our main contribution is a general algorithmic framework for obtaining space-time tradeoffs for any CQAP. Our framework builds upon the $\PANDA$ algorithm and tree decomposition techniques. We demonstrate that our framework captures all state-of-the-art tradeoffs that were independently produced for various queries. Further, we show surprising improvements over the state-of-the-art tradeoffs known in the existing literature for reachability queries.
Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function
Authors: Muhammad Febrian Rachmadi, Charissa Poon, Henrik Skibbe
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06229
Pdf link: https://arxiv.org/pdf/2304.06229
Abstract In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, a loss function that addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The Instance-wise component improves the detection of small instances or ``blobs" in image datasets with both large and small instances. The Center-of-Instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, in the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided a better balanced segmentation, and significantly outperformed the Dice loss with an improvement of $1.7-3.7\%$ and the blob loss by $0.6-5.0\%$ in terms of the Dice similarity coefficient on both validation and test set, suggesting that the ICI loss is a potential solution to the instance imbalance problem.
Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs
Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06234
Pdf link: https://arxiv.org/pdf/2304.06234
Abstract Our recent intensive study has found that physics-informed neural networks (PINN) tend to be local approximators after training. This observation leads to this novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises of only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrated that the training of PIRBNs using gradient descendent methods can converge to Gaussian processes. Besides, we studied the training dynamics of PIRBN via the neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, the existing PINN numerical techniques, such as adaptive learning, decomposition and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
Cross-View Hierarchy Network for Stereo Image Super-Resolution
Authors: Wenbin Zou, Hongxia Gao, Liang Chen, Yunchen Zhang, Mingchao Jiang, Zhongxin Yu, Ming Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06236
Pdf link: https://arxiv.org/pdf/2304.06236
Abstract Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views. To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, yet overlooking the importance of intra-view information for high-resolution reconstruction. It also leads to problems of wrong texture in recovered images. To address this issue, we explore the interdependencies between various hierarchies from intra-view and propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR). Specifically, we design a cross-hierarchy information mining block (CHIMB) that leverages channel attention and large kernel convolution attention to extract both global and local features from the intra-view, enabling the efficient restoration of accurate texture details. Additionally, a cross-view interaction module (CVIM) is proposed to fuse similar features from different views by utilizing cross-view attention mechanisms, effectively adapting to the binocular scene. Extensive experiments demonstrate the effectiveness of our method. CVHSSR achieves the best stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters. The source code and pre-trained models are available at https://github.com/AlexZou14/CVHSSR.
EWT: Efficient Wavelet-Transformer for Single Image Denoising
Authors: Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06274
Pdf link: https://arxiv.org/pdf/2304.06274
Abstract Transformer-based image denoising methods have achieved encouraging results in the past year. However, it must uses linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU storage space. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method can fully preserve the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which can further reduce model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be public.
Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies
Authors: Anand Gokul Mahalingam, Aayush Shah, Akshay Gulati, Royston Mascarenhas, Rakshitha Panduranga
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06277
Pdf link: https://arxiv.org/pdf/2304.06277
Abstract Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.
Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning
Authors: Wenli Xiao, Yiwei Lyu, John Dolan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06281
Pdf link: https://arxiv.org/pdf/2304.06281
Abstract Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.
ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis
Authors: Hongchen Tan, Baocai Yin, Kun Wei, Xiuping Liu, Xin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06297
Pdf link: https://arxiv.org/pdf/2304.06297
Abstract We propose a novel Text-to-Image Generation Network, Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the layout of synthesized images without any auxiliary information. The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (which refers to locations of objects and background) of a synthesized image with that of its corresponding real image. In ALR module, we proposed an Adaptive Layout Refinement (ALR) loss to balance the matching of hard and easy features, for more efficient layout structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely-used datasets show that ALR-GAN performs competitively at the Text-to-Image generation task.
Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution
Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06305
Pdf link: https://arxiv.org/pdf/2304.06305
Abstract This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
Efficient Multimodal Fusion via Interactive Prompting
Authors: Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06306
Pdf link: https://arxiv.org/pdf/2304.06306
Abstract Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multi-modal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. In addition, we disentangle vanilla prompts into three types in order to learn different optimizing objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experiment results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.
Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning
Authors: Xinyun Zhang, Lanqing Hong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06309
Pdf link: https://arxiv.org/pdf/2304.06309
Abstract Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique to achieve personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computational backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of the normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimations of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely-used FSL benchmarks and data from real applications.
Universally Optimal Deterministic Broadcasting in the HYBRID Distributed Model
Authors: Yi-Jun Chang, Oren Hecht, Dean Leitersdorf
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.06317
Pdf link: https://arxiv.org/pdf/2304.06317
Abstract In theoretical computer science, it is a common practice to show existential lower bounds for problems, meaning there is a family of pathological inputs on which no algorithm can do better. However, most inputs of interest can be solved much more efficiently, giving rise to the notion of universally optimal algorithms, which run as fast as possible on every input. Questions on the existence of universally optimal algorithms were first raised by Garay, Kutten, and Peleg in FOCS '93. This research direction reemerged recently through a series of works, including the influential work of Haeupler, Wajc, and Zuzic in STOC '21, which resolves some of these decades-old questions in the supported CONGEST model. We work in the HYBRID distributed model, which analyzes networks combining both global and local communication. Much attention has recently been devoted to solving distance related problems, such as All-Pairs Shortest Paths (APSP) in HYBRID, culminating in a $\tilde \Theta(n^{1/2})$ round algorithm for exact APSP. However, by definition, every problem in HYBRID is solvable in $D$ (diameter) rounds, showing that it is far from universally optimal. We show the first universally optimal algorithms in HYBRID, by presenting a fundamental tool that solves any broadcasting problem in a universally optimal number of rounds, deterministically. Specifically, we consider the problem in a graph $G$ where a set of $k$ messages $M$ distributed arbitrarily across $G$, requires every node to learn all of $M$. We show a universal lower bound and a matching, deterministic upper bound, for any graph $G$, any value $k$, and any distribution of $M$ across $G$. This broadcasting tool opens a new exciting direction of research into showing universally optimal algorithms in HYBRID. As an example, we use it to obtain algorithms for approximate and exact APSP in general and sparse graphs.
Continual Learning of Hand Gestures for Human-Robot Interaction
Authors: Xavier Cucurull, Anaís Garrell
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06319
Pdf link: https://arxiv.org/pdf/2304.06319
Abstract In this paper, we present an efficient method to incrementally learn to classify static hand gestures. This method allows users to teach a robot to recognize new symbols in an incremental manner. Contrary to other works which use special sensors or external devices such as color or data gloves, our proposed approach makes use of a single RGB camera to perform static hand gesture recognition from 2D images. Furthermore, our system is able to incrementally learn up to 38 new symbols using only 5 samples for each old class, achieving a final average accuracy of over 90\%. In addition to that, the incremental training time can be reduced to a 10\% of the time required when using all data available.
An Automotive Case Study on the Limits of Approximation for Object Detection
Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2304.06327
Pdf link: https://arxiv.org/pdf/2304.06327
Abstract The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or background, and hence, their misdetections have less impact than those for closer, larger, and foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence on detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence, multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.
EF/CF: High Performance Smart Contract Fuzzing for Exploit Generation
Authors: Michael Rodler, David Paaßen, Wenting Li, Lukas Bernhard, Thorsten Holz, Ghassan Karame, Lucas Davi
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.06341
Pdf link: https://arxiv.org/pdf/2304.06341
Abstract Smart contracts are increasingly being used to manage large numbers of high-value cryptocurrency accounts. There is a strong demand for automated, efficient, and comprehensive methods to detect security vulnerabilities in a given contract. While the literature features a plethora of analysis methods for smart contracts, the existing proposals do not address the increasing complexity of contracts. Existing analysis tools suffer from false alarms and missed bugs in today's smart contracts that are increasingly defined by complexity and interdependencies. To scale accurate analysis to modern smart contracts, we introduce EF/CF, a high-performance fuzzer for Ethereum smart contracts. In contrast to previous work, EF/CF efficiently and accurately models complex smart contract interactions, such as reentrancy and cross-contract interactions, at a very high fuzzing throughput rate. To achieve this, EF/CF transpiles smart contract bytecode into native C++ code, thereby enabling the reuse of existing, optimized fuzzing toolchains. Furthermore, EF/CF increases fuzzing efficiency by employing a structure-aware mutation engine for smart contract transaction sequences and using a contract's ABI to generate valid transaction inputs. In a comprehensive evaluation, we show that EF/CF scales better -- without compromising accuracy -- to complex contracts compared to state-of-the-art approaches, including other fuzzers, symbolic/concolic execution, and hybrid approaches. Moreover, we show that EF/CF can automatically generate transaction sequences that exploit reentrancy bugs to steal Ether.
DDT: Dual-branch Deformable Transformer for Image Denoising
Authors: Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06346
Pdf link: https://arxiv.org/pdf/2304.06346
Abstract Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly fewer computational costs.
ODAM: Gradient-based instance-specific visual explanations for object detection
Authors: Chenyang Zhao, Antoni B. Chan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06354
Pdf link: https://arxiv.org/pdf/2304.06354
Abstract We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visualized explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous works classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to both one-stage detectors and two-stage detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state-of-the-art both effectively and efficiently. We next propose a training scheme, Odam-Train, to improve the explanation ability on object discrimination of the detector through encouraging consistency between explanations for detections on the same object, and distinct explanations for detections on different objects. Based on the heat maps produced by ODAM with Odam-Train, we propose Odam-NMS, which considers the information of the model's explanation for each prediction to distinguish the duplicate detected objects. We present a detailed analysis of the visualized explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM.
IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function
Authors: Shivani Bathla, Vinita Vasudevan
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06366
Pdf link: https://arxiv.org/pdf/2304.06366
Abstract Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.
An attack resilient policy on the tip pool for DAG-based distributed ledgers
Authors: Lianna Zhao, Andrew Culleny, Sebastian Muellerz, Olivia Saay, Robert Shorten
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.06369
Pdf link: https://arxiv.org/pdf/2304.06369
Abstract This paper discusses congestion control and inconsistency problems in DAG-based distributed ledgers and proposes an additional filter to mitigate these issues. Unlike traditional blockchains, DAG-based DLTs use a directed acyclic graph structure to organize transactions, allowing higher scalability and efficiency. However, this also introduces challenges in controlling the rate at which blocks are added to the network and preventing the influence of spam attacks. To address these challenges, we propose a filter to limit the tip pool size and to avoid referencing old blocks. Furthermore, we present experimental results to demonstrate the effectiveness of this filter in reducing the negative impacts of various attacks. Our approach offers a lightweight and efficient solution for managing the flow of blocks in DAG-based DLTs, which can enhance the consistency and reliability of these systems. Index
Contact Models in Robotics: a Comparative Analysis
Authors: Quentin Le Lidec, Wilson Jallet, Louis Montaut, Ivan Laptev, Cordelia Schmid, Justin Carpentier
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06372
Pdf link: https://arxiv.org/pdf/2304.06372
Abstract Physics simulation is ubiquitous in robotics. Whether in model-based approaches (e.g., trajectory optimization), or model-free algorithms (e.g., reinforcement learning), physics simulators are a central component of modern control pipelines in robotics. Over the past decades, several robotic simulators have been developed, each with dedicated contact modeling assumptions and algorithmic solutions. In this article, we survey the main contact models and the associated numerical methods commonly used in robotics for simulating advanced robot motions involving contact interactions. In particular, we recall the physical laws underlying contacts and friction (i.e., Signorini condition, Coulomb's law, and the maximum dissipation principle), and how they are transcribed in current simulators. For each physics engine, we expose their inherent physical relaxations along with their limitations due to the numerical techniques employed. Based on our study, we propose theoretically grounded quantitative criteria on which we build benchmarks assessing both the physical and computational aspects of simulation. We support our work with an open-source and efficient C++ implementation of the existing algorithmic variations. Our results demonstrate that some approximations or algorithms commonly used in robotics can severely widen the reality gap and impact target applications. We hope this work will help motivate the development of new contact models, contact solvers, and robotic simulators in general, at the root of recent progress in motion generation in robotics.
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression
Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06393
Pdf link: https://arxiv.org/pdf/2304.06393
Abstract In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differen-tiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. They both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where the ultrafast automated model compression for various computational cost constraint is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy. Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
Fast And Automatic Floating Point Error Analysis With CHEF-FP
Authors: Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev
Subjects: Numerical Analysis (math.NA); Hardware Architecture (cs.AR); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2304.06441
Pdf link: https://arxiv.org/pdf/2304.06441
Abstract As we reach the limit of Moore's Law, researchers are exploring different paradigms to achieve unprecedented performance. Approximate Computing (AC), which relies on the ability of applications to tolerate some error in the results to trade-off accuracy for performance, has shown significant promise. Despite the success of AC in domains such as Machine Learning, its acceptance in High-Performance Computing (HPC) is limited due to stringent requirements for accuracy. We need tools and techniques to identify regions of code that are amenable to approximations and their impact on the application output quality to guide developers to employ selective approximation. To this end, we propose CHEF-FP, a flexible, scalable, and easy-to-use source-code transformation tool based on Automatic Differentiation (AD) for analyzing approximation errors in HPC applications. CHEF-FP uses Clad, an efficient AD tool built as a plugin to the Clang compiler and based on the LLVM compiler infrastructure, as a backend and utilizes its AD abilities to evaluate approximation errors in C++ code. CHEF-FP works at the source by injecting error estimation code into the generated adjoints. This enables the error-estimation code to undergo compiler optimizations resulting in improved analysis time and reduced memory usage. We also provide theoretical and architectural augmentations to source code transformation-based AD tools to perform FP error analysis. This paper primarily focuses on analyzing errors introduced by mixed-precision AC techniques. We also show the applicability of our tool in estimating other kinds of errors by evaluating our tool on codes that use approximate functions. Moreover, we demonstrate the speedups CHEF-FP achieved during analysis time compared to the existing state-of-the-art tool due to its ability to generate and insert approximation error estimate code directly into the derivative source.
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Authors: Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06446
Pdf link: https://arxiv.org/pdf/2304.06446
Abstract Vision transformers have been applied successfully for image recognition tasks. There have been either multi-headed self-attention based (ViT \cite{dosovitskiy2020image}, DeIT, \cite{touvron2021training}) similar to the original work in textual models or more recently based on spectral layers (Fnet\cite{lee2021fnet}, GFNet\cite{rao2021global}, AFNO\cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention plays a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel Spectformer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2\% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25\% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, Spectformer-L achieves 85.7\% that is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Standford Car datasets. We then investigate its use in downstream tasks such of object detection and instance segmentation on the MS-COCO dataset and observe that Spectformer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
Authors: Senmao Tian, Ming Lu, Jiaming Liu, Yandong Guo, Yurong Chen, Shunli Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06454
Pdf link: https://arxiv.org/pdf/2304.06454
Abstract With the development of high-definition display devices, the practical scenario of Super-Resolution (SR) usually needs to super-resolve large input like 2K to higher resolution (4K/8K). To reduce the computational and memory cost, current methods first split the large input into local patches and then merge the SR patches into the output. These methods adaptively allocate a subnet for each patch. Quantization is a very important technique for network acceleration and has been used to design the subnets. Current methods train an MLP bit selector to determine the propoer bit for each layer. However, they uniformly sample subnets for training, making simple subnets overfitted and complicated subnets underfitted. Therefore, the trained bit selector fails to determine the optimal bit. Apart from this, the introduced bit selector brings additional cost to each layer of the SR network. In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. CABM also learns a bit selector for each layer during training. After training, we analyze the relation between the edge information of an input patch and the bit of each layer. We observe that the edge information can be an effective metric for the selected bit. Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference. The bit configuration of SR network can be determined by the lookup tables of all layers. Our strategy can find better bit configuration, resulting in more efficient mixed precision networks. We conduct detailed experiments to demonstrate the generalization ability of our method. The code will be released.
Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages
Authors: Israel Abebe Azime, Sana Sabah Al-Azzawi, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Oyinkansola Awosan, Oreen Yousuf
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06459
Pdf link: https://arxiv.org/pdf/2304.06459
Abstract AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For task C, we used we make use of a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.
Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter
Authors: Jonathan Lambert, Kevin Casey, Rosemary Monahan
Subjects: Programming Languages (cs.PL); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2304.06460
Pdf link: https://arxiv.org/pdf/2304.06460
Abstract Although the advantages of just-in-time compilation over traditional interpretive execution are widely recognised, there needs to be more current research investigating and repositioning the performance differences between these two execution models relative to contemporary workloads. Specifically, there is a need to examine the performance differences between Java Runtime Environment (JRE) Java Virtual Machine (JVM) tiered execution and JRE JVM interpretive execution relative to modern multicore architectures and modern concurrent and parallel benchmark workloads. This article aims to fill this research gap by presenting the results of a study that compares the performance of these two execution models under load from the Renaissance Benchmark Suite. This research is relevant to anyone interested in understanding the performance differences between just-in-time compiled code and interpretive execution. It provides a contemporary assessment of the interpretive JVM core, the entry and starting point for bytecode execution, relative to just-in-time tiered execution. The study considers factors such as the JRE version, the GNU GCC version used in the JRE build toolchain, and the garbage collector algorithm specified at runtime, and their impact on the performance difference envelope between interpretive and tiered execution. Our findings indicate that tiered execution is considerably more efficient than interpretive execution, and the performance gap has increased, ranging from 4 to 37 times more efficient. On average, tiered execution is approximately 15 times more efficient than interpretive execution. Additionally, the performance differences between interpretive and tiered execution are influenced by workload category, with narrower performance differences observed for web-based workloads and more significant differences for Functional and Scala-type workloads.
Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC
Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.06467
Pdf link: https://arxiv.org/pdf/2304.06467
Abstract Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to get initial data that would result in creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance for a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study, Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.
An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification
Authors: Md. Hamjajul Ashmafee, Tasnim Ahmed, Sabbir Ahmed, Md. Bakhtiar Hasan, Mst Nura Jahan, A.B.M. Ashikur Rahman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06520
Pdf link: https://arxiv.org/pdf/2304.06520
Abstract Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.
Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods
Authors: Shilei Li, Lijing Li, Dawei Shi, Yunjiang Lou, Ling Shi
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2304.06548
Pdf link: https://arxiv.org/pdf/2304.06548
Abstract This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.
Multiscale Finite Element Formulations for 2D/1D Problems
Authors: Karl Hollaus, Markus Schöbinger
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2304.06553
Pdf link: https://arxiv.org/pdf/2304.06553
Abstract Multiscale finite element methods for 2D/1D problems have been studied in this work to demonstrate their excellent ability to solve real-world problems. These methods are much more efficient than conventional 3D finite element methods and just as accurate. The 2D/1D multiscale finite element methods are based on a magnetic vector potential or a current vector potential. Known currents for excitation can be replaced by the Biot-Savart-field. Boundary conditions allow to integrate planes of symmetry. All presented approaches consider eddy currents, an insulation layer and preserve the edge effect. A segment of a fictitious electrical machine has been studied to demonstrate all above options, the accuracy and the low computational costs of the 2D/1D multiscale finite element methods.
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Authors: Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06600
Pdf link: https://arxiv.org/pdf/2304.06600
Abstract Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task thus leading to a loss of the versatility of the original model. We introduce "lossless adaptation" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation and thus preserving original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.
Robustness Measures and Monitors for Time Window Temporal Logic
Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2304.06645
Pdf link: https://arxiv.org/pdf/2304.06645
Abstract Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06648
Pdf link: https://arxiv.org/pdf/2304.06648
Abstract Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains. DiffFit is embarrassingly simple that only fine-tunes the bias term and newly-added scaling factors in specific layers, yet resulting in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\times$ training speed-up and only needs to store approximately 0.12\% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on ImageNet 512$\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint while being 30$\times$ more training efficient than the closest competitor.
DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer
Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06668
Pdf link: https://arxiv.org/pdf/2304.06668
Abstract Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.
Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms
Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.06674
Pdf link: https://arxiv.org/pdf/2304.06674
Abstract The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.
OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems
Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.06686
Pdf link: https://arxiv.org/pdf/2304.06686
Abstract We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
Representing Volumetric Videos as Dynamic MLP Maps
Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06717
Pdf link: https://arxiv.org/pdf/2304.06717
Abstract This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.
Keyword: faster

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays
Authors: Chen Xie, Francesco Daghero, Yukai Chen, Marco Castellano, Luca Gandolfi, Andrea Calimera, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06059
Pdf link: https://arxiv.org/pdf/2304.06059
Abstract Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.
Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction
Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.06178
Pdf link: https://arxiv.org/pdf/2304.06178
Abstract Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.
Beyond the Quadratic Time Barrier for Network Unreliability
Authors: Ruoxu Cen, William He, Jason Li, Debmalya Panigrahi
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2304.06552
Pdf link: https://arxiv.org/pdf/2304.06552
Abstract Karger (STOC 1995) gave the first FPTAS for the network (un)reliability problem, setting in motion research over the next three decades that obtained increasingly faster running times, eventually leading to a $\tilde{O}(n^2)$-time algorithm (Karger, STOC 2020). This represented a natural culmination of this line of work because the algorithmic techniques used can enumerate $\Theta(n^2)$ (near)-minimum cuts. In this paper, we go beyond this quadratic barrier and obtain a faster algorithm for the network unreliability problem. Our algorithm runs in $m^{1+o(1)} + \tilde{O}(n^{1.5})$ time. Our main contribution is a new estimator for network unreliability in very reliable graphs. These graphs are usually the bottleneck for network unreliability since the disconnection event is elusive. Our estimator is obtained by defining an appropriate importance sampling subroutine on a dual spanning tree packing of the graph. To complement this estimator for very reliable graphs, we use recursive contraction for moderately reliable graphs. We show that an interleaving of sparsification and contraction can be used to obtain a better parametrization of the recursive contraction algorithm that yields a faster running time matching the one obtained for the very reliable case.
Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation
Authors: Mathieu Pagé Fortin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06619
Pdf link: https://arxiv.org/pdf/2304.06619
Abstract This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios both on new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbate the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.
OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems
Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.06686
Pdf link: https://arxiv.org/pdf/2304.06686
Abstract We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields
Authors: Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06706
Pdf link: https://arxiv.org/pdf/2304.06706
Abstract Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.
Keyword: mobile

Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires
Authors: Xiaojian Zhang, Xilei Zhao, Yiming Xu, Ruggiero Lovreglio, Daniel Nilsson
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06233
Pdf link: https://arxiv.org/pdf/2304.06233
Abstract Real-time forecasting of travel demand during wildfire evacuations is crucial for emergency managers and transportation planners to make timely and better-informed decisions. However, few studies focus on accurate travel demand forecasting in large-scale emergency evacuations. Therefore, this study develops and tests a new methodological framework for modeling trip generation in wildfire evacuations by using (a) large-scale GPS data generated by mobile devices and (b) state-of-the-art AI technologies. The proposed methodology aims at forecasting evacuation trips and other types of trips. Based on the travel demand inferred from the GPS data, we develop a new deep learning model, i.e., Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN), along with a model updating scheme to achieve real-time forecasting of travel demand during wildfire evacuations. The proposed methodological framework is tested in this study for a real-world case study: the 2019 Kincade Fire in Sonoma County, CA. The results show that SA-MGCRN significantly outperforms all the selected state-of-the-art benchmarks in terms of prediction performance. Our finding suggests that the most important model components of SA-MGCRN are evacuation order/warning information, proximity to fire, and population change, which are consistent with behavioral theories and empirical findings.
Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization
Authors: Xianjia Yu, Paola Torrico Morrn, Sahar Salimpour, Jorge Pena Queralta, Tomi Westerlund
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06264
Pdf link: https://arxiv.org/pdf/2304.06264
Abstract As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization where we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges for UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.
Gamifying Math Education using Object Detection
Authors: Yueqiu Sun, Rohitkrishna Nambiar, Vivek Vidyasagaran
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06270
Pdf link: https://arxiv.org/pdf/2304.06270
Abstract Manipulatives used in the right way help improve mathematical concepts leading to better learning outcomes. In this paper, we present a phygital (physical + digital) curriculum inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on which requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and light-weight mobile architectures, we enable our system to understand user interactions and provide real-time audiovisual feedback. Experimental results show that our network runs real-time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.
Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution
Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06305
Pdf link: https://arxiv.org/pdf/2304.06305
Abstract This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC
Authors: Sanaz Sadat Hosseini, Mona Azarbayjani, Jason Lawrence, Hamed Tabkhi
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.06467
Pdf link: https://arxiv.org/pdf/2304.06467
Abstract Access to adequate public transportation plays a critical role in inequity and socio-economic mobility, particularly in low-income communities. Low-income workers who rely heavily on public transportation face a spatial disparity between home and work, which leads to higher unemployment, longer job searches, and longer commute times. The overarching goal of this study is to get initial data that would result in creating a connected, coordinated, demand-responsive, and efficient public bus system that minimizes transit gaps for low-income, transit-dependent communities. To create equitable metropolitan public transportation, this paper evaluates existing CATS mobile applications that assist passengers in finding bus routes and arrival times. Our community survey methodology includes filling out questionnaires on Charlotte's current bus system on specific bus lines and determining user acceptance for a future novel smart technology. We have also collected data on the demand and transit gap for a real-world pilot study, Sprinter bus line, Bus line 7, Bus line 9, and Bus lines 97-99. These lines connect all of Charlotte City's main areas and are the most important bus lines in the system. On the studied routes, the primary survey results indicate that the current bus system has many flaws, the major one being the lack of proper timing to meet the needs of passengers. The most common problems are long commutes and long waiting times at stations. Moreover, the existing application provides inaccurate information, and on average, 80 percent of travelers and respondents are inclined to use new technology.
IoT-Based Water Quality Assessment System for Industrial Waste WaterHealthcare Perspective
Authors: Abdur Rab Dhruba, Kazi Nabiul Alam, Md. Shakib Khan, Sananda Saha, Mohammad Monirujjaman Khan, Mohammed Baz, Mehedi Masud, Mohammed A. AlZain
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.06491
Pdf link: https://arxiv.org/pdf/2304.06491
Abstract The environment, especially water, gets polluted due to industrialization and urbanization. Pollution due to industrialization and urbanization has harmful effects on both the environment and the lives on Earth. This polluted water can cause food poisoning, diarrhea, short-term gastrointestinal problems, respiratory diseases, skin problems, and other serious health complications. In a developing country like Bangladesh, where ready-made garments sector is one of the major sources of the total Gross Domestic Product (GDP), most of the wastes released from the garment factories are dumped into the nearest rivers or canals. Hence, the quality of the water of these bodies become very incompatible for the living beings, and so, it has become one of the major threats to the environment and human health. In addition, the amount of fish in the rivers and canals in Bangladesh is decreasing day by day as a result of water pollution. Therefore, to save fish and other water animals and the environment, we need to monitor the quality of the water and find out the reasons for the pollution. Real-time monitoring of the quality of water is vital for controlling water pollution. Most of the approaches for controlling water pollution are mainly biological and lab-based, which takes a lot of time and resources. To address this issue, we developed an Internet of Things (IoT)-based real-time water quality monitoring system, integrated with a mobile application. The proposed system in this research measures some of the most important indexes of water, including the potential of hydrogen (pH), total dissolved solids (TDS), and turbidity, and temperature of water. The proposed system results will be very helpful in saving the environment, and thus, improving the health of living creatures on Earth.
IoT-Based Remote Health Monitoring System Employing Smart Sensors for Asthma Patients during COVID-19 Pandemic
Authors: Nafisa Shamim Rafa, Basma Binte Azmal, Abdur Rab Dhruba, Mohammad Monirujjaman Khan, Turki M. Alanazi, Faris A. Almalki, Othman AlOmeir
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.06511
Pdf link: https://arxiv.org/pdf/2304.06511
Abstract COVID19 and asthma are respiratory diseases that can be life threatening in uncontrolled circumstances and require continuous monitoring. A poverty stricken South Asian country like Bangladesh has been bearing the brunt of the COVID19 pandemic since its beginning. The majority of the country's population resides in rural areas, where proper healthcare is difficult to access. This emphasizes the necessity of telemedicine, implementing the concept of the Internet of Things (IoT), which is still under development in Bangladesh. This paper demonstrates how the current challenges in the healthcare system are resolvable through the design of a remote health and environment monitoring system, specifically for asthma patients who are at an increased risk of COVID19. Since on-time treatment is essential, this system will allow doctors and medical staff to receive patient information in real time and deliver their services immediately to the patient regardless of their location. The proposed system consists of various sensors collecting heart rate, body temperature, ambient temperature, humidity, and air quality data and processing them through the Arduino Microcontroller. It is integrated with a mobile application. All this data is sent to the mobile application via a Bluetooth module and updated every few seconds so that the medical staff can instantly track patients' conditions and emergencies. The developed prototype is portable and easily usable by anyone. The system has been applied to five people of different ages and medical histories over a particular period. Upon analyzing all their data, it became clear which participants were particularly vulnerable to health deterioration and needed constant observation. Through this research, awareness about asthmatic symptoms will improve and help prevent their severity through effective treatment anytime, anywhere.
Keyword: pruning

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution
Authors: Zhuo Su, Jiehua Zhang, Tianpeng Liu, Zhen Liu, Shuanghui Zhang, Matti Pietikäinen, Li Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06305
Pdf link: https://arxiv.org/pdf/2304.06305
Abstract This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad "middle spectrum" area between channel pruning and conventional grouped convolution. Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at https://github.com/hellozhuo/msgc.
Learning Accurate Performance Predictors for Ultrafast Automated Model Compression
Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06393
Pdf link: https://arxiv.org/pdf/2304.06393
Abstract In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differen-tiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. They both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where the ultrafast automated model compression for various computational cost constraint is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy. Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
Keyword: voxel

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI
Authors: Axel Elaldi, Guido Gerig, Neel Dey
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06103
Pdf link: https://arxiv.org/pdf/2304.06103
Abstract We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect to symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world \textit{in vivo} human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv
Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction
Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.06178
Pdf link: https://arxiv.org/pdf/2304.06178
Abstract Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.
Brain Structure Ages -- A new biomarker for multi-disease classification
Authors: Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, Pierrick Coupé
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06591
Pdf link: https://arxiv.org/pdf/2304.06591
Abstract Age is an important variable to describe the expected brain's anatomy status across the normal aging trajectory. The deviation from that normative aging trajectory may provide some insights into neurological diseases. In neuroimaging, predicted brain age is widely used to analyze different diseases. However, using only the brain age gap information (\ie the difference between the chronological age and the estimated age) can be not enough informative for disease classification problems. In this paper, we propose to extend the notion of global brain age by estimating brain structure ages using structural magnetic resonance imaging. To this end, an ensemble of deep learning models is first used to estimate a 3D aging map (\ie voxel-wise age estimation). Then, a 3D segmentation mask is used to obtain the final brain structure ages. This biomarker can be used in several situations. First, it enables to accurately estimate the brain age for the purpose of anomaly detection at the population level. In this situation, our approach outperforms several state-of-the-art methods. Second, brain structure ages can be used to compute the deviation from the normal aging process of each brain structure. This feature can be used in a multi-disease classification task for an accurate differential diagnosis at the subject level. Finally, the brain structure age deviations of individuals can be visualized, providing some insights about brain abnormality and helping clinicians in real medical contexts.
Keyword: lidar

Survey on LiDAR Perception in Adverse Weather Conditions
Authors: Mariella Dreissig, Dominik Scheuble, Florian Piewak, Joschka Boedecker
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06312
Pdf link: https://arxiv.org/pdf/2304.06312
Abstract Autonomous vehicles rely on a variety of sensors to gather information about their surrounding. The vehicle's behavior is planned based on the environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition for environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance change under adverse weather conditions like fog, snow or rain. This limitation recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gathered, analyzed, and discussed different aspects on dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms and sensor fusion to mitigate adverse weather induced shortcomings. We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.
An Automotive Case Study on the Limits of Approximation for Object Detection
Authors: Martí Caro, Hamid Tabani, Jaume Abella, Francesc Moll, Enric Morancho, Ramon Canal, Josep Altet, Antonio Calomarde, Francisco J. Cazorla, Antonio Rubio, Pau Fontova, Jordi Fornt
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2304.06327
Pdf link: https://arxiv.org/pdf/2304.06327
Abstract The accuracy of camera-based object detection (CBOD) built upon deep learning is often evaluated against the real objects in frames only. However, such simplistic evaluation ignores the fact that many unimportant objects are small, distant, or background, and hence, their misdetections have less impact than those for closer, larger, and foreground objects in domains such as autonomous driving. Moreover, sporadic misdetections are irrelevant since confidence on detections is typically averaged across consecutive frames, and detection devices (e.g. cameras, LiDARs) are often redundant, thus providing fault tolerance. This paper exploits such intrinsic fault tolerance of the CBOD process, and assesses in an automotive case study to what extent CBOD can tolerate approximation coming from multiple sources such as lower precision arithmetic, approximate arithmetic units, and even random faults due to, for instance, low voltage operation. We show that the accuracy impact of those sources of approximation is within 1% of the baseline even when considering the three approximate domains simultaneously, and hence, multiple sources of approximation can be exploited to build highly efficient accelerators for CBOD in cars.
RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception
Authors: Felix Fent, Philipp Bauerschmidt, Markus Lienkamp
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06547
Pdf link: https://arxiv.org/pdf/2304.06547
Abstract A reliable perception has to be robust against challenging environmental conditions. Therefore, recent efforts focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and the poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that does not just use the information of the points themselves but also the relationships between the points. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on the object detection and semantic segmentation quality are investigated. The code is made available as open-source software under https://github.com/TUMFTM/RadarGNN.
Keyword: diffusion

Social Biases through the Text-to-Image Generation Lens
Authors: Ranjita Naik, Besmira Nushi
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06034
Pdf link: https://arxiv.org/pdf/2304.06034
Abstract Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are however trained on massive amounts of web data, which surfaces the peril of potential harmful biases that may leak in the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal that there exist severe occupational biases of neutral prompts majorly excluding groups of people from results for both models. Such biases can get mitigated by increasing the amount of specification in the prompt itself, although the prompting mitigation will not address discrepancies in image quality or other usages of the model or its representations in other scenarios. Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age. Finally, an analysis of geographical location representations on everyday situations (e.g., park, food, weddings) shows that for most situations, images generated through default location-neutral prompts are closer and more similar to images generated for locations of United States and Germany.
$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI
Authors: Axel Elaldi, Guido Gerig, Neel Dey
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06103
Pdf link: https://arxiv.org/pdf/2304.06103
Abstract We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect to symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world \textit{in vivo} human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at https://github.com/AxelElaldi/e3so3_conv
PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting
Authors: Saman Motamed, Jianjin Xu, Chen Henry Wu, Fernando De la Torre
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06107
Pdf link: https://arxiv.org/pdf/2304.06107
Abstract Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.
An Edit Friendly DDPM Noise Space: Inversion and Manipulations
Authors: Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06140
Pdf link: https://arxiv.org/pdf/2304.06140
Abstract Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.
Intriguing properties of synthetic images: from generative adversarial networks to diffusion models
Authors: Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06408
Pdf link: https://arxiv.org/pdf/2304.06408
Abstract Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM). Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
Authors: Enze Xie, Lewei Yao, Han Shi, Zhili Liu, Daquan Zhou, Zhaoqiang Liu, Jiawei Li, Zhenguo Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06648
Pdf link: https://arxiv.org/pdf/2304.06648
Abstract Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains. DiffFit is embarrassingly simple that only fine-tunes the bias term and newly-added scaling factors in specific layers, yet resulting in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\times$ training speed-up and only needs to store approximately 0.12\% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on ImageNet 512$\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint while being 30$\times$ more training efficient than the closest competitor.
Learning Controllable 3D Diffusion Models from Single-view Images
Authors: Jiatao Gu, Qingzhe Gao, Shuangfei Zhai, Baoquan Chen, Lingjie Liu, Josh Susskind
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06700
Pdf link: https://arxiv.org/pdf/2304.06700
Abstract Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, We present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (\url{https://jiataogu.me/control3diff}) for video comparisons.
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
Authors: Zheng Ding, Xuaner Zhang, Zhihao Xia, Lars Jebe, Zhuowen Tu, Xiuming Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.06711
Pdf link: https://arxiv.org/pdf/2304.06711
Abstract We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or "rigged by," crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. Please see the project website: https://diffusionrig.github.io for the supplemental material, video, code, and data.
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction
Authors: Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06714
Pdf link: https://arxiv.org/pdf/2304.06714
Abstract 3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.
Expressive Text-to-Image Generation with Rich Text
Authors: Songwei Ge, Taesung Park, Jun-Yan Zhu, Jia-Bin Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06720
Pdf link: https://arxiv.org/pdf/2304.06720
Abstract Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.
Keyword: dynamic

Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders
Authors: Georgina Curto, Flavio Comim
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06031
Pdf link: https://arxiv.org/pdf/2304.06031
Abstract This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology to translate the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in the fairness decision making within ML design and support ML development teams to identify, mitigate and monitor bias at each step of ML systems development. The process also provides guidance on how to explain the always imperfect trade-offs in terms of bias to users.
Web 3.0: The Future of Internet
Authors: Wensheng Gan, Zhenqiang Ye, Shicheng Wan, Philip S. Yu
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.06032
Pdf link: https://arxiv.org/pdf/2304.06032
Abstract With the rapid growth of the Internet, human daily life has become deeply bound to the Internet. To take advantage of massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scholars and engineers have worked hard to make the internet world more open, inclusive, and equal. Indeed, the next generation of Web evolution (i.e., Web 3.0) is already coming and shaping our lives. Web 3.0 is a decentralized Web architecture that is more intelligent and safer than before. The risks and ruin posed by monopolists or criminals will be greatly reduced by a complete reconstruction of the Internet and IT infrastructure. In a word, Web 3.0 is capable of addressing web data ownership according to distributed technology. It will optimize the internet world from the perspectives of economy, culture, and technology. Then it promotes novel content production methods, organizational structures, and economic forms. However, Web 3.0 is not mature and is now being disputed. Herein, this paper presents a comprehensive survey of Web 3.0, with a focus on current technologies, challenges, opportunities, and outlook. This article first introduces a brief overview of the history of World Wide Web as well as several differences among Web 1.0, Web 2.0, Web 3.0, and Web3. Then, some technical implementations of Web 3.0 are illustrated in detail. We discuss the revolution and benefits that Web 3.0 brings. Finally, we explore several challenges and issues in this promising area.
Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN
Authors: Shahed Rezaei, Ahmad Moeineddin, Ali Harandi
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06044
Pdf link: https://arxiv.org/pdf/2304.06044
Abstract We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve nonlinear equations in complex material models. Additionally, strategies are provided to reduce the required order of derivation for obtaining the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or non-active simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as local damage models for interface cracking behavior with a nonlinear softening law. Finally, we discuss the potential and remaining challenges for future developments of this new approach.
Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints
Authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2304.06104
Pdf link: https://arxiv.org/pdf/2304.06104
Abstract This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.
IoT trust and reputation: a survey and taxonomy
Authors: Muhammad Aaqib, Aftab Ali, Liming Chen, Omar Nibouche
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.06119
Pdf link: https://arxiv.org/pdf/2304.06119
Abstract IoT is one of the fastest-growing technologies and it is estimated that more than a billion devices would be utilized across the globe by the end of 2030. To maximize the capability of these connected entities, trust and reputation among IoT entities is essential. Several trust management models have been proposed in the IoT environment; however, these schemes have not fully addressed the IoT devices features, such as devices role, device type and its dynamic behavior in a smart environment. As a result, traditional trust and reputation models are insufficient to tackle these characteristics and uncertainty risks while connecting nodes to the network. Whilst continuous study has been carried out and various articles suggest promising solutions in constrained environments, research on trust and reputation is still at its infancy. In this paper, we carry out a comprehensive literature review on state-of-the-art research on the trust and reputation of IoT devices and systems. Specifically, we first propose a new structure, namely a new taxonomy, to organize the trust and reputation models based on the ways trust is managed. The proposed taxonomy comprises of traditional trust management-based systems and artificial intelligence-based systems, and combine both the classes which encourage the existing schemes to adapt these emerging concepts. This collaboration between the conventional mathematical and the advanced ML models result in design schemes that are more robust and efficient. Then we drill down to compare and analyse the methods and applications of these systems based on community-accepted performance metrics, e.g. scalability, delay, cooperativeness and efficiency. Finally, built upon the findings of the analysis, we identify and discuss open research issues and challenges, and further speculate and point out future research directions.
Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands
Authors: Rui Chen, Alvin Shek, Changliu Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06175
Pdf link: https://arxiv.org/pdf/2304.06175
Abstract This paper studies real-time collaborative robot (cobot) handling, where the cobot maneuvers an object under human dynamic gesture commands. Enabling dynamic gesture commands is useful when the human needs to avoid direct contact with the robot or the object handled by the robot. However, the key challenge lies in the heterogeneity in human behaviors and the stochasticity in the perception of dynamic gestures, which requires the robot handling policy to be adaptable and robust. To address these challenges, we introduce Conditional Collaborative Handling Process (CCHP) to encode a contextaware cobot handling policy and a procedure to learn such policy from human-human collaboration. We thoroughly evaluate the adaptability and robustness of CCHP and apply our approach to a real-time cobot assembly task with Kinova Gen3 robot arm. Results show that our method leads to significantly less human effort and smoother human-robot collaboration than state-of-the-art rule-based approach even with first-time users.
Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction
Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.06178
Pdf link: https://arxiv.org/pdf/2304.06178
Abstract Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.
Do "bad" citations have "good" effects?
Authors: Honglin Bao, Misha Teplitskiy
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Multiagent Systems (cs.MA); Adaptation and Self-Organizing Systems (nlin.AO)
Arxiv link: https://arxiv.org/abs/2304.06190
Pdf link: https://arxiv.org/pdf/2304.06190
Abstract The scientific community generally discourages authors of research papers from citing papers that did not influence them because such "rhetorical" citations are assumed to degrade the literature and incentives for good work. Intuitively, a world where authors cite only substantively appears attractive. We argue that manding substantive citing may have underappreciated consequences on the allocation of attention and dynamism. We develop a novel agent-based model in which agents cite substantively and rhetorically. Agents first select papers to read based on their expected quality, read them and observe their actual quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in the reference lists with papers that support their claims, regardless of whether they were actually influential. By turning rhetorical citing on-and-off, we find that rhetorical citing increases the correlation between quality and citations, increases citation churn, and reduces citation inequality. This occurs because rhetorical citing redistributes some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplifies the effects. In sum, rhetorical citing helps deconcentrate attention and makes it easier to displace incumbent ideas, so whether it is indeed undesirable depends on the metrics used to judge desirability.
Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems
Authors: Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2304.06193
Pdf link: https://arxiv.org/pdf/2304.06193
Abstract This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.
Sub-Optimal Moving Horizon Estimation in Feedback Control of Linear Constrained Systems
Authors: Yujia Yang, Chris Manzie, Ye Pu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.06216
Pdf link: https://arxiv.org/pdf/2304.06216
Abstract Moving horizon estimation (MHE) offers benefits relative to other estimation approaches by its ability to explicitly handle constraints, but suffers increased computation cost. To help enable MHE on platforms with limited computation power, we propose to solve the optimization problem underlying MHE sub-optimally for a fixed number of optimization iterations per time step. The stability of the closed-loop system is analyzed using the small-gain theorem by considering the closed-loop controlled system, the optimization algorithm dynamics, and the estimation error dynamics as three interconnected subsystems. By assuming incremental input/output-to-state stability ({\delta}- IOSS) of the system and imposing standard ISS conditions on the controller, we derive conditions on the iteration number such that the interconnected system is input-to-state stable (ISS) w.r.t. the external disturbances. A simulation using an MHE- MPC estimator-controller pair is used to validate the results.
Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs
Authors: Jinshuai Bai, Gui-Rong Liu, Ashish Gupta, Laith Alzubaidi, Xi-Qiao Feng, YuanTong Gu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.06234
Pdf link: https://arxiv.org/pdf/2304.06234
Abstract Our recent intensive study has found that physics-informed neural networks (PINN) tend to be local approximators after training. This observation leads to this novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises of only one hidden layer and a radial basis "activation" function. Under appropriate conditions, we demonstrated that the training of PIRBNs using gradient descendent methods can converge to Gaussian processes. Besides, we studied the training dynamics of PIRBN via the neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, the existing PINN numerical techniques, such as adaptive learning, decomposition and different types of loss functions, are applicable to PIRBN. The programs that can regenerate all numerical results can be found at https://github.com/JinshuaiBai/PIRBN.
Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization
Authors: Xianjia Yu, Paola Torrico Morrn, Sahar Salimpour, Jorge Pena Queralta, Tomi Westerlund
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06264
Pdf link: https://arxiv.org/pdf/2304.06264
Abstract As mobile robots become more ubiquitous, their deployments grow across use cases where GNSS positioning is either unavailable or unreliable. This has led to increased interest in multi-modal relative localization methods. Complementing onboard odometry, ranging allows for relative state estimation, with ultra-wideband (UWB) ranging having gained widespread recognition due to its low cost and centimeter-level out-of-box accuracy. Infrastructure-free localization methods allow for more dynamic, ad-hoc, and flexible deployments, yet they have received less attention from the research community. In this work, we propose a cooperative relative multi-robot localization where we leverage inter-robot ranging and simultaneous spatial detections of objects in the environment. To achieve this, we equip robots with a single UWB transceiver and a stereo camera. We propose a novel Monte-Carlo approach to estimate relative states by either employing only UWB ranges or dynamically integrating simultaneous spatial detections from the stereo cameras. We also address the challenges for UWB ranging error mitigation, especially in non-line-of-sight, with a study on different LSTM networks to estimate the ranging error. The proposed approach has multiple benefits. First, we show that a single range is enough to estimate the accurate relative states of two robots when fusing odometry measurements. Second, our experiments also demonstrate that our approach surpasses traditional methods such as multilateration in terms of accuracy. Third, to increase accuracy even further, we allow for the integration of cooperative spatial detections. Finally, we show how ROS 2 and Zenoh can be integrated to build a scalable wireless communication solution for multi-robot systems. The experimental validation includes real-time deployment and autonomous navigation based on the relative positioning method.
Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning
Authors: Wenli Xiao, Yiwei Lyu, John Dolan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.06281
Pdf link: https://arxiv.org/pdf/2304.06281
Abstract Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.
Neural State-Space Models: Empirical Evaluation of Uncertainty Quantification
Authors: Marco Forgione, Dario Piga
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.06349
Pdf link: https://arxiv.org/pdf/2304.06349
Abstract Effective quantification of uncertainty is an essential and still missing step towards a greater adoption of deep-learning approaches in different applications, including mission-critical ones. In particular, investigations on the predictive uncertainty of deep-learning models describing non-linear dynamical systems are very limited to date. This paper is aimed at filling this gap and presents preliminary results on uncertainty quantification for system identification with neural state-space models. We frame the learning problem in a Bayesian probabilistic setting and obtain posterior distributions for the neural network's weights and outputs through approximate inference techniques. Based on the posterior, we construct credible intervals on the outputs and define a surprise index which can effectively diagnose usage of the model in a potentially dangerous out-of-distribution regime, where predictions cannot be trusted.
Emergence of Symbols in Neural Networks for Semantic Understanding and Communication
Authors: Yang Chen, Liangxuan Guo, Shan Yu
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Symbolic Computation (cs.SC); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2304.06377
Pdf link: https://arxiv.org/pdf/2304.06377
Abstract Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, planning, etc., is essential and unique for human intelligence. Current deep neural networks are still far behind human's ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with ability of symbol creation, semantic understanding and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we found that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. We hope that it will be instrumental in producing more capable systems in the future that can synergize the strengths of connectionist and symbolic approaches for AI.
Energy-Efficient GPU Clusters Scheduling for Deep Learning
Authors: Diandian Gu, Xintong Xie, Gang Huang, Xin Jin, Xuanzhe Liu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.06381
Pdf link: https://arxiv.org/pdf/2304.06381
Abstract Training deep neural networks (DNNs) is a major workload in datacenters today, resulting in a tremendously fast growth of energy consumption. It is important to reduce the energy consumption while completing the DL training jobs early in data centers. In this paper, we propose PowerFlow, a GPU clusters scheduler that reduces the average Job Completion Time (JCT) under an energy budget. We first present performance models for DL training jobs to predict the throughput and energy consumption performance with different configurations. Based on the performance models, PowerFlow dynamically allocates GPUs and adjusts the GPU-level or job-level configurations of DL training jobs. PowerFlow applies network packing and buddy allocation to job placement, thus avoiding extra energy consumed by cluster fragmentations. Evaluation results show that under the same energy consumption, PowerFlow improves the average JCT by 1.57 - 3.39 x at most, compared to competitive baselines.
TransHP: Image Classification with Hierarchical Prompting
Authors: Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06385
Pdf link: https://arxiv.org/pdf/2304.06385
Abstract This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits the descendant-class discrimination. We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus on the subtle differences among descendant classes. We model this prompting mechanism into a Transformer with Hierarchical Prompting (TransHP). TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP maintain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification on accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP well exploits the hierarchical information.
Communicating Actor Automata -- Modelling Erlang Processes as Communicating Machines
Authors: Dominic Orchard (University of Kent, UK), Mihail Munteanu (Masabi Ltd.), Paulo Torrens (University of Kent, UK)
Subjects: Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2304.06395
Pdf link: https://arxiv.org/pdf/2304.06395
Abstract Brand and Zafiropulo's notion of Communicating Finite-State Machines (CFSMs) provides a succinct and powerful model of message-passing concurrency, based around channels. However, a major variant of message-passing concurrency is not readily captured by CFSMs: the actor model. In this work, we define a variant of CFSMs, called Communicating Actor Automata, to capture the actor model of concurrency as provided by Erlang: with mailboxes, from which messages are received according to repeated application of pattern matching. Furthermore, this variant of CFSMs supports dynamic process topologies, capturing common programming idioms in the context of actor-based message-passing concurrency. This gives a new basis for modelling, specifying, and verifying Erlang programs. We also consider a class of CAAs that give rise to freedom from race conditions.
Event-based tracking of human hands
Authors: Laura Duarte, Mohammad Safeea, Pedro Neto
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06534
Pdf link: https://arxiv.org/pdf/2304.06534
Abstract This paper proposes a novel method for human hands tracking using data from an event camera. The event camera detects changes in brightness, measuring motion, with low latency, no motion blur, low power consumption and high dynamic range. Captured frames are analysed using lightweight algorithms reporting 3D hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and in obstacle avoidance for human-robot safety applications. Events data are pre-processed into intensity frames. The regions of interest (ROI) are defined through object edge event activity, reducing noise. ROI features are extracted for use in-depth perception. Event-based tracking of human hand demonstrated feasible, in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% of data reduction in relation to the original, while preserving the features. The depth estimation error in relation to ground truth (measured with wearables), measured using dynamic time warping and using a single event camera, is from 15 to 30 millimetres, depending on the plane it is measured. Tracking of human hands in 3D space using a single event camera data and lightweight algorithms to define ROI features (hands tracking in space).
DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos
Authors: Qi Zhao, M. Salman Asif, Zhan Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06544
Pdf link: https://arxiv.org/pdf/2304.06544
Abstract Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.
Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation
Authors: Mathieu Pagé Fortin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06619
Pdf link: https://arxiv.org/pdf/2304.06619
Abstract This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios both on new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbate the difficulty of learning new categories without interfering with previous knowledge. We publicly release our code to encourage future work.
Robustness Measures and Monitors for Time Window Temporal Logic
Authors: Ahmad Ahmad, Cristian-Ioan Vasile, Roberto Tron, Calin Belta
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2304.06645
Pdf link: https://arxiv.org/pdf/2304.06645
Abstract Temporal logics (TLs) have been widely used to formalize interpretable tasks for cyber-physical systems. Time Window Temporal Logic (TWTL) has been recently proposed as a specification language for dynamical systems. In particular, it can easily express robotic tasks, and it allows for efficient, automata-based verification and synthesis of control policies for such systems. In this paper, we define two quantitative semantics for this logic, and two corresponding monitoring algorithms, which allow for real-time quantification of satisfaction of formulas by trajectories of discrete-time systems. We demonstrate the new semantics and their runtime monitors on numerical examples.
ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification
Authors: Rui Yang, Pei Liu, Luping Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06652
Pdf link: https://arxiv.org/pdf/2304.06652
Abstract Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, using a bag prototype to guide the division of WSI pseudo-bags. Rather than designing complex network architecture, this scheme takes a plugin-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we specially devise an attention-based prototype that could be optimized dynamically in training to adapt to a classification task. We apply our ProtoDiv scheme on seven baseline models, and then carry out a group of comparison experiments on two public WSI datasets. Experiments confirm our ProtoDiv could usually bring obvious performance improvements to WSI classification.
D-SVM over Networked Systems with Non-Ideal Linking Conditions
Authors: Mohammadreza Doostmohammadian, Alireza Aghasi, Houman Zarrabi
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2304.06667
Pdf link: https://arxiv.org/pdf/2304.06667
Abstract This paper considers distributed optimization algorithms, with application in binary classification via distributed support-vector-machines (D-SVM) over multi-agent networks subject to some link nonlinearities. The agents solve a consensus-constraint distributed optimization cooperatively via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to existing literature that mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) and the other eigenvalues all have negative real parts. This is done by recalling arguments from matrix perturbation theory. Then, the solution is shown to converge to the agreement state under certain conditions. For example, the gradient tracking (GT) step size is tighter than the linear case by factors related to the upper/lower sector bounds. To the best of our knowledge, no existing work in distributed optimization and learning literature considers non-ideal link conditions.
Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms
Authors: Agnes Marjorie Nakiganda, Shahab Dehghan, Petros Aristidou
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.06674
Pdf link: https://arxiv.org/pdf/2304.06674
Abstract The integration of the frequency dynamics into Micro-Grid (MG) investment and operational planning problems is vital in ensuring the security of the system in the post-contingency states. However, the task of including transient security constraints in MG planning problems is non-trivial. This is due to the highly non-linear and non-convex nature of the analytical closed form of the frequency metrics (e.g., frequency nadir) and power flow constraints. To handle this issue, this paper presents two algorithms for decomposing the MG investment planning problem into multiple levels to enhance computational tractability and optimality. Furthermore, the sensitivity of the decisions made at each level is captured by corresponding dual cutting planes to model feasible secure regions. This, in turn, ensures both the optimal determination and placement of inertia services and accelerates the convergence of the proposed decomposition algorithms. The efficient and effective performance of the proposed algorithms is tested and verified on an 18-bus Low Voltage (LV) network and a 30-bus Medium Voltage (MV) network under various operating scenarios.
OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems
Authors: Jiachang Liu, Sam Rosen, Chudi Zhong, Cynthia Rudin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.06686
Pdf link: https://arxiv.org/pdf/2304.06686
Abstract We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.
Representing Volumetric Videos as Dynamic MLP Maps
Authors: Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.06717
Pdf link: https://arxiv.org/pdf/2304.06717
Abstract This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for $512 \times 512$ images on an RTX 3090 GPU. The code is available at https://zju3dv.github.io/mlp_maps/.

A-suozhang / GetArxivDaily

New submissions for Fri, 14 Apr 23 #32

Keyword: efficient

RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization

Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

Energy-guided Entropic Neural Optimal Transport

IoT trust and reputation: a survey and taxonomy

Label-Free Concept Bottleneck Models

AGI for Agriculture

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

SePEnTra: A secure and privacy-preserving energy trading mechanisms in transactive energy market

SURFSUP: Learning Fluid Simulation for Novel Surfaces

Space-Time Tradeoffs for Conjunctive Queries with Access Patterns

Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function

Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs

Cross-View Hierarchy Network for Stereo Image Super-Resolution

EWT: Efficient Wavelet-Transformer for Single Image Denoising

Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies

Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

Efficient Multimodal Fusion via Interactive Prompting

Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning

Universally Optimal Deterministic Broadcasting in the HYBRID Distributed Model

Continual Learning of Hand Gestures for Human-Robot Interaction

An Automotive Case Study on the Limits of Approximation for Object Detection

EF/CF: High Performance Smart Contract Fuzzing for Exploit Generation

DDT: Dual-branch Deformable Transformer for Image Denoising

ODAM: Gradient-based instance-specific visual explanations for object detection

IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function

An attack resilient policy on the tip pool for DAG-based distributed ledgers

Contact Models in Robotics: a Comparative Analysis

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

Fast And Automatic Floating Point Error Analysis With CHEF-FP

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input

Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages

Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification

Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods

Multiscale Finite Element Formulations for 2D/1D Problems

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

Robustness Measures and Monitors for Time Window Temporal Logic

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

Inertia-Aware Microgrid Investment Planning Using Tractable Decomposition Algorithms

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

Representing Volumetric Videos as Dynamic MLP Maps

Keyword: faster

Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

Beyond the Quadratic Time Barrier for Network Unreliability

Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation

OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

Keyword: mobile

Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires

Loosely Coupled Odometry, UWB Ranging, and Cooperative Spatial Detection for Relative Monte-Carlo Multi-Robot Localization

Gamifying Math Education using Object Detection

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

Towards Understanding the Benefits and Challenges of Demand Responsive Public Transit- A Case Study in the City of Charlotte, NC

IoT-Based Water Quality Assessment System for Industrial Waste WaterHealthcare Perspective

IoT-Based Remote Health Monitoring System Employing Smart Sensors for Asthma Patients during COVID-19 Pandemic

Keyword: pruning

Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

Keyword: voxel

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

Brain Structure Ages -- A new biomarker for multi-disease classification

Keyword: lidar

Survey on LiDAR Perception in Adverse Weather Conditions

An Automotive Case Study on the Limits of Approximation for Object Detection

RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception

Keyword: diffusion

Social Biases through the Text-to-Image Generation Lens

$E(3) \times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI

PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting