New submissions for Mon, 19 Jun 23

Keyword: efficient

TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting

Authors: Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.09364
Pdf link: https://arxiv.org/pdf/2306.09364
Abstract Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting Vision MLP-Mixer for time series and introduce empirically validated components to enhance accuracy. This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a Hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).
Equitable Multi-task Learning
Authors: Jun Yuan, Rui Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.09373
Pdf link: https://arxiv.org/pdf/2306.09373
Abstract Multi-task learning (MTL) has achieved great success in various research domains, such as CV, NLP and IR etc. Due to the complex and competing task correlation, na\"ive training all tasks may lead to inequitable learning, \textit{i.e.} some tasks are learned well while others are overlooked. Multi-task optimization (MTO) aims to improve all tasks at same time, but conventional methods often perform poor when tasks with large loss scale or gradient norm magnitude difference. To solve the issue, we in-depth investigate the equity problem for MTL and find that regularizing relative contribution of different tasks (\textit{i.e.} value of task-specific loss divides its raw gradient norm) in updating shared parameter can improve generalization performance of MTL. Based on our theoretical analysis, we propose a novel multi-task optimization method, named \textit{EMTL}, to achieve equitable MTL. Specifically, we efficiently add variance regularization to make different tasks' relative contribution closer. Extensive experiments have been conduct to evaluate EMTL, our method stably outperforms state-of-the-art methods on the public benchmark datasets of two different research domains. Furthermore, offline and online A/B test on multi-task recommendation are conducted too. EMTL improves multi-task recommendation significantly, demonstrating the superiority and practicability of our method in industrial landscape.
1st Solution Places for CVPR 2023 UG$^2$+ Challenge Track 2.2-Coded Target Restoration through Atmospheric Turbulence
Authors: Shengqi Xu, Shuning Cao, Haoyue Liu, Xueyao Xiao, Yi Chang, Luxin Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09379
Pdf link: https://arxiv.org/pdf/2306.09379
Abstract In this technical report, we briefly introduce the solution of our team VIELab-HUST for coded target restoration through atmospheric turbulence in CVPR 2023 UG$^2$+ Track 2.2. In this task, we propose an efficient multi-stage framework to restore a high quality image from distorted frames. Specifically, each distorted frame is initially aligned using image registration to suppress geometric distortion. We subsequently select the sharpest set of registered frames by employing a frame selection approach based on image sharpness, and average them to produce an image that is largely free of geometric distortion, albeit with blurriness. A learning-based deblurring method is then applied to remove the residual blur in the averaged image. Finally, post-processing techniques are utilized to further enhance the quality of the output image. Our framework is capable of handling different kinds of coded target dataset provided in the final testing phase, and ranked 1st on the final leaderboard. Our code will be available at https://github.com/xsqhust/Turbulence_Removal.
Understanding Parameter Sharing in Transformers
Authors: Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.09380
Pdf link: https://arxiv.org/pdf/2306.09380
Abstract Parameter sharing has proven to be a parameter-efficient approach. Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth. In this paper, we study why this approach works from two perspectives. First, increasing model depth makes the model more complex, and we hypothesize that the reason is related to model complexity (referring to FLOPs). Secondly, since each shared parameter will participate in the network computation several times in forward propagation, its corresponding gradient will have a different range of values from the original model, which will affect the model convergence. Based on this, we hypothesize that training convergence may also be one of the reasons. Through further analysis, we show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity. Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner. Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
Sound Demixing Challenge 2023 -- Music Demixing Track Technical Report
Authors: Minseok Kim, Jun Hyung Lee
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.09382
Pdf link: https://arxiv.org/pdf/2306.09382
Abstract In this report, we present our award-winning solutions for the Music Demixing Track of Sound Demixing Challenge 2023. We focus on two methods designed for this challenge: a time-efficient source separation network that achieves state-of-the-art results on the MUSDB benchmark and a loss masking method for noise-robust source separation. Code for reproducing model training and final submissions is available at github.com/kuielab/sdx23.
Employing Multimodal Machine Learning for Stress Detection
Authors: Rahee Walambe, Pranav Nayak, Ashmit Bhardwaj, Ketan Kotecha
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.09385
Pdf link: https://arxiv.org/pdf/2306.09385
Abstract In the current age, human lifestyle has become more knowledge oriented leading to generation of sedentary employment. This has given rise to a number of health and mental disorders. Mental wellness is one of the most neglected but crucial aspects of today's world. Mental health issues can, both directly and indirectly, affect other sections of human physiology and impede an individual's day-to-day activities and performance. However, identifying the stress and finding the stress trend for an individual leading to serious mental ailments is challenging and involves multiple factors. Such identification can be achieved accurately by fusing these multiple modalities (due to various factors) arising from behavioral patterns. Certain techniques are identified in the literature for this purpose; however, very few machine learning-based methods are proposed for such multimodal fusion tasks. In this work, a multimodal AI-based framework is proposed to monitor a person's working behavior and stress levels. We propose a methodology for efficiently detecting stress due to workload by concatenating heterogeneous raw sensor data streams (e.g., face expressions, posture, heart rate, computer interaction). This data can be securely stored and analyzed to understand and discover personalized unique behavioral patterns leading to mental strain and fatigue. The contribution of this work is twofold; proposing a multimodal AI-based strategy for fusion to detect stress and its level and secondly identify a stress pattern over a period of time. We were able to achieve 96.09% accuracy on the test set in stress detection and classification. Further, we reduce the stress scale prediction model loss to 0.036 using these modalities. This work can prove important for the community at large, specifically those working sedentary jobs to monitor and identify stress levels, especially in current times of COVID-19.
A comprehensive review of 3D convolutional neural network-based classification techniques of diseased and defective crops using non-UAV-based hyperspectral images
Authors: Nooshin Noshiri, Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2306.09418
Pdf link: https://arxiv.org/pdf/2306.09418
Abstract Hyperspectral imaging (HSI) is a non-destructive and contactless technology that provides valuable information about the structure and composition of an object. It can capture detailed information about the chemical and physical properties of agricultural crops. Due to its wide spectral range, compared with multispectral- or RGB-based imaging methods, HSI can be a more effective tool for monitoring crop health and productivity. With the advent of this imaging tool in agrotechnology, researchers can more accurately address issues related to the detection of diseased and defective crops in the agriculture industry. This allows to implement the most suitable and accurate farming solutions, such as irrigation and fertilization before crops enter a damaged and difficult-to-recover phase of growth in the field. While HSI provides valuable insights into the object under investigation, the limited number of HSI datasets for crop evaluation presently poses a bottleneck. Dealing with the curse of dimensionality presents another challenge due to the abundance of spectral and spatial information in each hyperspectral cube. State-of-the-art methods based on 1D- and 2D-CNNs struggle to efficiently extract spectral and spatial information. On the other hand, 3D-CNN-based models have shown significant promise in achieving better classification and detection results by leveraging spectral and spatial features simultaneously. Despite the apparent benefits of 3D-CNN-based models, their usage for classification purposes in this area of research has remained limited. This paper seeks to address this gap by reviewing 3D-CNN-based architectures and the typical deep learning pipeline, including preprocessing and visualization of results, for the classification of hyperspectral images of diseased and defective crops. Furthermore, we discuss open research areas and challenges when utilizing 3D-CNNs with HSI data.
Towards Sustainable Computing: Assessing the Carbon Footprint of Heterogeneous Systems
Authors: Vidya A. Chhabria, Chetan Choppali Sudarshan, Sarma Vrudhula, Sachin S. Sapatnekar
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2306.09434
Pdf link: https://arxiv.org/pdf/2306.09434
Abstract Decades of progress in energy-efficient and low-power design have successfully reduced the operational carbon footprint in the semiconductor industry. However, this has led to an increase in embodied emissions, encompassing carbon emissions arising from design, manufacturing, packaging, and other infrastructural activities. While existing research has developed tools to analyze embodied carbon at the computer architecture level for traditional monolithic systems, these tools do not apply to near-mainstream heterogeneous integration (HI) technologies. HI systems offer significant potential for sustainable computing by minimizing carbon emissions through two key strategies: ``reducing" computation by reusing pre-designed chiplet IP blocks and adopting hierarchical approaches to system design. The reuse of chiplets across multiple designs, even spanning multiple generations of integrated circuits (ICs), can substantially reduce embodied carbon emissions throughout the operational lifespan. This paper introduces a carbon analysis tool specifically designed to assess the potential of HI systems in facilitating greener VLSI system design and manufacturing approaches. The tool takes into account scaling, chiplet and packaging yields, design complexity, and even carbon overheads associated with advanced packaging techniques employed in heterogeneous systems. Experimental results demonstrate that HI can achieve a reduction of embodied carbon emissions up to 70\% compared to traditional large monolithic systems. These findings suggest that HI can pave the way for sustainable computing practices, contributing to a more environmentally conscious semiconductor industry.
Prevention of cyberattacks in WSN and packet drop by CI framework and information processing protocol using AI and Big Data
Authors: Shreyanth S
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.09448
Pdf link: https://arxiv.org/pdf/2306.09448
Abstract As the reliance on wireless sensor networks (WSNs) rises in numerous sectors, cyberattack prevention and data transmission integrity become essential problems. This study provides a complete framework to handle these difficulties by integrating a cognitive intelligence (CI) framework, an information processing protocol, and sophisticated artificial intelligence (AI) and big data analytics approaches. The CI architecture is intended to improve WSN security by dynamically reacting to an evolving threat scenario. It employs artificial intelligence algorithms to continuously monitor and analyze network behavior, identifying and mitigating any intrusions in real time. Anomaly detection algorithms are also included in the framework to identify packet drop instances caused by attacks or network congestion. To support the CI architecture, an information processing protocol focusing on efficient and secure data transfer within the WSN is introduced. To protect data integrity and prevent unwanted access, this protocol includes encryption and authentication techniques. Furthermore, it enhances the routing process with the use of AI and big data approaches, providing reliable and timely packet delivery. Extensive simulations and tests are carried out to assess the efficiency of the suggested framework. The findings show that it is capable of detecting and preventing several forms of assaults, including as denial-of-service (DoS) attacks, node compromise, and data tampering. Furthermore, the framework is highly resilient to packet drop occurrences, which improves the WSN's overall reliability and performance
A flexible algorithm to offload DAG applications for edge computing
Authors: Gabriel F. C. de Queiroz, José F. de Rezende, Valmir C. Barbosa
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.09458
Pdf link: https://arxiv.org/pdf/2306.09458
Abstract Multi-access Edge Computing (MEC) is an enabling technology to leverage new network applications, such as virtual/augmented reality, by providing faster task processing at the network edge. This is done by deploying servers closer to the end users to run the network applications. These applications are often intensive in terms of task processing, memory usage, and communication; thus mobile devices may take a long time or even not be able to run them efficiently. By transferring (offloading) the execution of these applications to the servers at the network edge, it is possible to achieve a lower completion time (makespan) and meet application requirements. However, offloading multiple entire applications to the edge server can overwhelm its hardware and communication channel, as well as underutilize the mobile devices' hardware. In this paper, network applications are modeled as Directed Acyclic Graphs (DAGs) and partitioned into tasks, and only part of these tasks are offloaded to the edge server. This is the DAG application partitioning and offloading problem, which is known to be NP-hard. To approximate its solution, this paper proposes the FlexDO algorithm. FlexDO combines a greedy phase with a permutation phase to find a set of offloading decisions, and then chooses the one that achieves the shortest makespan. FlexDO is compared with a proposal from the literature and two baseline decisions, considering realistic DAG applications extracted from the Alibaba Cluster Trace Program. Results show that FlexDO is consistently only 3.9% to 8.9% above the optimal makespan in all test scenarios, which include different levels of CPU availability, a multi-user case, and different communication channel transmission rates. FlexDO outperforms both baseline solutions by a wide margin, and is three times closer to the optimal makespan than its competitor.
R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction
Authors: Takeru Oba, Norimichi Ukita
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09483
Pdf link: https://arxiv.org/pdf/2306.09483
Abstract Image-based motion prediction is one of the essential techniques for robot manipulation. Among the various prediction models, we focus on diffusion models because they have achieved state-of-the-art performance in various applications. In image-based motion prediction, diffusion models stochastically predict contextually appropriate motion by gradually denoising random Gaussian noise based on the image context. While diffusion models are able to predict various motions by changing the random noise, they sometimes fail to predict a contextually appropriate motion based on the image because the random noise is sampled independently of the image context. To solve this problem, we propose R2-Diff. In R2-Diff, a motion retrieved from a dataset based on image similarity is fed into a diffusion model instead of random noise. Then, the retrieved motion is refined through the denoising process of the diffusion model. Since the retrieved motion is almost appropriate to the context, it becomes easier to predict contextually appropriate motion. However, traditional diffusion models are not optimized to refine the retrieved motion. Therefore, we propose the method of tuning the hyperparameters based on the distance of the nearest neighbor motion among the dataset to optimize the diffusion model for refinement. Furthermore, we propose an image-based retrieval method to retrieve the nearest neighbor motion in inference. Our proposed retrieval efficiently computes the similarity based on the image features along the motion trajectory. We demonstrate that R2-Diff accurately predicts appropriate motions and achieves high task success rates compared to recent state-of-the-art models in robot manipulation.
FedMultimodal: A Benchmark For Multimodal Federated Learning
Authors: Tiantian Feng, Digbalay Bose, Tuo Zhang, Rajat Hebbar, Anil Ramakrishna, Rahul Gupta, Mi Zhang, Salman Avestimehr, Shrikanth Narayanan
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09486
Pdf link: https://arxiv.org/pdf/2306.09486
Abstract Over the past few years, Federated Learning (FL) has become an emerging machine learning technique to tackle data privacy challenges through collaborative training. In the Federated Learning algorithm, the clients submit a locally trained model, and the server aggregates these parameters until convergence. Despite significant efforts that have been made to FL in fields like computer vision, audio, and natural language processing, the FL applications utilizing multimodal data streams remain largely unexplored. It is known that multimodal learning has broad real-world applications in emotion recognition, healthcare, multimedia, and social media, while user privacy persists as a critical concern. Specifically, there are no existing FL benchmarks targeting multimodal applications or related tasks. In order to facilitate the research in multimodal FL, we introduce FedMultimodal, the first FL benchmark for multimodal learning covering five representative multimodal applications from ten commonly used datasets with a total of eight unique modalities. FedMultimodal offers a systematic FL pipeline, enabling end-to-end modeling framework ranging from data partition and feature extraction to FL benchmark algorithms and model evaluation. Unlike existing FL benchmarks, FedMultimodal provides a standardized approach to assess the robustness of FL against three common data corruptions in real-life multimodal applications: missing modalities, missing labels, and erroneous labels. We hope that FedMultimodal can accelerate numerous future research directions, including designing multimodal FL algorithms toward extreme data heterogeneity, robustness multimodal FL, and efficient multimodal FL. The datasets and benchmark results can be accessed at: https://github.com/usc-sail/fed-multimodal.
Streamlining Input/Output Logics with Sequent Calculi
Authors: Agata Ciabattoni, Dmitry Rozplokhas
Subjects: Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2306.09496
Pdf link: https://arxiv.org/pdf/2306.09496
Abstract Input/Output (I/O) logic is a general framework for reasoning about conditional norms and/or causal relations. We streamline Bochman's causal I/O logics via proof-search-oriented sequent calculi. Our calculi establish a natural syntactic link between the derivability in these logics and in the original I/O logics. As a consequence of our results, we obtain new, simple semantics for all these logics, complexity bounds, embeddings into normal modal logics, and efficient deduction methods. Our work encompasses many scattered results and provides uniform solutions to various unresolved problems.
ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
Authors: Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2306.09501
Pdf link: https://arxiv.org/pdf/2306.09501
Abstract High-Performance Computing (HPC) processors are nowadays integrated Cyber-Physical Systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use-case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving 4.9x speedup compared to single-core execution, enabling more advanced power management algorithms within the control hyper-period at a shallow area overhead, about 0.1\% the area of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoCs paradigm, achieving DVFS tracking with a mean deviation within 3\% the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under computational-intensive workloads.
Granger-Causal Hierarchical Skill Discovery
Authors: Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09509
Pdf link: https://arxiv.org/pdf/2306.09509
Abstract Reinforcement Learning (RL) has shown promising results learning policies for complex tasks, but can often suffer from low sample efficiency and limited transfer. We introduce the Hierarchy of Interaction Skills (HIntS) algorithm, which uses learned interaction detectors to discover and train a hierarchy of skills that manipulate factors in factored environments. Inspired by Granger causality, these unsupervised detectors capture key events between factors to sample efficiently learn useful skills and transfer those skills to other related tasks -- tasks where many reinforcement learning techniques struggle. We evaluate HIntS on a robotic pushing task with obstacles -- a challenging domain where other RL and HRL methods fall short. The learned skills not only demonstrate transfer using variants of Breakout, a common RL benchmark, but also show 2-3x improvement in both sample efficiency and final performance compared to comparable RL baselines. Together, HIntS demonstrates a proof of concept for using Granger-causal relationships for skill discovery.
Block-State Transformer
Authors: Mahan Fathi, Jonathan Pilault, Pierre-Luc Bacon, Christopher Pal, Orhan Firat, Ross Goroshin
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09539
Pdf link: https://arxiv.org/pdf/2306.09539
Abstract State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
Authors: Yunfan Li, Yiran Wang, Yu Cheng, Lin Yang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09554
Pdf link: https://arxiv.org/pdf/2306.09554
Abstract Policy optimization methods are powerful algorithms in Reinforcement Learning (RL) for their flexibility to deal with policy parameterization and ability to handle model misspecification. However, these methods usually suffer from slow convergence rates and poor sample complexity. Hence it is important to design provably sample efficient algorithms for policy optimization. Yet, recent advances for this problems have only been successful in tabular and linear setting, whose benign structures cannot be generalized to non-linearly parameterized policies. In this paper, we address this problem by leveraging recent advances in value-based algorithms, including bounded eluder-dimension and online sensitivity sampling, to design a low-switching sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation. We show that, our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^3})$ samples, where $\varepsilon$ is the suboptimality gap and $d$ is a complexity measure of the function class approximating the policy. This drastically improves previously best-known sample bound for policy optimization algorithms, $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^8})$. Moreover, we empirically test our theory with deep neural nets to show the benefits of the theoretical inspiration.
MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification
Authors: Dequan Wang, Xiaosong Wang, Lilong Wang, Mengzhang Li, Qian Da, Xiaoqiang Liu, Xiangyu Gao, Jun Shen, Junjun He, Tian Shen, Qi Duan, Jie Zhao, Kang Li, Yu Qiao, Shaoting Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09579
Pdf link: https://arxiv.org/pdf/2306.09579
Abstract Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications. Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples, e.g., in-context learning. Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks. In this paper, we aim at approaches adapting the foundation models for medical image classification and present a novel dataset and benchmark for the evaluation, i.e., examining the overall performance of accommodating the large-scale foundation models downstream on a set of diverse real-world clinical tasks. We collect five sets of medical imaging data from multiple institutes targeting a variety of real-world clinical tasks (22,349 images in total), i.e., thoracic diseases screening in X-rays, pathological lesion tissue screening, lesion detection in endoscopy images, neonatal jaundice evaluation, and diabetic retinopathy grading. Results of multiple baseline methods are demonstrated using the proposed dataset from both accuracy and cost-effective perspectives.
Learning CO$_2$ plume migration in faulted reservoirs with Graph Neural Networks
Authors: Xin Ju, François P. Hamon, Gege Wen, Rayan Kanfar, Mauricio Araya-Polo, Hamdi A. Tchelepi
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2306.09648
Pdf link: https://arxiv.org/pdf/2306.09648
Abstract Deep-learning-based surrogate models provide an efficient complement to numerical simulations for subsurface flow problems such as CO$_2$ geological storage. Accurately capturing the impact of faults on CO$_2$ plume migration remains a challenge for many existing deep learning surrogate models based on Convolutional Neural Networks (CNNs) or Neural Operators. We address this challenge with a graph-based neural model leveraging recent developments in the field of Graph Neural Networks (GNNs). Our model combines graph-based convolution Long-Short-Term-Memory (GConvLSTM) with a one-step GNN model, MeshGraphNet (MGN), to operate on complex unstructured meshes and limit temporal error accumulation. We demonstrate that our approach can accurately predict the temporal evolution of gas saturation and pore pressure in a synthetic reservoir with impermeable faults. Our results exhibit a better accuracy and a reduced temporal error accumulation compared to the standard MGN model. We also show the excellent generalizability of our algorithm to mesh configurations, boundary conditions, and heterogeneous permeability fields not included in the training set. This work highlights the potential of GNN-based methods to accurately and rapidly model subsurface flow with complex faults and fractures.
ReactGenie: An Object-Oriented State Abstraction for Complex Multimodal Interactions Using Large Language Models
Authors: Jackie (Junrui)Yang, Karina Li, Daniel Wan Rosli, Shuning Zhang, Yuhan Zhang, Monica S. Lam, James A. Landay
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.09649
Pdf link: https://arxiv.org/pdf/2306.09649
Abstract Multimodal interactions have been shown to be more flexible, efficient, and adaptable for diverse users and tasks than traditional graphical interfaces. However, existing multimodal development frameworks either do not handle the complexity and compositionality of multimodal commands well or require developers to write a substantial amount of code to support these multimodal interactions. In this paper, we present ReactGenie, a programming framework that uses a shared object-oriented state abstraction to support building complex multimodal mobile applications. Having different modalities share the same state abstraction allows developers using ReactGenie to seamlessly integrate and compose these modalities to deliver multimodal interaction. ReactGenie is a natural extension to the existing workflow of building a graphical app, like the workflow with React-Redux. Developers only have to add a few annotations and examples to indicate how natural language is mapped to the user-accessible functions in the program. ReactGenie automatically handles the complex problem of understanding natural language by generating a parser that leverages large language models. We evaluated the ReactGenie framework by using it to build three demo apps. We evaluated the accuracy of the language parser using elicited commands from crowd workers and evaluated the usability of the generated multimodal app with 16 participants. Our results show that ReactGenie can be used to build versatile multimodal applications with highly accurate language parsers, and the multimodal app can lower users' cognitive load and task completion time.
A New Low-Rank Learning Robust Quaternion Tensor Completion Method for Color Video Inpainting Problem and Fast Algorithms
Authors: Zhigang Jia, Jingfei Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.09652
Pdf link: https://arxiv.org/pdf/2306.09652
Abstract The color video inpainting problem is one of the most challenging problem in the modern imaging science. It aims to recover a color video from a small part of pixels that may contain noise. However, there are less of robust models that can simultaneously preserve the coupling of color channels and the evolution of color video frames. In this paper, we present a new robust quaternion tensor completion (RQTC) model to solve this challenging problem and derive the exact recovery theory. The main idea is to build a quaternion tensor optimization model to recover a low-rank quaternion tensor that represents the targeted color video and a sparse quaternion tensor that represents noise. This new model is very efficient to recover high dimensional data that satisfies the prior low-rank assumption. To solve the case without low-rank property, we introduce a new low-rank learning RQTC model, which rearranges similar patches classified by a quaternion learning method into smaller tensors satisfying the prior low-rank assumption. We also propose fast algorithms with global convergence guarantees. In numerical experiments, the proposed methods successfully recover color videos with eliminating color contamination and keeping the continuity of video scenery, and their solutions are of higher quality in terms of PSNR and SSIM values than the state-of-the-art algorithms.
Online Distillation for Pseudo-Relevance Feedback
Authors: Sean MacAvaney, Xi Wang
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.09657
Pdf link: https://arxiv.org/pdf/2306.09657
Abstract Model distillation has emerged as a prominent technique to improve neural search models. To date, distillation taken an offline approach, wherein a new neural model is trained to predict relevance scores between arbitrary queries and documents. In this paper, we explore a departure from this offline distillation strategy by investigating whether a model for a specific query can be effectively distilled from neural re-ranking results (i.e., distilling in an online setting). Indeed, we find that a lexical model distilled online can reasonably replicate the re-ranking of a neural model. More importantly, these models can be used as queries that execute efficiently on indexes. This second retrieval stage can enrich the pool of documents for re-ranking by identifying documents that were missed in the first retrieval stage. Empirically, we show that this approach performs favourably when compared with established pseudo relevance feedback techniques, dense retrieval methods, and sparse-dense ensemble "hybrid" approaches.
A Smooth Binary Mechanism for Efficient Private Continual Observation
Authors: Joel Daniel Andersson, Rasmus Pagh
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2306.09666
Pdf link: https://arxiv.org/pdf/2306.09666
Abstract In privacy under continual observation we study how to release differentially private estimates based on a dataset that evolves over time. The problem of releasing private prefix sums of $x_1,x_2,x_3,\dots \in{0,1}$ (where the value of each $x_i$ is to be private) is particularly well-studied, and a generalized form is used in state-of-the-art methods for private stochastic gradient descent (SGD). The seminal binary mechanism privately releases the first $t$ prefix sums with noise of variance polylogarithmic in $t$. Recently, Henzinger et al. and Denisov et al. showed that it is possible to improve on the binary mechanism in two ways: The variance of the noise can be reduced by a (large) constant factor, and also made more even across time steps. However, their algorithms for generating the noise distribution are not as efficient as one would like in terms of computation time and (in particular) space. We address the efficiency problem by presenting a simple alternative to the binary mechanism in which 1) generating the noise takes constant average time per value, 2) the variance is reduced by a factor about 4 compared to the binary mechanism, and 3) the noise distribution at each step is identical. Empirically, a simple Python implementation of our approach outperforms the running time of the approach of Henzinger et al., as well as an attempt to improve their algorithm using high-performance algorithms for multiplication with Toeplitz matrices.
Karush-Kuhn-Tucker conditions to build efficient contractors; Application to TDoA localization
Authors: Luc Jaulin
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.09679
Pdf link: https://arxiv.org/pdf/2306.09679
Abstract This paper proposes an efficient contractor for the TDoA (Time Differential of Arrival) equation. The contractor is based on a minimal inclusion test which is built using the Karush-Kuhn-Tucker (KKT) conditions. An application related to the localization of sound sources using a TDoA technique is proposed.
Collapsed Inference for Bayesian Deep Learning
Authors: Zhe Zeng, Guy Van den Broeck
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.09686
Pdf link: https://arxiv.org/pdf/2306.09686
Abstract Bayesian neural networks (BNNs) provide a formalism to quantify and calibrate uncertainty in deep learning. Current inference approaches for BNNs often resort to few-sample estimation for scalability, which can harm predictive performance, while its alternatives tend to be computationally prohibitively expensive. We tackle this challenge by revealing a previously unseen connection between inference on BNNs and volume computation problems. With this observation, we introduce a novel collapsed inference scheme that performs Bayesian model averaging using collapsed samples. It improves over a Monte-Carlo sample by limiting sampling to a subset of the network weights while pairing it with some closed-form conditional distribution over the rest. A collapsed sample represents uncountably many models drawn from the approximate posterior and thus yields higher sample efficiency. Further, we show that the marginalization of a collapsed sample can be solved analytically and efficiently despite the non-linearity of neural networks by leveraging existing volume computation solvers. Our proposed use of collapsed samples achieves a balance between scalability and accuracy. On various regression and classification tasks, our collapsed Bayesian deep learning approach demonstrates significant improvements over existing methods and sets a new state of the art in terms of uncertainty estimation as well as predictive performance.
End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve
Authors: Limeng Qiao, Wenjie Ding, Xi Qiu, Chi Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09700
Pdf link: https://arxiv.org/pdf/2306.09700
Abstract Vectorized high-definition map (HD-map) construction, which focuses on the perception of centimeter-level environmental information, has attracted significant research interest in the autonomous driving community. Most existing approaches first obtain rasterized map with the segmentation-based pipeline and then conduct heavy post-processing for downstream-friendly vectorization. In this paper, by delving into parameterization-based methods, we pioneer a concise and elegant scheme that adopts unified piecewise Bezier curve. In order to vectorize changeful map elements end-to-end, we elaborate a simple yet effective architecture, named Piecewise Bezier HD-map Network (BeMapNet), which is formulated as a direct set prediction paradigm and postprocessing-free. Concretely, we first introduce a novel IPM-PE Align module to inject 3D geometry prior into BEV features through common position encoding in Transformer. Then a well-designed Piecewise Bezier Head is proposed to output the details of each map element, including the coordinate of control points and the segment number of curves. In addition, based on the progressively restoration of Bezier curve, we also present an efficient Point-Curve-Region Loss for supervising more robust and precise HD-map modeling. Extensive comparisons show that our method is remarkably superior to other existing SOTAs by 18.0 mAP at least.
Semi-Offline Reinforcement Learning for Optimized Text Generation
Authors: Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.09712
Pdf link: https://arxiv.org/pdf/2306.09712
Abstract In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, and offline methods efficiently obtain reward signals by sacrificing exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transits from offline to online settings, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods.
Efficient Coflow Scheduling in Hybrid-Switched Data Center Networks
Authors: Xin Wang, Hong Shen, Hui Tian
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.09713
Pdf link: https://arxiv.org/pdf/2306.09713
Abstract To improve the application-level communication performance, scheduling of coflows, a collection of parallel flows sharing the same objective, is prevalent in modern data center networks (DCNs). Meanwhile, a hybrid-switched DCN design combining optical circuit switches (OPS) and electrical packet switches (EPS) for transmitting high-volume traffic and low-volume traffic separately has received considerable research attention recently. Efficient scheduling of coflows on hybrid network links is crucial for reducing the overall communication time. However, because of the reconfiguration delay in the circuit switch due to the ultra-high transmission rate and the limitation of bandwidth in the packet switch, coflow scheduling becomes increasingly challenging. The existing coflow scheduling algorithms in hybrid-switched DCNs are all heuristic and provide no performance guarantees. In this work, we propose an approximation algorithm with the worst-case performance guarantee of 2+ \lambda?, where \lambda? is a factor related to system parameters and demand characteristics, for single coflow scheduling in hybridswitched DCN to minimize the coflow completion time (CCT). Extensive simulations based on Facebook data traces show that our algorithm outperforms the state-of-the-art schemes Solstice by 1.14? and Reco-Sin by 1.42? in terms of minimizing CCT.
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Authors: Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09729
Pdf link: https://arxiv.org/pdf/2306.09729
Abstract Pre-training & fine-tuning is a prevalent paradigm in computer vision (CV). Recently, parameter-efficient transfer learning (PETL) methods have shown promising performance in transferring knowledge from pre-trained models with only a few trainable parameters. Despite their success, the existing PETL methods in CV can be computationally expensive and require large amounts of memory and time cost during training, which limits low-resource users from conducting research and applications on large models. In this work, we propose Parameter, Memory, and Time Efficient Visual Adapter ($\mathrm{E^3VA}$) tuning to address this issue. We provide a gradient backpropagation highway for low-rank adapters which removes large gradient computations for the frozen pre-trained parameters, resulting in substantial savings of training memory and training time. Furthermore, we optimise the $\mathrm{E^3VA}$ structure for dense predictions tasks to promote model performance. Extensive experiments on COCO, ADE20K, and Pascal VOC benchmarks show that $\mathrm{E^3VA}$ can save up to 62.2% training memory and 26.2% training time on average, while achieving comparable performance to full fine-tuning and better performance than most PETL methods. Note that we can even train the Swin-Large-based Cascade Mask RCNN on GTX 1080Ti GPUs with less than 1.5% trainable parameters.
CroCoDai: A Stablecoin for Cross-Chain Commerce
Authors: Daniël Reijsbergen, Bretislav Hajek, Tien Tuan Anh Dinh, Jussi Keppo, Hank Korth, Anwitaman Datta
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.09754
Pdf link: https://arxiv.org/pdf/2306.09754
Abstract Decentralized Finance (DeFi), in which digital assets are exchanged without trusted intermediaries, has grown rapidly in value in recent years. The global DeFi ecosystem is fragmented into multiple blockchains, fueling the demand for cross-chain commerce. Existing approaches for cross-chain transactions, e.g., bridges and cross-chain deals, achieve atomicity by locking assets in escrow. However, locking up assets increases the financial risks for the participants, especially due to price fluctuations and the long latency of cross-chain transactions. Stablecoins, which are pegged to a non-volatile asset such as the US dollar, help mitigate the risk associated with price fluctuations. However, existing stablecoin designs are tied to individual blockchain platforms, and trusted parties or complex protocols are needed to exchange stablecoin tokens between blockchains. Our goal is to design a practical stablecoin for cross-chain commerce. Realizing this goal requires addressing two challenges. The first challenge is to support a large and growing number of blockchains efficiently. The second challenge is to be resilient to price fluctuations and blockchain platform failures. We present CroCoDai to address these challenges. We also present three prototype implementations of our stablecoin system, and show that it incurs small execution overhead.
Gradient is All You Need?
Authors: Konstantin Riedl, Timo Klock, Carina Geldhauser, Massimo Fornasier
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.09778
Pdf link: https://arxiv.org/pdf/2306.09778
Abstract In this paper we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that through communication of the particles, CBO exhibits a stochastic gradient descent (SGD)-like behavior despite solely relying on evaluations of the objective function. The fundamental value of such link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for ample classes of nonsmooth and nonconvex objective functions, hence, on the one side, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other side, contrary to the conventional wisdom for which zero-order methods ought to be inefficient or not to possess generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that allows to alleviate complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) and the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Authors: Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.09782
Pdf link: https://arxiv.org/pdf/2306.09782
Abstract Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLMs training would encourage greater participation from researchers, benefiting both academia and society. While existing approaches have focused on parameter-efficient fine-tuning, which tunes or adds a small number of parameters, few have addressed the challenge of tuning the full parameters of LLMs with limited resources. In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage. By integrating LOMO with existing memory saving techniques, we reduce memory usage to 10.8% compared to the standard approach (DeepSpeed solution). Consequently, our approach enables the full parameter fine-tuning of a 65B model on a single machine with 8 RTX 3090, each with 24GB memory.
MementoHash: A Stateful, Minimal Memory, Best Performing Consistent Hash Algorithm
Authors: Massimo Coluzzi, Amos Brocco, Alessandro Antonucci, Tiziano Leidi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.09783
Pdf link: https://arxiv.org/pdf/2306.09783
Abstract Consistent hashing is used in distributed systems and networking applications to spread data evenly and efficiently across a cluster of nodes. In this paper, we present MementoHash, a novel consistent hashing algorithm that eliminates known limitations of state-of-the-art algorithms while keeping optimal performance and minimal memory usage. We describe the algorithm in detail, provide a pseudo-code implementation, and formally establish its solid theoretical guarantees. To measure the efficacy of MementoHash, we compare its performance, in terms of memory usage and lookup time, to that of state-of-the-art algorithms, namely, AnchorHash, DxHash, and JumpHash. Unlike JumpHash, MementoHash can handle random failures. Moreover, MementoHash does not require fixing the overall capacity of the cluster (as AnchorHash and DxHash do), allowing it to scale indefinitely. The number of removed nodes affects the performance of all the considered algorithms. Therefore, we conduct experiments considering three different scenarios: stable (no removed nodes), one-shot removals (90% of the nodes removed at once), and incremental removals. We report experimental results that averaged a varying number of nodes from ten to one million. Results indicate that our algorithm shows optimal lookup performance and minimal memory usage in its best-case scenario. It behaves better than AnchorHash and DxHash in its average-case scenario and at least as well as those two algorithms in its worst-case scenario. However, the worst-case scenario for MementoHash occurs when more than 70% of the nodes fail, which describes a unlikely scenario. Therefore, MementoHash shows the best performance during the regular life cycle of a cluster.
Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes
Authors: Francesco Daghero, Alessio Burrello, Enrico Macii, Paolo Montuschi, Massimo Poncino, Daniele Jahier Pagliari
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09789
Pdf link: https://arxiv.org/pdf/2306.09789
Abstract With the increasing popularity of Internet of Things (IoT) devices, there is a growing need for energy-efficient Machine Learning (ML) models that can run on constrained edge nodes. Decision tree ensembles, such as Random Forests (RFs) and Gradient Boosting (GBTs), are particularly suited for this task, given their relatively low complexity compared to other alternatives. However, their inference time and energy costs are still significant for edge hardware. Given that said costs grow linearly with the ensemble size, this paper proposes the use of dynamic ensembles, that adjust the number of executed trees based both on a latency/energy target and on the complexity of the processed input, to trade-off computational cost and accuracy. We focus on deploying these algorithms on multi-core low-power IoT devices, designing a tool that automatically converts a Python ensemble into optimized C code, and exploring several optimizations that account for the available parallelism and memory hierarchy. We extensively benchmark both static and dynamic RFs and GBTs on three state-of-the-art IoT-relevant datasets, using an 8-core ultra-lowpower System-on-Chip (SoC), GAP8, as the target platform. Thanks to the proposed early-stopping mechanisms, we achieve an energy reduction of up to 37.9% with respect to static GBTs (8.82 uJ vs 14.20 uJ per inference) and 41.7% with respect to static RFs (2.86 uJ vs 4.90 uJ per inference), without losing accuracy compared to the static model.
$\pi2\text{vec}$: Policy Representations with Successor Features
Authors: Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09800
Pdf link: https://arxiv.org/pdf/2306.09800
Abstract This paper describes $\pi2\text{vec}$, a method for representing behaviors of black box policies as feature vectors. The policy representations capture how the statistics of foundation model features change in response to the policy behavior in a task agnostic way, and can be trained from offline data, allowing them to be used in offline policy selection. This work provides a key piece of a recipe for fusing together three modern lines of research: Offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource constrained environments.
Efficient Search and Detection of Relevant Plant Parts using Semantics-Aware Active Vision
Authors: Akshay K. Burusa, Joost Scholten, David Rapado Rincon, Xin Wang, Eldert J. van Henten, Gert Kootstra
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09801
Pdf link: https://arxiv.org/pdf/2306.09801
Abstract To automate harvesting and de-leafing of tomato plants using robots, it is important to search and detect the relevant plant parts, namely tomatoes, peduncles, and petioles. This is challenging due to high levels of occlusion in tomato greenhouses. Active vision is a promising approach which helps robots to deliberately plan camera viewpoints to overcome occlusion and improve perception accuracy. However, current active-vision algorithms cannot differentiate between relevant and irrelevant plant parts, making them inefficient for targeted perception of specific plant parts. We propose a semantic active-vision strategy that uses semantic information to identify the relevant plant parts and prioritises them during view planning using an attention mechanism. We evaluated our strategy using 3D models of tomato plants with varying structural complexity, which closely represented occlusions in the real world. We used a simulated environment to gain insights into our strategy, while ensuring repeatability and statistical significance. At the end of ten viewpoints, our strategy was able to correctly detect 85.5% of the plant parts, about 4 parts more on average per plant compared to a volumetric active-vision strategy. Also, it detected 5 and 9 parts more compared to two predefined strategies and 11 parts more compared to a random strategy. It also performed reliably with a median of 88.9% correctly-detected objects per plant in 96 experiments. Our strategy was also robust to uncertainty in plant and plant-part position, plant complexity, and different viewpoint sampling strategies. We believe that our work could significantly improve the speed and robustness of automated harvesting and de-leafing in tomato crop production.
Sample-Efficient On-Policy Imitation Learning from Observations
Authors: João A. Cândido Ramos, Lionel Blondé, Naoya Takeishi, Alexandros Kalousis
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09805
Pdf link: https://arxiv.org/pdf/2306.09805
Abstract Imitation learning from demonstrations (ILD) aims to alleviate numerous shortcomings of reinforcement learning through the use of demonstrations. However, in most real-world applications, expert action guidance is absent, making the use of ILD impossible. Instead, we consider imitation learning from observations (ILO), where no expert actions are provided, making it a significantly more challenging problem to address. Existing methods often employ on-policy learning, which is known to be sample-costly. This paper presents SEILO, a novel sample-efficient on-policy algorithm for ILO, that combines standard adversarial imitation learning with inverse dynamics modeling. This approach enables the agent to receive feedback from both the adversarial procedure and a behavior cloning loss. We empirically demonstrate that our proposed algorithm requires fewer interactions with the environment to achieve expert performance compared to other state-of-the-art on-policy ILO and ILD methods.
High-order finite-volume integration schemes for subsonic magnetohydrodynamics
Authors: Jean-Mathieu Teissier, Wolf-Christian Müller
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn); Plasma Physics (physics.plasm-ph); Space Physics (physics.space-ph)
Arxiv link: https://arxiv.org/abs/2306.09856
Pdf link: https://arxiv.org/pdf/2306.09856
Abstract We present an efficient dimension-by-dimension finite-volume method which solves the adiabatic magnetohydrodynamics equations at high discretization order, using the constrained-transport approach on Cartesian grids. Results are presented up to tenth order of accuracy. This method requires only one reconstructed value per face for each computational cell. A passage through high-order point values leads to a modest growth of computational cost with increasing discretization order. At a given resolution, these high-order schemes present significantly less numerical dissipation than commonly employed lower-order approaches. Thus, results of comparable accuracy are achievable at a substantially coarser resolution, yielding overall performance gains. We also present a way to include physical dissipative terms: viscosity, magnetic diffusivity and cooling functions, respecting the finite-volume and constrained-transport frameworks.
Direct parametrisation of invariant manifolds for generic non-autonomous systems including superharmonic resonances
Authors: Alessandra Vizzaccaro, Giorgio Gobat, Attilio Frangi, Cyril Touzé
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.09860
Pdf link: https://arxiv.org/pdf/2306.09860
Abstract The direct parametrisation method for invariant manifold is a model-order reduction technique that can be directly applied to finite element problems in order to derive efficient and converged reduced-order models (ROMs) for non-linear structures. In the field of nonlinear vibrations, it has already been applied to autonomous and non-autonomous problems in order to propose ROMs that can compute backbones and frequency-response curves of structures with geometric nonlinearity. While previous developments used a first-order development in order to cope with the non-autonomous term, this assumption is here relaxed by proposing a completely different treatment. The key idea is to enlarge the dimension of the dynamical system to make it autonomous and treat the added coordinates related to the forcing as already being written with normal coordinates. The parametrisation method is derived with this starting assumption and, as a key consequence, the resonance relationships appearing through the homological equations involve multiple occurrences of the forcing frequency, showing that with this new development, one is able to compute ROMs for superharmonic resonance. The method is implemented and validated on academic test cases involving beams and arches. It is numerically demonstrated that the method generates efficient ROMs for 3:1 and 2:1 superharmonic resonances, as well as converged results for problems where the first-order truncation on the non-autonomous terms used in previous developments showed a clear limitation.
An Efficient Algorithm for Power Dominating Set
Authors: Thomas Bläsius, Max Göttlicher
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2306.09870
Pdf link: https://arxiv.org/pdf/2306.09870
Abstract The problem Power Dominating Set (PDS) is motivated by the placement of phasor measurement units to monitor electrical networks. It asks for a minimum set of vertices in a graph that observes all remaining vertices by exhaustively applying two observation rules. Our contribution is twofold. First, we determine the parameterized complexity of PDS by proving it is $W[P]$-complete when parameterized with respect to the solution size. We note that it was only known to be $W[2]$-hard before. Our second and main contribution is a new algorithm for PDS that efficiently solves practical instances. Our algorithm consists of two complementary parts. The first is a set of reduction rules for PDS that can also be used in conjunction with previously existing algorithms. The second is an algorithm for solving the remaining kernel based on the implicit hitting set approach. Our evaluation on a set of power grid instances from the literature shows that our solver outperforms previous state-of-the-art solvers for PDS by more than one order of magnitude on average. Furthermore, our algorithm can solve previously unsolved instances of continental scale within a few minutes.
Squeezing nnU-Nets with Knowledge Distillation for On-Board Cloud Detection
Authors: Bartosz Grabowski, Maciej Ziaja, Michal Kawulok, Piotr Bosowski, Nicolas Longépé, Bertrand Le Saux, Jakub Nalepa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09886
Pdf link: https://arxiv.org/pdf/2306.09886
Abstract Cloud detection is a pivotal satellite image pre-processing step that can be performed both on the ground and on board a satellite to tag useful images. In the latter case, it can reduce the amount of data to downlink by pruning the cloudy areas, or to make a satellite more autonomous through data-driven acquisition re-scheduling. We approach this task with nnU-Nets, a self-reconfigurable framework able to perform meta-learning of a segmentation network over various datasets. Unfortunately, such models are commonly memory-inefficient due to their (very) large architectures. To benefit from them in on-board processing, we compress nnU-Nets with knowledge distillation into much smaller and compact U-Nets. Our experiments, performed over Sentinel-2 and Landsat-8 images revealed that nnU-Nets deliver state-of-the-art performance without any manual design. Our approach was ranked within the top 7% best solutions (across 847 teams) in the On Cloud N: Cloud Cover Detection Challenge, where we reached the Jaccard index of 0.882 over more than 10k unseen Sentinel-2 images (the winners obtained 0.897, the baseline U-Net with the ResNet-34 backbone: 0.817, and the classic Sentinel-2 image thresholding: 0.652). Finally, we showed that knowledge distillation enables to elaborate dramatically smaller (almost 280x) U-Nets when compared to nnU-Nets while still maintaining their segmentation capabilities.
LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning
Authors: Jifan Zhang, Yifang Chen, Gregory Canal, Stephen Mussmann, Yinglun Zhu, Simon Shaolei Du, Kevin Jamieson, Robert D Nowak
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09910
Pdf link: https://arxiv.org/pdf/2306.09910
Abstract Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.
Model-based versus model-free feeding control and water quality monitoring for fish growth tracking in aquaculture systems
Authors: Fahad Aljehani, Ibrahima N'Doye, Taous-Meriem Laleg-Kirati
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.09915
Pdf link: https://arxiv.org/pdf/2306.09915
Abstract The high concentration level of the environmental factors, such as a high ammonia concentration and pH level, affect the water quality, affecting fish's survival and mass death. Therefore, there is a critical need to develop control strategies to determine optimal, efficient, and reliable feeding and water quality monitoring processes. In this paper, we revisit the representative fish growth model describing the total biomass change by incorporating the fish population density and mortality. Since the measurement data of the total biomass and population from the aquaculture systems are limited and difficult to obtain, we validate the new dynamic population model with the individual fish growth data for tracking control purposes. We specifically focus on relative feeding as a manipulated variable to design traditional and optimal control to track the desired weight reference within the sub-optimal temperature and dissolved oxygen profiles under different levels of unionized ammonia exposure. Then, we propose a Q-learning approach that learns an optimal feeding control policy from the simulated data of the fish growth weight trajectories while managing the ammonia effects. The proposed Q-learning feeding control prevents fish mortality and achieves good tracking errors of the fish weight under the different levels of unionized ammonia. However, it maintains a relative food consumption that potentially underfeeds the fish. Finally, we propose an optimal algorithm that optimizes the feeding and water quality of the dynamic fish population growth process. We also show that the model predictive control decreases fish mortality and reduces food consumption in all different cases of unionized ammonia exposure.
Feeding control and water quality monitoring in aquaculture systems: Opportunities and challenges
Authors: Fahad Aljehani, Ibrahima N'Doye, Taous-Meriem Laleg-Kirati
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09920
Pdf link: https://arxiv.org/pdf/2306.09920
Abstract Aquaculture systems can benefit from the recent development of advanced control strategies to reduce operating costs and fish loss and increase growth production efficiency, resulting in fish welfare and health. Monitoring the water quality and controlling feeding are fundamental elements of balancing fish productivity and shaping the fish growth process. Currently, most fish-feeding processes are conducted manually in different phases and rely on time-consuming and challenging artificial discrimination. The feeding control approach influences fish growth and breeding through the feed conversion rate; hence, controlling these feeding parameters is crucial for enhancing fish welfare and minimizing general fishery costs. The high concentration of environmental factors, such as a high ammonia concentration and pH, affect the water quality and fish survival. Therefore, there is a critical need to develop control strategies to determine optimal, efficient, and reliable feeding processes and monitor water quality. This paper reviews the main control design techniques for fish growth in aquaculture systems, namely algorithms that optimize the feeding and water quality of a dynamic fish growth process. Specifically, we review model-based control approaches and model-free reinforcement learning strategies to optimize the growth and survival of the fish or track a desired reference live-weight growth trajectory. The model-free framework uses an approximate fish growth dynamic model and does not satisfy constraints. We discuss how model-based approaches can support a reinforcement learning framework to efficiently handle constraint satisfaction and find better trajectories and policies from value-based reinforcement learning.
A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development
Authors: Seyed Jalaleddin Mousavirad, Luís A. Alexandre
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09931
Pdf link: https://arxiv.org/pdf/2306.09931
Abstract Energy consumption plays a vital role in mobile App development for developers and end-users, and it is considered one of the most crucial factors for purchasing a smartphone. In addition, in terms of sustainability, it is essential to find methods to reduce the energy consumption of mobile devices since the extensive use of billions of smartphones worldwide significantly impacts the environment. Despite the existence of several energy-efficient programming practices in Android, the leading mobile ecosystem, machine learning-based energy prediction algorithms for mobile App development have yet to be reported. Therefore, this paper proposes a histogram-based gradient boosting classification machine (HGBC), boosted by a metaheuristic approach, for energy prediction in mobile App development. Our metaheuristic approach is responsible for two issues. First, it finds redundant and irrelevant features without any noticeable change in performance. Second, it performs a hyper-parameter tuning for the HGBC algorithm. Since our proposed metaheuristic approach is algorithm-independent, we selected 12 algorithms for the search strategy to find the optimal search algorithm. Our finding shows that a success-history-based parameter adaption for differential evolution with linear population size (L-SHADE) offers the best performance. It can improve performance and decrease the number of features effectively. Our extensive set of experiments clearly shows that our proposed approach can provide significant results for energy consumption prediction.
Drag-guided diffusion models for vehicle image generation
Authors: Nikos Arechiga, Frank Permenter, Binyang Song, Chenyang Yuan
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.09935
Pdf link: https://arxiv.org/pdf/2306.09935
Abstract Denoising diffusion models trained at web-scale have revolutionized image generation. The application of these tools to engineering design is an intriguing possibility, but is currently limited by their inability to parse and enforce concrete engineering constraints. In this paper, we take a step towards this goal by proposing physics-based guidance, which enables optimization of a performance metric (as predicted by a surrogate model) during the generation process. As a proof-of-concept, we add drag guidance to Stable Diffusion, which allows this tool to generate images of novel vehicles while simultaneously minimizing their predicted drag coefficients.
Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs
Authors: Steinar Laenen, Bogdan-Adrian Manghiuc, He Sun
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09950
Pdf link: https://arxiv.org/pdf/2306.09950
Abstract This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta's cost function. For any input graph $G$ with a clear cluster-structure, our designed algorithms run in nearly-linear time in the input size of $G$, and return an $O(1)$-approximate HC tree with respect to Dasgupta's cost function. We compare the performance of our algorithm against the previous state-of-the-art on synthetic and real-world datasets and show that our designed algorithm produces comparable or better HC trees with much lower running time.
Group Orthogonalization Regularization For Vision Models Adaptation and Robustness
Authors: Yoav Kurtz, Noga Bar, Raja Giryes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.10001
Pdf link: https://arxiv.org/pdf/2306.10001
Abstract As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks. We further show improved robustness when group orthogonality is enforced during adversarial training. Our code is available at https://github.com/YoavKurtz/GOR.
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness
Authors: Eric Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.10015
Pdf link: https://arxiv.org/pdf/2306.10015
Abstract Language model training in distributed settings is limited by the communication cost of gradient exchanges. In this short note, we extend recent work from Malladi et al. (2023), using shared randomness to perform distributed fine-tuning with low bandwidth. The method is a natural decentralized extension of memory-efficient Simultaneous Perturbation Stochastic Approximation (SPSA). Each iteration, each machine seeds a Random Number Generator (RNG) to perform local reproducible perturbations on model weights and calculate and exchange scalar projected gradients, which are then used to update each model. By using a (machine, sample) identifier as the random seed, each model can regenerate one another's perturbations. As machines only exchange single-byte projected gradients, this is highly communication efficient. There are also potential privacy benefits, as projected gradients may be calculated on different training data, and models never access the other's data. Our approach not only drastically reduces communication bandwidth requirements but also accommodates dynamic addition or removal of machines during the training process and retains the memory-efficient and inference-only advantages of recent work. We perform proof-of-concept experiments to demonstrate the potential usefulness of this method, building off of rich literature on distributed optimization and memory-efficient training.
Keyword: faster

A flexible algorithm to offload DAG applications for edge computing
Authors: Gabriel F. C. de Queiroz, José F. de Rezende, Valmir C. Barbosa
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.09458
Pdf link: https://arxiv.org/pdf/2306.09458
Abstract Multi-access Edge Computing (MEC) is an enabling technology to leverage new network applications, such as virtual/augmented reality, by providing faster task processing at the network edge. This is done by deploying servers closer to the end users to run the network applications. These applications are often intensive in terms of task processing, memory usage, and communication; thus mobile devices may take a long time or even not be able to run them efficiently. By transferring (offloading) the execution of these applications to the servers at the network edge, it is possible to achieve a lower completion time (makespan) and meet application requirements. However, offloading multiple entire applications to the edge server can overwhelm its hardware and communication channel, as well as underutilize the mobile devices' hardware. In this paper, network applications are modeled as Directed Acyclic Graphs (DAGs) and partitioned into tasks, and only part of these tasks are offloaded to the edge server. This is the DAG application partitioning and offloading problem, which is known to be NP-hard. To approximate its solution, this paper proposes the FlexDO algorithm. FlexDO combines a greedy phase with a permutation phase to find a set of offloading decisions, and then chooses the one that achieves the shortest makespan. FlexDO is compared with a proposal from the literature and two baseline decisions, considering realistic DAG applications extracted from the Alibaba Cluster Trace Program. Results show that FlexDO is consistently only 3.9% to 8.9% above the optimal makespan in all test scenarios, which include different levels of CPU availability, a multi-user case, and different communication channel transmission rates. FlexDO outperforms both baseline solutions by a wide margin, and is three times closer to the optimal makespan than its competitor.
Simplified Temporal Consistency Reinforcement Learning
Authors: Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09466
Pdf link: https://arxiv.org/pdf/2306.09466
Abstract Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and, self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but, also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks, such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
Structural Restricted Boltzmann Machine for image denoising and classification
Authors: Bidaurrazaga Arkaitz, Pérez Aritz, Santana Roberto
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.09628
Pdf link: https://arxiv.org/pdf/2306.09628
Abstract Restricted Boltzmann Machines are generative models that consist of a layer of hidden variables connected to another layer of visible units, and they are used to model the distribution over visible variables. In order to gain a higher representability power, many hidden units are commonly used, which, in combination with a large number of visible units, leads to a high number of trainable parameters. In this work we introduce the Structural Restricted Boltzmann Machine model, which taking advantage of the structure of the data in hand, constrains connections of hidden units to subsets of visible units in order to reduce significantly the number of trainable parameters, without compromising performance. As a possible area of application, we focus on image modelling. Based on the nature of the images, the structure of the connections is given in terms of spatial neighbourhoods over the pixels of the image that constitute the visible variables of the model. We conduct extensive experiments on various image domains. Image denoising is evaluated with corrupted images from the MNIST dataset. The generative power of our models is compared to vanilla RBMs, as well as their classification performance, which is assessed with five different image domains. Results show that our proposed model has a faster and more stable training, while also obtaining better results compared to an RBM with no constrained connections between its visible and hidden units.
Keyword: mobile

A flexible algorithm to offload DAG applications for edge computing
Authors: Gabriel F. C. de Queiroz, José F. de Rezende, Valmir C. Barbosa
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.09458
Pdf link: https://arxiv.org/pdf/2306.09458
Abstract Multi-access Edge Computing (MEC) is an enabling technology to leverage new network applications, such as virtual/augmented reality, by providing faster task processing at the network edge. This is done by deploying servers closer to the end users to run the network applications. These applications are often intensive in terms of task processing, memory usage, and communication; thus mobile devices may take a long time or even not be able to run them efficiently. By transferring (offloading) the execution of these applications to the servers at the network edge, it is possible to achieve a lower completion time (makespan) and meet application requirements. However, offloading multiple entire applications to the edge server can overwhelm its hardware and communication channel, as well as underutilize the mobile devices' hardware. In this paper, network applications are modeled as Directed Acyclic Graphs (DAGs) and partitioned into tasks, and only part of these tasks are offloaded to the edge server. This is the DAG application partitioning and offloading problem, which is known to be NP-hard. To approximate its solution, this paper proposes the FlexDO algorithm. FlexDO combines a greedy phase with a permutation phase to find a set of offloading decisions, and then chooses the one that achieves the shortest makespan. FlexDO is compared with a proposal from the literature and two baseline decisions, considering realistic DAG applications extracted from the Alibaba Cluster Trace Program. Results show that FlexDO is consistently only 3.9% to 8.9% above the optimal makespan in all test scenarios, which include different levels of CPU availability, a multi-user case, and different communication channel transmission rates. FlexDO outperforms both baseline solutions by a wide margin, and is three times closer to the optimal makespan than its competitor.
Privacy Guarantees for Personal Mobility Data in Humanitarian Response
Authors: Nitin Kohli, Emily Aiken, Joshua Blumenstock
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.09471
Pdf link: https://arxiv.org/pdf/2306.09471
Abstract Personal mobility data from mobile phones and other sensors are increasingly used to inform policymaking during pandemics, natural disasters, and other humanitarian crises. However, even aggregated mobility traces can reveal private information about individual movements to potentially malicious actors. This paper develops and tests an approach for releasing private mobility data, which provides formal guarantees over the privacy of the underlying subjects. Specifically, we (1) introduce an algorithm for constructing differentially private mobility matrices, and derive privacy and accuracy bounds on this algorithm; (2) use real-world data from mobile phone operators in Afghanistan and Rwanda to show how this algorithm can enable the use of private mobility data in two high-stakes policy decisions: pandemic response and the distribution of humanitarian aid; and (3) discuss practical decisions that need to be made when implementing this approach, such as how to optimally balance privacy and accuracy. Taken together, these results can help enable the responsible use of private mobility data in humanitarian response.
Opportunistic Transmission of Distributed Learning Models in Mobile UAVs
Authors: Jingxin Li, Xiaolan Liu, Toktam Mahmoodi
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.09484
Pdf link: https://arxiv.org/pdf/2306.09484
Abstract In this paper, we propose an opportunistic scheme for the transmission of model updates from Federated Learning (FL) clients to the server, where clients are wireless mobile users. This proposal aims to opportunistically take advantage of the proximity of users to the base station or the general condition of the wireless transmission channel, rather than traditional synchronous transmission. In this scheme, during the training, intermediate model parameters are uploaded to the server, opportunistically and based on the wireless channel condition. Then, the proactively-transmitted model updates are used for the global aggregation if the final local model updates are delayed. We apply this novel model transmission scheme to one of our previous work, which is a hybrid split and federated learning (HSFL) framework for UAVs. Simulation results confirm the superiority of using proactive transmission over the conventional asynchronous aggregation scheme for the staled model by obtaining higher accuracy and more stable training performance. Test accuracy increases by up to 13.47% with just one round of extra transmission.
Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation
Authors: Harel Biggie, Ajay Narasimha Mopidevi, Dusty Woods, Christoffer Heckman
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09523
Pdf link: https://arxiv.org/pdf/2306.09523
Abstract Humans have the remarkable ability to navigate through unfamiliar environments by solely relying on our prior knowledge and descriptions of the environment. For robots to perform the same type of navigation, they need to be able to associate natural language descriptions with their associated physical environment with a limited amount of prior knowledge. Recently, Large Language Models (LLMs) have been able to reason over billions of parameters and utilize them in multi-modal chat-based natural language responses. However, LLMs lack real-world awareness and their outputs are not always predictable. In this work, we develop NavCom, a low-bandwidth framework that solves this lack of real-world generalization by creating an intermediate layer between an LLM and a robot navigation framework in the form of Python code. Our intermediate shoehorns the vast prior knowledge inherent in an LLM model into a series of input and output API instructions that a mobile robot can understand. We evaluate our method across four different environments and command classes on a mobile robot and highlight our NavCom's ability to interpret contextual commands.
PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition
Authors: Jia Le Ngwe, Kian Ming Lim, Chin Poo Lee, Thian Song Ong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09626
Pdf link: https://arxiv.org/pdf/2306.09626
Abstract Facial Expression Recognition (FER) is a machine learning problem that deals with recognizing human facial expressions. While existing work has achieved performance improvements in recent years, FER in the wild and under challenging conditions remains a challenge. In this paper, a lightweight patch and attention network based on MobileNetV1, referred to as PAtt-Lite, is proposed to improve FER performance under challenging conditions. A truncated ImageNet-pre-trained MobileNetV1 is utilized as the backbone feature extractor of the proposed method. In place of the truncated layers is a patch extraction block that is proposed for extracting significant local facial features to enhance the representation from MobileNetV1, especially under challenging conditions. An attention classifier is also proposed to improve the learning of these patched feature maps from the extremely lightweight feature extractor. The experimental results on public benchmark databases proved the effectiveness of the proposed method. PAtt-Lite achieved state-of-the-art results on CK+, RAF-DB, FER2013, FERPlus, and the challenging conditions subsets for RAF-DB and FERPlus. The source code for the proposed method will be available at https://github.com/JLREx/PAtt-Lite.
DeepMPR: Enhancing Opportunistic Routing in Wireless Networks through Multi-Agent Deep Reinforcement Learning
Authors: Saeed Kaviani, Bo Ryu, Ejaz Ahmed, Deokseong Kim, Jae Kim, Carrie Spiker, Blake Harnden
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.09637
Pdf link: https://arxiv.org/pdf/2306.09637
Abstract Opportunistic routing relies on the broadcast capability of wireless networks. It brings higher reliability and robustness in highly dynamic and/or severe environments such as mobile or vehicular ad-hoc networks (MANETs/VANETs). To reduce the cost of broadcast, multicast routing schemes use the connected dominating set (CDS) or multi-point relaying (MPR) set to decrease the network overhead and hence, their selection algorithms are critical. Common MPR selection algorithms are heuristic, rely on coordination between nodes, need high computational power for large networks, and are difficult to tune for network uncertainties. In this paper, we use multi-agent deep reinforcement learning to design a novel MPR multicast routing technique, DeepMPR, which is outperforming the OLSR MPR selection algorithm while it does not require MPR announcement messages from the neighbors. Our evaluation results demonstrate the performance gains of our trained DeepMPR multicast forwarding policy compared to other popular techniques.
ReactGenie: An Object-Oriented State Abstraction for Complex Multimodal Interactions Using Large Language Models
Authors: Jackie (Junrui)Yang, Karina Li, Daniel Wan Rosli, Shuning Zhang, Yuhan Zhang, Monica S. Lam, James A. Landay
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.09649
Pdf link: https://arxiv.org/pdf/2306.09649
Abstract Multimodal interactions have been shown to be more flexible, efficient, and adaptable for diverse users and tasks than traditional graphical interfaces. However, existing multimodal development frameworks either do not handle the complexity and compositionality of multimodal commands well or require developers to write a substantial amount of code to support these multimodal interactions. In this paper, we present ReactGenie, a programming framework that uses a shared object-oriented state abstraction to support building complex multimodal mobile applications. Having different modalities share the same state abstraction allows developers using ReactGenie to seamlessly integrate and compose these modalities to deliver multimodal interaction. ReactGenie is a natural extension to the existing workflow of building a graphical app, like the workflow with React-Redux. Developers only have to add a few annotations and examples to indicate how natural language is mapped to the user-accessible functions in the program. ReactGenie automatically handles the complex problem of understanding natural language by generating a parser that leverages large language models. We evaluated the ReactGenie framework by using it to build three demo apps. We evaluated the accuracy of the language parser using elicited commands from crowd workers and evaluated the usability of the generated multimodal app with 16 participants. Our results show that ReactGenie can be used to build versatile multimodal applications with highly accurate language parsers, and the multimodal app can lower users' cognitive load and task completion time.
Lost and not Found: An Investigation of Recovery Methods for Multi-Factor Authentication
Authors: Sabrina Amft, Sandra Höltervennhoff, Nicolas Huaman, Alexander Krause, Lucy Simko, Yasemin Acar, Sascha Fahl
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.09708
Pdf link: https://arxiv.org/pdf/2306.09708
Abstract Multi-Factor Authentication is intended to strengthen the security of password-based authentication by adding another factor, such as hardware tokens or one-time passwords using mobile apps. However, this increased authentication security comes with potential drawbacks that can lead to account and asset loss. If users lose access to their additional authentication factors for any reason, they will be locked out of their accounts. Consequently, services that provide Multi-Factor Authentication should deploy procedures to allow their users to recover from losing access to their additional factor that are both secure and easy-to-use. To the best of our knowledge, we are the first to first-hand investigate the security and user experience of deployed Multi-Factor Authentication recovery procedures. We first evaluate the official help and support pages of 1,303 websites that provide Multi-Factor Authentication and collect documented information about their recovery procedures. Second, we select a subset of 71 websites, create accounts, set up Multi-Factor Authentication, and perform an in-depth investigation of their recovery procedure security and user experience. We find that many websites deploy insecure Multi-Factor Authentication recovery procedures and allowed us to circumvent and disable Multi-Factor Authentication when having access to the accounts' associated email addresses. Furthermore, we commonly observed discrepancies between our in-depth analysis and the official help and support pages, implying that information meant to aid users is often either incorrect or outdated.
CANDID: Correspondence AligNment for Deep-burst Image Denoising
Authors: Arijit Mallick, Raphael Braun, Hendrik PA Lensch
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09887
Pdf link: https://arxiv.org/pdf/2306.09887
Abstract With the advent of mobile phone photography and point-and-shoot cameras, deep-burst imaging is widely used for a number of photographic effects such as depth of field, super-resolution, motion deblurring, and image denoising. In this work, we propose to solve the problem of deep-burst image denoising by including an optical flow-based correspondence estimation module which aligns all the input burst images with respect to a reference frame. In order to deal with varying noise levels the individual burst images are pre-filtered with different settings. Exploiting the established correspondences one network block predicts a pixel-wise spatially-varying filter kernel to smooth each image in the original and prefiltered bursts before fusing all images to generate the final denoised output. The resulting pipeline achieves state-of-the-art results by combining all available information provided by the burst.
A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development
Authors: Seyed Jalaleddin Mousavirad, Luís A. Alexandre
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09931
Pdf link: https://arxiv.org/pdf/2306.09931
Abstract Energy consumption plays a vital role in mobile App development for developers and end-users, and it is considered one of the most crucial factors for purchasing a smartphone. In addition, in terms of sustainability, it is essential to find methods to reduce the energy consumption of mobile devices since the extensive use of billions of smartphones worldwide significantly impacts the environment. Despite the existence of several energy-efficient programming practices in Android, the leading mobile ecosystem, machine learning-based energy prediction algorithms for mobile App development have yet to be reported. Therefore, this paper proposes a histogram-based gradient boosting classification machine (HGBC), boosted by a metaheuristic approach, for energy prediction in mobile App development. Our metaheuristic approach is responsible for two issues. First, it finds redundant and irrelevant features without any noticeable change in performance. Second, it performs a hyper-parameter tuning for the HGBC algorithm. Since our proposed metaheuristic approach is algorithm-independent, we selected 12 algorithms for the search strategy to find the optimal search algorithm. Our finding shows that a success-history-based parameter adaption for differential evolution with linear population size (L-SHADE) offers the best performance. It can improve performance and decrease the number of features effectively. Our extensive set of experiments clearly shows that our proposed approach can provide significant results for energy consumption prediction.
Keyword: pruning

Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network
Authors: Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2306.09552
Pdf link: https://arxiv.org/pdf/2306.09552
Abstract EIE proposed to accelerate pruned and compressed neural networks, exploiting weight sparsity, activation sparsity, and 4-bit weight-sharing in neural network accelerators. Since published in ISCA'16, it opened a new design space to accelerate pruned and sparse neural networks and spawned many algorithm-hardware co-designs for model compression and acceleration, both in academia and commercial AI chips. In retrospect, we review the background of this project, summarize the pros and cons, and discuss new opportunities where pruning, sparsity, and low precision can accelerate emerging deep learning workloads.
Representation and decomposition of functions in DAG-DNNs and structural network pruning
Authors: Wen-Liang Hwang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09707
Pdf link: https://arxiv.org/pdf/2306.09707
Abstract The conclusions provided by deep neural networks (DNNs) must be carefully scrutinized to determine whether they are universal or architecture dependent. The term DAG-DNN refers to a graphical representation of a DNN in which the architecture is expressed as a direct-acyclic graph (DAG), on which arcs are associated with functions. The level of a node denotes the maximum number of hops between the input node and the node of interest. In the current study, we demonstrate that DAG-DNNs can be used to derive all functions defined on various sub-architectures of the DNN. We also demonstrate that the functions defined in a DAG-DNN can be derived via a sequence of lower-triangular matrices, each of which provides the transition of functions defined in sub-graphs up to nodes at a specified level. The lifting structure associated with lower-triangular matrices makes it possible to perform the structural pruning of a network in a systematic manner. The fact that decomposition is universally applicable to all DNNs means that network pruning could theoretically be applied to any DNN, regardless of the underlying architecture. We demonstrate that it is possible to obtain the winning ticket (sub-network and initialization) for a weak version of the lottery ticket hypothesis, based on the fact that the sub-network with initialization can achieve training performance on par with that of the original network using the same number of iterations or fewer.
Transferability of Winning Lottery Tickets in Neural Network Differential Equation Solvers
Authors: Edward Prideaux-Ghee
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09863
Pdf link: https://arxiv.org/pdf/2306.09863
Abstract Recent work has shown that renormalisation group theory is a useful framework with which to describe the process of pruning neural networks via iterative magnitude pruning. This report formally describes the link between RG theory and IMP and extends previous results around the Lottery Ticket Hypothesis and Elastic Lottery Hypothesis to Hamiltonian Neural Networks for solving differential equations. We find lottery tickets for two Hamiltonian Neural Networks and demonstrate transferability between the two systems, with accuracy being dependent on integration times. The universality of the two systems is then analysed using tools from an RG perspective.
Squeezing nnU-Nets with Knowledge Distillation for On-Board Cloud Detection
Authors: Bartosz Grabowski, Maciej Ziaja, Michal Kawulok, Piotr Bosowski, Nicolas Longépé, Bertrand Le Saux, Jakub Nalepa
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09886
Pdf link: https://arxiv.org/pdf/2306.09886
Abstract Cloud detection is a pivotal satellite image pre-processing step that can be performed both on the ground and on board a satellite to tag useful images. In the latter case, it can reduce the amount of data to downlink by pruning the cloudy areas, or to make a satellite more autonomous through data-driven acquisition re-scheduling. We approach this task with nnU-Nets, a self-reconfigurable framework able to perform meta-learning of a segmentation network over various datasets. Unfortunately, such models are commonly memory-inefficient due to their (very) large architectures. To benefit from them in on-board processing, we compress nnU-Nets with knowledge distillation into much smaller and compact U-Nets. Our experiments, performed over Sentinel-2 and Landsat-8 images revealed that nnU-Nets deliver state-of-the-art performance without any manual design. Our approach was ranked within the top 7% best solutions (across 847 teams) in the On Cloud N: Cloud Cover Detection Challenge, where we reached the Jaccard index of 0.882 over more than 10k unseen Sentinel-2 images (the winners obtained 0.897, the baseline U-Net with the ResNet-34 backbone: 0.817, and the classic Sentinel-2 image thresholding: 0.652). Finally, we showed that knowledge distillation enables to elaborate dramatically smaller (almost 280x) U-Nets when compared to nnU-Nets while still maintaining their segmentation capabilities.
Keyword: diffusion

Fault Detection in Induction Motors using Functional Dimensionality Reduction Methods
Authors: María Barroso, José M. Bossio, Carlos M. Alaíz, Ángela Fernández
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09365
Pdf link: https://arxiv.org/pdf/2306.09365
Abstract The implementation of strategies for fault detection and diagnosis on rotating electrical machines is crucial for the reliability and safety of modern industrial systems. The contribution of this work is a methodology that combines conventional strategy of Motor Current Signature Analysis with functional dimensionality reduction methods, namely Functional Principal Components Analysis and Functional Diffusion Maps, for detecting and classifying fault conditions in induction motors. The results obtained from the proposed scheme are very encouraging, revealing a potential use in the future not only for real-time detection of the presence of a fault in an induction motor, but also in the identification of a greater number of types of faults present through an offline analysis.
R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction
Authors: Takeru Oba, Norimichi Ukita
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09483
Pdf link: https://arxiv.org/pdf/2306.09483
Abstract Image-based motion prediction is one of the essential techniques for robot manipulation. Among the various prediction models, we focus on diffusion models because they have achieved state-of-the-art performance in various applications. In image-based motion prediction, diffusion models stochastically predict contextually appropriate motion by gradually denoising random Gaussian noise based on the image context. While diffusion models are able to predict various motions by changing the random noise, they sometimes fail to predict a contextually appropriate motion based on the image because the random noise is sampled independently of the image context. To solve this problem, we propose R2-Diff. In R2-Diff, a motion retrieved from a dataset based on image similarity is fed into a diffusion model instead of random noise. Then, the retrieved motion is refined through the denoising process of the diffusion model. Since the retrieved motion is almost appropriate to the context, it becomes easier to predict contextually appropriate motion. However, traditional diffusion models are not optimized to refine the retrieved motion. Therefore, we propose the method of tuning the hyperparameters based on the distance of the nearest neighbor motion among the dataset to optimize the diffusion model for refinement. Furthermore, we propose an image-based retrieval method to retrieve the nearest neighbor motion in inference. Our proposed retrieval efficiently computes the similarity based on the image features along the motion trajectory. We demonstrate that R2-Diff accurately predicts appropriate motions and achieves high task success rates compared to recent state-of-the-art models in robot manipulation.
Hierarchical Planning and Control for Box Loco-Manipulation
Authors: Zhaoming Xie, Jonathan Tseng, Sebastian Starke, Michiel van de Panne, C. Karen Liu
Subjects: Robotics (cs.RO); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.09532
Pdf link: https://arxiv.org/pdf/2306.09532
Abstract Humans perform everyday tasks using a combination of locomotion and manipulation skills. Building a system that can handle both skills is essential to creating virtual humans. We present a physically-simulated human capable of solving box rearrangement tasks, which requires a combination of both skills. We propose a hierarchical control architecture, where each level solves the task at a different level of abstraction, and the result is a physics-based simulated virtual human capable of rearranging boxes in a cluttered environment. The control architecture integrates a planner, diffusion models, and physics-based motion imitation of sparse motion clips using deep reinforcement learning. Boxes can vary in size, weight, shape, and placement height. Code and trained control policies are provided.
Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model
Authors: Lu Yu, Wei Xiang, Kang Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09551
Pdf link: https://arxiv.org/pdf/2306.09551
Abstract Recent research has demonstrated that the combination of pretrained diffusion models with neural radiance fields (NeRFs) has emerged as a promising approach for text-to-3D generation. Simply coupling NeRF with diffusion models will result in cross-view inconsistency and degradation of stylized view syntheses. To address this challenge, we propose the Edit-DiffNeRF framework, which is composed of a frozen diffusion model, a proposed delta module to edit the latent semantic space of the diffusion model, and a NeRF. Instead of training the entire diffusion for each scene, our method focuses on editing the latent semantic space in frozen pretrained diffusion models by the delta module. This fundamental change to the standard diffusion framework enables us to make fine-grained modifications to the rendered views and effectively consolidate these instructions in a 3D scene via NeRF training. As a result, we are able to produce an edited 3D scene that faithfully aligns to input text instructions. Furthermore, to ensure semantic consistency across different viewpoints, we propose a novel multi-view semantic consistency loss that extracts a latent semantic embedding from the input view as a prior, and aim to reconstruct it in different views. Our proposed method has been shown to effectively edit real-world 3D scenes, resulting in 25% improvement in the alignment of the performed 3D edits with text instructions compared to prior work.
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Authors: Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.09635
Pdf link: https://arxiv.org/pdf/2306.09635
Abstract Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge. We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining (CLIP) model. At test time, we first explore performing a zero-shot modality transfer and condition the diffusion model with a CLIP-encoded text query. However, we observe a noticeable performance drop with respect to image queries. To close this gap, we further adopt a pretrained diffusion prior model to generate a CLIP image embedding given a CLIP text embedding. Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap. While we focus on text-to-audio synthesis, the proposed model can also generate audio from image queries, and it shows competitive performance against a state-of-the-art image-to-audio synthesis model in a subjective listening test. This study offers a new direction of approaching text-to-audio synthesis that leverages the naturally-occurring audio-visual correspondence in videos and the power of pretrained language-vision models.
The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models
Authors: Roy Voetman, Maya Aghaei, Klaas Dijkstra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09762
Pdf link: https://arxiv.org/pdf/2306.09762
Abstract Despite the notable accomplishments of deep object detection models, a major challenge that persists is the requirement for extensive amounts of training data. The process of procuring such real-world data is a laborious undertaking, which has prompted researchers to explore new avenues of research, such as synthetic data generation techniques. This study presents a framework for the generation of synthetic datasets by fine-tuning pretrained stable diffusion models. The synthetic datasets are then manually annotated and employed for training various object detection models. These detectors are evaluated on a real-world test set of 331 images and compared against a baseline model that was trained on real-world images. The results of this study reveal that the object detection models trained on synthetic data perform similarly to the baseline model. In the context of apple detection in orchards, the average precision deviation with the baseline ranges from 0.09 to 0.12. This study illustrates the potential of synthetic data generation techniques as a viable alternative to the collection of extensive training data for the training of deep models.
Understanding Deep Generative Models with Generalized Empirical Likelihoods
Authors: Suman Ravuri, Mélanie Rey, Shakir Mohamed, Marc Deisenroth
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09780
Pdf link: https://arxiv.org/pdf/2306.09780
Abstract Understanding how well a deep generative model captures a distribution of high-dimensional data remains an important open challenge. It is especially difficult for certain model classes, such as Generative Adversarial Networks and Diffusion Models, whose models do not admit exact likelihoods. In this work, we demonstrate that generalized empirical likelihood (GEL) methods offer a family of diagnostic tools that can identify many deficiencies of deep generative models (DGMs). We show, with appropriate specification of moment conditions, that the proposed method can identify which modes have been dropped, the degree to which DGMs are mode imbalanced, and whether DGMs sufficiently capture intra-class diversity. We show how to combine techniques from Maximum Mean Discrepancy and Generalized Empirical Likelihood to create not only distribution tests that retain per-sample interpretability, but also metrics that include label information. We find that such tests predict the degree of mode dropping and mode imbalance up to 60% better than metrics such as improved precision/recall.
AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation
Authors: Yifei Zeng, Yuanxun Lu, Xinya Ji, Yao Yao, Hao Zhu, Xun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09864
Pdf link: https://arxiv.org/pdf/2306.09864
Abstract We introduce AvatarBooth, a novel method for generating high-quality 3D avatars using text prompts or specific images. Unlike previous approaches that can only synthesize avatars based on simple text descriptions, our method enables the creation of personalized avatars from casually captured face or body images, while still supporting text-based model generation and editing. Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models separately for the human face and body. This enables us to capture intricate details of facial appearance, clothing, and accessories, resulting in highly realistic avatar generations. Furthermore, we introduce pose-consistent constraint to the optimization process to enhance the multi-view consistency of synthesized head images from the diffusion model and thus eliminate interference from uncontrolled human poses. In addition, we present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation, thereby enhancing the performance of the proposed system. The resulting avatar model can be further edited using additional text descriptions and driven by motion sequences. Experiments show that AvatarBooth outperforms previous text-to-3D methods in terms of rendering and geometric quality from either text prompts or specific images. Please check our project website at https://zeng-yifei.github.io/avatarbooth_page/.
Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models
Authors: Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09869
Pdf link: https://arxiv.org/pdf/2306.09869
Abstract Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have raised the issue that generated images sometimes cannot capture the intended semantic contents of the text prompts, which phenomenon is often called semantic misalignment. To address this, here we present a novel energy-based model (EBM) framework. Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder. Then, we obtain the gradient of the log posterior of context vectors, which can be updated and transferred to the subsequent cross-attention layer, thereby implicitly minimizing a nested hierarchy of energy functions. Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts. Using extensive experiments, we demonstrate that the proposed method is highly effective in handling various image generation tasks, including multi-concept generation, text-guided image inpainting, and real and synthetic image editing.
Drag-guided diffusion models for vehicle image generation
Authors: Nikos Arechiga, Frank Permenter, Binyang Song, Chenyang Yuan
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.09935
Pdf link: https://arxiv.org/pdf/2306.09935
Abstract Denoising diffusion models trained at web-scale have revolutionized image generation. The application of these tools to engineering design is an intriguing possibility, but is currently limited by their inability to parse and enforce concrete engineering constraints. In this paper, we take a step towards this goal by proposing physics-based guidance, which enables optimization of a performance metric (as predicted by a surrogate model) during the generation process. As a proof-of-concept, we add drag guidance to Stable Diffusion, which allows this tool to generate images of novel vehicles while simultaneously minimizing their predicted drag coefficients.
Towards Better Certified Segmentation via Diffusion Models
Authors: Othmane Laousy, Alexandre Araujo, Guillaume Chassagnon, Marie-Pierre Revel, Siddharth Garg, Farshad Khorrami, Maria Vakalopoulou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.09949
Pdf link: https://arxiv.org/pdf/2306.09949
Abstract The robustness of image segmentation has been an important research topic in the past few years as segmentation models have reached production-level accuracy. However, like classification models, segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving. Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees. However, this method exhibits a trade-off between the amount of added noise and the level of certification achieved. In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models. Our experiments show that combining randomized smoothing and diffusion models significantly improves certified robustness, with results indicating a mean improvement of 21 points in accuracy compared to previous state-of-the-art methods on Pascal-Context and Cityscapes public datasets. Our method is independent of the selected segmentation model and does not need any additional specialized training procedure.
Group Orthogonalization Regularization For Vision Models Adaptation and Robustness
Authors: Yoav Kurtz, Noga Bar, Raja Giryes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.10001
Pdf link: https://arxiv.org/pdf/2306.10001
Abstract As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks. We further show improved robustness when group orthogonality is enforced during adversarial training. Our code is available at https://github.com/YoavKurtz/GOR.
Keyword: adaptive

Warpformer: A Multi-scale Modeling Approach for Irregular Clinical Time Series
Authors: Jiawen Zhang, Shun Zheng, Wei Cao, Jiang Bian, Jia Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.09368
Pdf link: https://arxiv.org/pdf/2306.09368
Abstract Irregularly sampled multivariate time series are ubiquitous in various fields, particularly in healthcare, and exhibit two key characteristics: intra-series irregularity and inter-series discrepancy. Intra-series irregularity refers to the fact that time-series signals are often recorded at irregular intervals, while inter-series discrepancy refers to the significant variability in sampling rates among diverse series. However, recent advances in irregular time series have primarily focused on addressing intra-series irregularity, overlooking the issue of inter-series discrepancy. To bridge this gap, we present Warpformer, a novel approach that fully considers these two characteristics. In a nutshell, Warpformer has several crucial designs, including a specific input representation that explicitly characterizes both intra-series irregularity and inter-series discrepancy, a warping module that adaptively unifies irregular time series in a given scale, and a customized attention module for representation learning. Additionally, we stack multiple warping and attention modules to learn at different scales, producing multi-scale representations that balance coarse-grained and fine-grained signals for downstream tasks. We conduct extensive experiments on widely used datasets and a new large-scale benchmark built from clinical databases. The results demonstrate the superiority of Warpformer over existing state-of-the-art approaches.
Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting
Authors: Yirong Chen, Ziyue Li, Wanli Ouyang, Michael Lepech
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.09386
Pdf link: https://arxiv.org/pdf/2306.09386
Abstract Accurate traffic forecasting is vital to intelligent transportation systems, which are widely adopted to solve urban traffic issues. Existing traffic forecasting studies focus on modeling spatial-temporal dynamics in traffic data, among which the graph convolution network (GCN) is at the center for exploiting the spatial dependency embedded in the road network graphs. However, these GCN-based methods operate intrinsically on the node level (e.g., road and intersection) only whereas overlooking the spatial hierarchy of the whole city. Nodes such as intersections and road segments can form clusters (e.g., regions), which could also have interactions with each other and share similarities at a higher level. In this work, we propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to promote traffic forecasting by exploiting the spatial hierarchy and modeling multi-scale spatial correlations. Apart from the node-level spatiotemporal blocks, AHSTN introduces the adaptive spatiotemporal downsampling module to infer the spatial hierarchy for spatiotemporal modeling at the cluster level. Then, an adaptive spatiotemporal upsampling module is proposed to upsample the cluster-level representations to the node-level and obtain the multi-scale representations for generating predictions. Experiments on two real-world datasets show that AHSTN achieves better performance over several strong baselines.
CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller
Authors: Yuxiang Yang, Guanya Shi, Xiangyun Meng, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09557
Pdf link: https://arxiv.org/pdf/2306.09557
Abstract We present CAJun, a novel hierarchical learning and control framework that enables legged robots to jump continuously with adaptive jumping distances. CAJun consists of a high-level centroidal policy and a low-level leg controller. In particular, we use reinforcement learning (RL) to train the centroidal policy, which specifies the gait timing, base velocity, and swing foot position for the leg controller. The leg controller optimizes motor commands for the swing and stance legs according to the gait timing to track the swing foot target and base velocity commands using optimal control. Additionally, we reformulate the stance leg optimizer in the leg controller to speed up policy training by an order of magnitude. Our system combines the versatility of learning with the robustness of optimal control. By combining RL with optimal control methods, our system achieves the versatility of learning while enjoys the robustness from control methods, making it easily transferable to real robots. We show that after 20 minutes of training on a single GPU, CAJun can achieve continuous, long jumps with adaptive distances on a Go1 robot with small sim-to-real gaps. Moreover, the robot can jump across gaps with a maximum width of 70cm, which is over 40% wider than existing methods.
Multi-Objective and Model-Predictive Tree Search for Spatiotemporal Informative Planning
Authors: Weizhe Chen, Lantao Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09608
Pdf link: https://arxiv.org/pdf/2306.09608
Abstract Adaptive sampling and planning in robotic environmental monitoring are challenging when the target environmental process varies over space and time. The underlying environmental dynamics require the planning module to integrate future environmental changes so that action decisions made earlier do not quickly become outdated. We propose a Monte Carlo tree search method which not only well balances the environment exploration and exploitation in space, but also catches up to the temporal environmental dynamics. This is achieved by incorporating multi-objective optimization and a look-ahead model-predictive rewarding mechanism. We show that by allowing the robot to leverage the simulated and predicted spatiotemporal environmental process, the proposed informative planning approach achieves a superior performance after comparing with other baseline methods in terms of the root mean square error of the environment model and the distance to the ground truth.
Enabling BIM-Driven Robotic Construction Workflows with Closed-Loop Digital Twins
Authors: Xi Wang, Hongrui Yu, Wes McGee, Carol C. Menassa, Vineet R. Kamat
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.09639
Pdf link: https://arxiv.org/pdf/2306.09639
Abstract Robots can greatly alleviate physical demands on construction workers while enhancing both the productivity and safety of construction projects. Leveraging a Building Information Model (BIM) offers a natural and promising approach to drive a robotic construction workflow. However, because of uncertainties inherent on construction sites, such as discrepancies between the designed and as-built workpieces, robots cannot solely rely on the BIM to guide field construction work. Human workers are adept at improvising alternative plans with their creativity and experience and thus can assist robots in overcoming uncertainties and performing construction work successfully. This research introduces an interactive closed-loop digital twin system that integrates a BIM into human-robot collaborative construction workflows. The robot is primarily driven by the BIM, but it adaptively adjusts its plan based on actual site conditions while the human co-worker supervises the process. If necessary, the human co-worker intervenes in the robot's plan by changing the task sequence or target position, requesting a new motion plan, or modifying the construction component(s)/material(s) to help the robot navigate uncertainties. To investigate the physical deployment of the system, a drywall installation case study is conducted with an industrial robotic arm in a laboratory. In addition, a block pick-and-place experiment is carried out to evaluate system performance. Integrating the flexibility of human workers and the autonomy and accuracy afforded by the BIM, the system significantly increases the robustness of construction robots in the performance of field construction work.
Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data
Authors: Qingyu Tan, Lu Xu, Lidong Bing, Hwee Tou Ng
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.09697
Pdf link: https://arxiv.org/pdf/2306.09697
Abstract Relation extraction (RE) aims to extract relations from sentences and documents. Existing relation extraction models typically rely on supervised machine learning. However, recent studies showed that many RE datasets are incompletely annotated. This is known as the false negative problem in which valid relations are falsely annotated as 'no_relation'. Models trained with such data inevitably make similar mistakes during the inference stage. Self-training has been proven effective in alleviating the false negative problem. However, traditional self-training is vulnerable to confirmation bias and exhibits poor performance in minority classes. To overcome this limitation, we proposed a novel class-adaptive re-sampling self-training framework. Specifically, we re-sampled the pseudo-labels for each class by precision and recall scores. Our re-sampling strategy favored the pseudo-labels of classes with high precision and low recall, which improved the overall recall without significantly compromising precision. We conducted experiments on document-level and biomedical relation extraction datasets, and the results showed that our proposed self-training framework consistently outperforms existing competitive methods on the Re-DocRED and ChemDisgene datasets when the training data are incompletely annotated. Our code is released at https://github.com/DAMO-NLP-SG/CAST.
Automatic Trade-off Adaptation in Offline RL
Authors: Phillip Swazinna, Steffen Udluft, Thomas Runkler
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.09744
Pdf link: https://arxiv.org/pdf/2306.09744
Abstract Recently, offline RL algorithms have been proposed that remain adaptive at runtime. For example, the LION algorithm \cite{lion} provides the user with an interface to set the trade-off between behavior cloning and optimality w.r.t. the estimated return at runtime. Experts can then use this interface to adapt the policy behavior according to their preferences and find a good trade-off between conservatism and performance optimization. Since expert time is precious, we extend the methodology with an autopilot that automatically finds the correct parameterization of the trade-off, yielding a new algorithm which we term AutoLION.
A note on the convergence of a class of adaptive optimal identifiers
Authors: Laurent Bako
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.09840
Pdf link: https://arxiv.org/pdf/2306.09840
Abstract This paper proposes a unifying framework for the convergence analysis of a class of adaptive optimal identifiers. The considered class of identifiers is constructed from the sequence of minimizing sets of a family of objective functions. For the purpose of the analysis we introduce a generalized version of the classical persistence of excitation condition. Based on this an Input-to-State-Stability (ISS)-type of property is derived for the studied class of adaptive estimators.
Stable nodal projection method on octree grids
Authors: Matthew Blomquist, Scott R. West, Adam L. Binswanger, Maxime Theillard
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.09957
Pdf link: https://arxiv.org/pdf/2306.09957
Abstract We propose a novel collocated projection method for solving the incompressible Navier-Stokes equations with arbitrary boundaries. Our approach employs non-graded octree grids, where all variables are stored at the nodes. To discretize the viscosity and projection steps, we utilize supra-convergent finite difference approximations with sharp boundary treatments. We demonstrate the stability of our projection on uniform grids, identify a sufficient stability condition on adaptive grids, and validate these findings numerically. We further demonstrate the accuracy and capabilities of our solver with several canonical two- and three-dimensional simulations of incompressible fluid flows. Overall, our method is second-order accurate, allows for dynamic grid adaptivity with arbitrary geometries, and reduces the overhead in code development through data collocation.
Keyword: quantization

Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference
Authors: Théo Dupuis, Yoan Fournier, MohammadHossein AskariHemmat, Nizar El Zarif, François Leduc-Primeau, Jean Pierre David, Yvon Savaria
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2306.09905
Pdf link: https://arxiv.org/pdf/2306.09905
Abstract Convolutional Neural Networks (CNNs) are used in a wide range of applications, with full-precision CNNs achieving high accuracy at the expense of portability. Recent progress in quantization techniques has demonstrated that sub-byte Quantized Neural Networks (QNNs) achieve comparable or superior accuracy while significantly reducing the computational cost and memory footprint. However, sub-byte computation on commodity hardware is sub-optimal due to the lack of support for such precision. In this paper, we introduce Sparq, a Sub-byte vector Processor designed for the AcceleRation of QNN inference. This processor is based on a modified version of Ara, an open-source 64-bit RISC-V ``V'' compliant processor. Sparq is implemented in GLOBAL FOUNDRIES 22FDX FD-SOI technology and extends the Instruction Set Architecture (ISA) by adding a new multiply-shift-accumulate instruction to improve sub-byte computation effciency. The floating-point unit is also removed to minimize area and power usage. To demonstrate Sparq performance, we implement an ultra-low-precision (1-bit to 4-bit) vectorized conv2d operation taking advantage of the dedicated hardware. We show that Sparq can significantly accelerate sub-byte computations with respectively 3.2 times, and 1.7 times acceleration over an optimized 16-bit 2D convolution for 2-bit and 4-bit quantization.

A-suozhang / GetArxivDaily

New submissions for Mon, 19 Jun 23 #82

Keyword: efficient

TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting

Equitable Multi-task Learning

1st Solution Places for CVPR 2023 UG$^2$+ Challenge Track 2.2-Coded Target Restoration through Atmospheric Turbulence

Understanding Parameter Sharing in Transformers

Sound Demixing Challenge 2023 -- Music Demixing Track Technical Report

Employing Multimodal Machine Learning for Stress Detection

A comprehensive review of 3D convolutional neural network-based classification techniques of diseased and defective crops using non-UAV-based hyperspectral images

Towards Sustainable Computing: Assessing the Carbon Footprint of Heterogeneous Systems

Prevention of cyberattacks in WSN and packet drop by CI framework and information processing protocol using AI and Big Data

A flexible algorithm to offload DAG applications for edge computing

R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction

FedMultimodal: A Benchmark For Multimodal Federated Learning

Streamlining Input/Output Logics with Sequent Calculi

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

Granger-Causal Hierarchical Skill Discovery

Block-State Transformer

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification

Learning CO$_2$ plume migration in faulted reservoirs with Graph Neural Networks

ReactGenie: An Object-Oriented State Abstraction for Complex Multimodal Interactions Using Large Language Models

A New Low-Rank Learning Robust Quaternion Tensor Completion Method for Color Video Inpainting Problem and Fast Algorithms

Online Distillation for Pseudo-Relevance Feedback

A Smooth Binary Mechanism for Efficient Private Continual Observation

Karush-Kuhn-Tucker conditions to build efficient contractors; Application to TDoA localization

Collapsed Inference for Bayesian Deep Learning

End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve

Semi-Offline Reinforcement Learning for Optimized Text Generation

Efficient Coflow Scheduling in Hybrid-Switched Data Center Networks

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

CroCoDai: A Stablecoin for Cross-Chain Commerce

Gradient is All You Need?

Full Parameter Fine-tuning for Large Language Models with Limited Resources

MementoHash: A Stateful, Minimal Memory, Best Performing Consistent Hash Algorithm

Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes

$\pi2\text{vec}$: Policy Representations with Successor Features

Efficient Search and Detection of Relevant Plant Parts using Semantics-Aware Active Vision

Sample-Efficient On-Policy Imitation Learning from Observations

High-order finite-volume integration schemes for subsonic magnetohydrodynamics

Direct parametrisation of invariant manifolds for generic non-autonomous systems including superharmonic resonances

An Efficient Algorithm for Power Dominating Set

Squeezing nnU-Nets with Knowledge Distillation for On-Board Cloud Detection

LabelBench: A Comprehensive Framework for Benchmarking Label-Efficient Learning

Model-based versus model-free feeding control and water quality monitoring for fish growth tracking in aquaculture systems

Feeding control and water quality monitoring in aquaculture systems: Opportunities and challenges

A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development

Drag-guided diffusion models for vehicle image generation

Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs

Group Orthogonalization Regularization For Vision Models Adaptation and Robustness

Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

Keyword: faster

A flexible algorithm to offload DAG applications for edge computing

Simplified Temporal Consistency Reinforcement Learning

Structural Restricted Boltzmann Machine for image denoising and classification

Keyword: mobile

A flexible algorithm to offload DAG applications for edge computing

Privacy Guarantees for Personal Mobility Data in Humanitarian Response

Opportunistic Transmission of Distributed Learning Models in Mobile UAVs

Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation

PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition

DeepMPR: Enhancing Opportunistic Routing in Wireless Networks through Multi-Agent Deep Reinforcement Learning

ReactGenie: An Object-Oriented State Abstraction for Complex Multimodal Interactions Using Large Language Models

Lost and not Found: An Investigation of Recovery Methods for Multi-Factor Authentication

CANDID: Correspondence AligNment for Deep-burst Image Denoising

A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development

Keyword: pruning

Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network

Representation and decomposition of functions in DAG-DNNs and structural network pruning

Transferability of Winning Lottery Tickets in Neural Network Differential Equation Solvers

Squeezing nnU-Nets with Knowledge Distillation for On-Board Cloud Detection

Keyword: diffusion

Fault Detection in Induction Motors using Functional Dimensionality Reduction Methods

R2-Diff: Denoising by diffusion as a refinement of retrieved motion for image-based motion prediction

Hierarchical Planning and Control for Box Loco-Manipulation

Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model

CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models

Understanding Deep Generative Models with Generalized Empirical Likelihoods