Abstract
Partial differential equations (PDEs) are used to describe a variety of physical phenomena. Often these equations do not have analytical solutions, and numerical approximations are used instead. One of the most common methods for solving PDEs is the finite element method. Computing derivative information of the solution with respect to the input parameters is important for many tasks in scientific computing. We extend the JAX automatic differentiation library with an interface to the Firedrake finite element library. The high-level symbolic representation of PDEs allows bypassing differentiation through the possibly many iterations of the underlying low-level nonlinear solvers. Differentiation through Firedrake solvers is instead carried out using tangent-linear and adjoint equations. This enables the efficient composition of finite element solvers with arbitrary differentiable programs. The code is available at github.com/IvanYashchuk/jax-firedrake.
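The pattern of bypassing a solver's internal iterations can be sketched with JAX's `jax.custom_vjp` on a toy scalar problem (an illustrative stand-in, not the actual Firedrake interface): the forward pass runs Newton iterations, while the backward pass solves the adjoint equation obtained from the implicit function theorem instead of differentiating through the iterations.

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def solve(theta):
    """Solve F(u, theta) = u**3 + u - theta = 0 by Newton iteration."""
    u = jnp.zeros_like(theta)
    for _ in range(30):  # fixed iteration count; never differentiated
        u = u - (u**3 + u - theta) / (3 * u**2 + 1)
    return u

def solve_fwd(theta):
    u = solve(theta)
    return u, u  # save the solution as residual data for the backward pass

def solve_bwd(u, g):
    # adjoint equation for F(u, theta) = 0:
    #   (dF/du)^T lam = g  =>  lam = g / (3u^2 + 1)
    # theta_bar = -lam * dF/dtheta = lam, since dF/dtheta = -1
    return (g / (3 * u**2 + 1),)

solve.defvjp(solve_fwd, solve_bwd)

theta = 2.0
u = solve(theta)                    # u^3 + u = 2 has the root u = 1
du_dtheta = jax.grad(solve)(theta)  # implicit derivative 1/(3u^2+1) = 0.25
```

The backward pass costs one linear (adjoint) solve regardless of how many Newton steps the forward solve took, which is the point of the tangent-linear/adjoint approach.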
VertiSync: A Traffic Management Policy with Maximum Throughput for On-Demand Urban Air Mobility Networks
Authors: Milad Pooladsanj, Ketan Savla
Subjects: Networking and Internet Architecture (cs.NI); Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC); Probability (math.PR)
Abstract
Urban Air Mobility (UAM) offers a solution to current traffic congestion by providing on-demand air mobility in urban areas. Effective traffic management is crucial for efficient operation of UAM systems, especially in high-demand scenarios. In this paper, we present VertiSync, a centralized traffic management policy for on-demand UAM networks. VertiSync schedules the aircraft for either servicing trip requests or rebalancing in the network, subject to aircraft safety margins and separation requirements during takeoff and landing. We characterize the system-level throughput of VertiSync, which determines the demand threshold at which travel times transition from being stabilized to increasing over time. We show that the proposed policy maximizes throughput for sufficiently large fleet sizes. We demonstrate the performance of VertiSync through a case study for the city of Los Angeles and show that it significantly reduces travel times compared to a first-come-first-served scheduling policy.
Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis
Authors: Mehdi Zadem (LIX, U2IS), Sergio Mover (LIX), Sao Mai Nguyen (U2IS, Flowers, IMT Atlantique - INFO, Lab-STICC_RAMBO)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO)
Abstract
Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this work, we propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We create an HRL algorithm that gradually learns this representation along with the policies, and evaluate it on navigation tasks, showing that the learned representation is interpretable and yields data-efficient learning.
Digital 3D Smocking Design
Authors: Jing Ren, Aviv Segall, Olga Sorkine-Hornung
Abstract
We develop an optimization-based method to model smocking, a surface embroidery technique that provides decorative geometric texturing while maintaining the stretch properties of the fabric. During smocking, multiple pairs of points on the fabric are stitched together, creating non-manifold geometric features and visually pleasing textures. Designing smocking patterns is challenging because the outcome of stitching is unpredictable: the final texture is often revealed only when the whole smocking process is completed, necessitating painstaking physical fabrication and time-consuming trial-and-error experimentation. This motivates us to seek a digital smocking design method. Straightforward attempts to compute smocked fabric geometry using surface deformation or cloth simulation methods fail to produce realistic results, likely due to the intricate structure of the designs, the large number of contacts, and high-curvature folds. We instead formulate smocking as a graph embedding and shape deformation problem. We extract a coarse graph representing the fabric and the stitching constraints, and then derive the graph structure of the smocked result. We solve for the 3D embedding of this graph, which in turn reliably guides the deformation of the high-resolution fabric mesh. Our optimization-based method is simple, efficient, and flexible, which allows us to build an interactive system for smocking pattern exploration. To demonstrate the accuracy of our method, we compare our results to real fabrications on a large set of smocking patterns.
Geometric Gait Optimization for Inertia-Dominated Systems With Nonzero Net Momentum
Abstract
Inertia-dominated mechanical systems can achieve net displacement by 1) periodically changing their shape (known as kinematic gait) and 2) adjusting their inertia distribution to utilize the existing nonzero net momentum (known as momentum gait). Therefore, finding the gait that most effectively utilizes the two types of locomotion in terms of the magnitude of the net momentum is a significant topic in the study of locomotion. For kinematic locomotion with zero net momentum, the geometry of optimal gaits is expressed as the equilibria of system constraint curvature flux through the surface bounded by the gait, and the cost associated with executing the gait in the metric space. In this paper, we identify the geometry of optimal gaits with nonzero net momentum effects by lifting the gait description to a time-parameterized curve in shape-time space. We also propose the variational gait optimization algorithm corresponding to the lifted geometric structure, and identify two distinct patterns in the optimal motion, determined by whether or not the kinematic and momentum gaits are concentric. The examples of systems with and without fluid-added mass demonstrate that the proposed algorithm can efficiently solve forward and turning locomotion gaits in the presence of nonzero net momentum. At any given momentum and effort limit, the proposed optimal gait that takes into account both momentum and kinematic effects outperforms the reference gaits that each only considers one of these effects.
Language-Conditioned Observation Models for Visual Object Search
Authors: Thao Nguyen, Vladislav Hrosinkov, Eric Rosen, Stefanie Tellex
Abstract
Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model are determined by a single deep neural network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an LCOM, any language description of an object can be used to generate an appropriate object detector and noise model, and training an LCOM only requires readily available supervised image-caption datasets. We empirically evaluate our method by comparing against a state-of-the-art object search algorithm in simulation, and demonstrate that planning with our observation model yields a significantly higher average task completion rate (from 0.46 to 0.66) and faster, more efficient object search than with a fixed-noise model. We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.
$\texttt{NePhi}$: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
Authors: Lin Tian, Soumyadip Sengupta, Hastings Greer, Raúl San José Estépar, Marc Niethammer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work proposes $\texttt{NePhi}$, a neural deformation model which results in approximately diffeomorphic transformations. In contrast to the predominant voxel-based approaches, $\texttt{NePhi}$ represents deformations functionally which allows for memory-efficient training and inference. This is of particular importance for large volumetric registrations. Further, while medical image registration approaches representing transformation maps via multi-layer perceptrons have been proposed, $\texttt{NePhi}$ facilitates both pairwise optimization-based registration $\textit{as well as}$ learning-based registration via predicted or optimized global and local latent codes. Lastly, as deformation regularity is a highly desirable property for most medical image registration tasks, $\texttt{NePhi}$ makes use of gradient inverse consistency regularization which empirically results in approximately diffeomorphic transformations. We show the performance of $\texttt{NePhi}$ on two 2D synthetic datasets as well as on real 3D lung registration. Our results show that $\texttt{NePhi}$ can achieve similar accuracies as voxel-based representations in a single-resolution registration setting while using less memory and allowing for faster instance-optimization.
Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains
Abstract
Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However, such methods are applicable only to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model's applicability, a Taylor series expansion is used in Reel to approximate nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant-factor approximation between the projected loss function and the original one with a poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that Reel leads to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with quality comparable to the non-compressed models.
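The random-projection step can be illustrated with a minimal NumPy sketch (dimensions and sparsity level are arbitrary, not taken from the paper): a sparse value-domain update vector is compressed by a Gaussian Johnson-Lindenstrauss projection, which approximately preserves its norm at a fraction of the dimensionality.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 10_000, 400  # original and projected dimensions
# sparse "value-domain" update: a few sharp interfacial changes
x = np.zeros(d)
x[rng.choice(d, 20, replace=False)] = rng.normal(size=20)

# Gaussian random projection (Johnson-Lindenstrauss style)
P = rng.normal(size=(k, d)) / np.sqrt(k)
x_proj = P @ x

# the squared norm survives compression up to a small relative error
rel_err = abs(x_proj @ x_proj - x @ x) / (x @ x)
```

Learning can then proceed against the k-dimensional signal instead of the d-dimensional one; the poly-logarithmic dimension bound in the paper quantifies how small k can be while keeping the projected loss close to the original.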
Energy-Constrained Active Exploration Under Incremental-Resolution Symbolic Perception
Authors: Disha Kamale, Sofie Haesaert, Cristian-Ioan Vasile
Abstract
In this work, we consider the problem of autonomous exploration in search of targets while respecting a fixed energy budget. The robot is equipped with an incremental-resolution symbolic perception module wherein the perception of targets in the environment improves as the robot's distance from targets decreases. We assume no prior information about the total number of targets, their locations as well as their possible distribution within the environment. This work proposes a novel decision-making framework for the resulting constrained sequential decision-making problem by first converting it into a reward maximization problem on a product graph computed offline. It is then solved online as a Mixed-Integer Linear Program (MILP) where the knowledge about the environment is updated at each step, combining automata-based and MILP-based techniques. We demonstrate the efficacy of our approach with the help of a case study and present empirical evaluation in terms of expected regret. Furthermore, the runtime performance shows that online planning can be efficiently performed for moderately-sized grid environments.
International Competition on Graph Counting Algorithms 2023
Abstract
This paper reports on the details of the International Competition on Graph Counting Algorithms (ICGCA) held in 2023. The graph counting problem is to count the subgraphs satisfying specified constraints on a given graph. The problem is #P-complete, a computationally hard class. Since many essential systems in modern society, e.g., infrastructure networks, are often represented as graphs, graph counting algorithms are a key technology for efficiently scanning all the subgraphs representing the feasible states of the system. In the ICGCA, contestants were asked to count the paths on a graph under a length constraint. The benchmark set included 150 challenging instances, emphasizing graphs resembling infrastructure networks. Eleven solvers were submitted and ranked by the number of benchmarks correctly solved within a time limit. The winning solver, TLDC, was designed based on three fundamental approaches: backtracking search, dynamic programming, and model counting or #SAT (a counting version of Boolean satisfiability). Detailed analyses show that each approach has its own strengths, and one approach is unlikely to dominate the others. The codes and papers of the participating solvers are available at https://afsa.jp/icgca/.
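The competition task, counting s-t paths under a length constraint, can be illustrated with a minimal backtracking counter, one of the three fundamental approaches mentioned above (the graph below is a toy example, not a benchmark instance):

```python
def count_paths(adj, s, t, max_len):
    """Count simple s-t paths with at most max_len edges (backtracking)."""
    visited = {s}

    def dfs(u, length):
        if u == t:
            return 1          # reached the target: one complete path
        if length == max_len:
            return 0          # length budget exhausted
        total = 0
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                total += dfs(v, length + 1)
                visited.remove(v)  # backtrack
        return total

    return dfs(s, 0)

# 4-cycle 0-1-2-3 with the chord 0-2
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
n = count_paths(adj, 0, 2, 3)  # 3 simple paths: 0-2, 0-1-2, 0-3-2
```

Backtracking enumerates every path explicitly, so its cost grows with the count itself; the dynamic-programming and #SAT approaches used by the top solvers avoid this by sharing subproblems, which is why no single approach dominates across the benchmark set.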
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Authors: Yunshui Li, Binyuan Hui, Zhaochao Yin, Wanwei He, Run Luo, Yuxing Long, Min Yang, Fei Huang, Yongbin Li
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Visually-grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework poses a challenge in assessing the development of this field. To this end, we propose \textbf{VDialogUE}, a \textbf{V}isually-grounded \textbf{Dialog}ue benchmark for \textbf{U}nified \textbf{E}valuation. It defines five core multi-modal dialogue tasks and covers six datasets. Furthermore, in order to provide a comprehensive assessment of the model's performance across all tasks, we developed a novel evaluation metric called VDscore, which is based on the Analytic Hierarchy Process~(AHP) method. Additionally, we present a straightforward yet efficient baseline model, named \textbf{VISIT}~(\textbf{VIS}ually-grounded d\textbf{I}alog \textbf{T}ransformer), to promote the advancement of general multi-modal dialogue systems. It progressively builds its multi-modal foundation and dialogue capability via a two-stage pre-training strategy. We believe that the VDialogUE benchmark, along with the evaluation scripts and our baseline models, will accelerate the development of visually-grounded dialog systems and lead to the development of more sophisticated and effective pre-trained models.
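The AHP step behind a composite score like VDscore can be sketched as follows (the comparison matrix and task scores are hypothetical, not those of the benchmark): weights are taken from the normalized principal eigenvector of a reciprocal pairwise-comparison matrix and used to aggregate per-task metrics into one number.

```python
import numpy as np

# Hypothetical pairwise-comparison matrix over three tasks:
# A[i, j] = judged importance of task i relative to task j (reciprocal matrix)
A = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
])

# AHP weights: normalized principal (Perron) eigenvector of A
eigvals, eigvecs = np.linalg.eig(A)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w = w / w.sum()  # normalization also fixes the eigenvector's sign

task_scores = np.array([0.8, 0.6, 0.7])  # per-task metrics, hypothetical
composite = float(w @ task_scores)        # single weighted score
```

The reciprocal structure (A[j, i] = 1/A[i, j]) guarantees a positive principal eigenvector, so the weights are well defined and sum to one.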
DebCSE: Rethinking Unsupervised Contrastive Sentence Embedding Learning in the Debiasing Perspective
Authors: Pu Miao, Zeyao Du, Junlin Zhang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Several prior studies have suggested that word-frequency biases cause the BERT model to learn indistinguishable sentence embeddings. Contrastive learning schemes such as SimCSE and ConSERT have been adopted successfully in unsupervised sentence embedding to improve the quality of embeddings by reducing this bias. However, these methods still introduce new biases, such as sentence-length bias and false-negative-sample bias, which hinder the model's ability to learn fine-grained semantics. In this paper, we reexamine the challenges of contrastive sentence embedding learning from a debiasing perspective and argue that effectively eliminating the influence of various biases is crucial for learning high-quality sentence embeddings. We argue that these biases are introduced by the simple rules used to construct training data in contrastive learning, and that the key to contrastive sentence embedding is to mimic, in an unsupervised way, the training-data distribution of supervised machine learning. We propose a novel contrastive framework for sentence embedding, termed DebCSE, which eliminates the impact of these biases through an inverse propensity weighted sampling method that selects high-quality positive and negative pairs according to both the surface and semantic similarity between sentences. Extensive experiments on semantic textual similarity (STS) benchmarks reveal that DebCSE significantly outperforms the latest state-of-the-art models, with an average Spearman's correlation coefficient of 80.33% on BERTbase.
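The debiasing idea behind inverse propensity weighted sampling can be illustrated on synthetic data (the propensity model below is hypothetical and unrelated to DebCSE's actual similarity-based propensities): when samples are observed with a selection probability that depends on their value, reweighting each observation by the inverse of its propensity removes the selection bias from an estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# population of candidate pairs with some quality score
quality = rng.normal(loc=0.5, scale=0.2, size=100_000)

# biased observation: higher-quality pairs are more likely to be selected
propensity = 1 / (1 + np.exp(-8 * (quality - 0.5)))  # hypothetical bias
observed = rng.random(quality.size) < propensity

naive_mean = quality[observed].mean()  # biased upward by the selection rule

# Hajek-style inverse propensity weighted estimate: unbiased
w = 1 / propensity[observed]
ipw_mean = (w * quality[observed]).sum() / w.sum()

true_mean = quality.mean()
```

The same principle lets a contrastive learner draw training pairs as if from the undistorted distribution: pairs that the simple construction rules over-select receive proportionally smaller weight.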
Multi-Grade Deep Learning for Partial Differential Equations with Applications to the Burgers Equation
Abstract
We develop in this paper a multi-grade deep learning method for solving nonlinear partial differential equations (PDEs). Deep neural networks (DNNs) have achieved remarkable performance in solving PDEs, in addition to their outstanding success in areas such as natural language processing, computer vision, and robotics. However, training a very deep network is often a challenging task. As the number of layers of a DNN increases, solving the large-scale non-convex optimization problem that yields the DNN solution of a PDE becomes more and more difficult, which may lead to a decrease rather than an increase in predictive accuracy. To overcome this challenge, we propose a two-stage multi-grade deep learning (TS-MGDL) method that breaks down the task of learning a DNN into several neural networks stacked on top of each other in a staircase-like manner. This approach allows us to mitigate the complexity of solving a non-convex optimization problem with a large number of parameters and to efficiently learn the residual components left over from previous grades. We prove that each grade/stage of the proposed TS-MGDL method can reduce the value of the loss function, and we further validate this fact through numerical experiments. Although the proposed method is applicable to general PDEs, the implementation in this paper focuses only on the 1D, 2D, and 3D viscous Burgers equations. Experimental results show that the proposed two-stage multi-grade deep learning method enables efficient learning of solutions of the equations and outperforms existing single-grade deep learning methods in predictive accuracy. Specifically, the predictive errors of single-grade deep learning are 26-60, 4-31, and 3-12 times larger than those of the TS-MGDL method for the 1D, 2D, and 3D equations, respectively.
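The staircase idea, in which each grade fits only the residual left by the previous grades, can be sketched with simple least-squares "grades" (low-degree polynomial fits stand in for the paper's neural networks; the target function and degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.3 * x**2  # target to learn

def fit_grade(x, residual, degree):
    """One 'grade': a small model (here, a low-degree polynomial fit)."""
    return np.poly1d(np.polyfit(x, residual, degree))

prediction = np.zeros_like(y)
errors = []
for degree in (1, 3, 5):                          # successive grades
    grade = fit_grade(x, y - prediction, degree)  # fit the leftover residual
    prediction = prediction + grade(x)
    errors.append(np.sqrt(np.mean((y - prediction) ** 2)))
```

Because each grade may output zero, adding a grade can never increase the training loss, which mirrors the paper's proof that every stage reduces the loss value; each individual fit also stays small and easy to solve.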
Deep Reinforcement Learning-based Scheduling in Edge and Fog Computing Environments
Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Edge/fog computing, as a distributed computing paradigm, satisfies the low-latency requirements of an ever-increasing number of IoT applications and has become the mainstream computing paradigm behind them. However, because a large number of IoT applications require execution on edge/fog resources, the servers may become overloaded, which can disrupt the edge/fog servers and also negatively affect the applications' response times. Moreover, many IoT applications are composed of dependent components, incurring extra constraints on their execution. Besides, edge/fog computing environments and IoT applications are inherently dynamic and stochastic. Thus, efficient and adaptive scheduling of IoT applications in heterogeneous edge/fog computing environments is of paramount importance. However, the limited computational resources on edge/fog servers impose an extra burden on applying optimal but computationally demanding techniques. To overcome these challenges, we propose a Deep Reinforcement Learning-based IoT application Scheduling algorithm, called DRLIS, to adaptively and efficiently optimize the response time of heterogeneous IoT applications and balance the load of the edge/fog servers. We implemented DRLIS as a practical scheduler in the FogBus2 function-as-a-service framework, creating an edge-fog-cloud integrated serverless computing environment. Results obtained from extensive experiments show that DRLIS significantly reduces the execution cost of IoT applications by up to 55%, 37%, and 50% in terms of load balancing, response time, and weighted cost, respectively, compared with metaheuristic algorithms and other reinforcement learning techniques.
Abstract
We introduce M3-AUDIODEC, an innovative neural spatial audio codec designed for efficient compression of multi-channel (binaural) speech in both single- and multi-speaker scenarios, while retaining the spatial location information of each speaker. The model is versatile, allowing configuration and training tailored to a predetermined set of multi-channel, multi-speaker, and multi-spatial overlapping speech conditions. Key contributions are as follows: 1) Previous neural codecs are extended from single-channel to multi-channel audio. 2) The proposed model can compress and decode overlapping speech. 3) A novel architecture compresses speech content and spatial cues separately, ensuring the preservation of each speaker's spatial context after decoding. 4) M3-AUDIODEC reduces the bandwidth for compressing two-channel speech by 48% compared to individual binaural channel compression. Impressively, operating at 12.6 kbps, it outperforms Opus at 24 kbps and AUDIODEC at 24 kbps by 37% and 52%, respectively. In our assessment, we employed speech enhancement and room acoustic metrics to ascertain the accuracy of clean speech and spatial cue estimates from M3-AUDIODEC. Audio demonstrations and source code are available online.
Learning Tube-Certified Control Using Robust Contraction Metrics
Abstract
Control design for general nonlinear robotic systems with guaranteed stability and/or safety in the presence of model uncertainties is a challenging problem. Recent efforts attempt to learn a controller and a certificate (e.g., a Lyapunov function or a contraction metric) jointly using neural networks (NNs), in which model uncertainties are generally ignored during the learning process. In this paper, for nonlinear systems subject to bounded disturbances, we present a framework for jointly learning a robust nonlinear controller and a contraction metric using a novel disturbance rejection objective that certifies a universal $\mathcal L_\infty$ gain bound using NNs for user-specified variables. The learned controller aims to minimize the effect of disturbances on the actual trajectories of state and/or input variables from their nominal counterparts while providing certificate tubes around nominal trajectories that are guaranteed to contain actual trajectories in the presence of disturbances. Experimental results demonstrate that our framework can generate tighter tubes and a controller that is computationally efficient to implement.
Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects
Abstract
Articulated object manipulation is a fundamental yet challenging task in robotics. Due to significant geometric and semantic variations across object categories, previous manipulation models struggle to generalize to novel categories. Few-shot learning is a promising solution for alleviating this issue by allowing robots to perform a few interactions with unseen objects. However, extant approaches often necessitate costly and inefficient test-time interactions with each unseen instance. Recognizing this limitation, we observe that despite their distinct shapes, different categories often share similar local geometries essential for manipulation, such as pullable handles and graspable edges - a factor typically underutilized in previous few-shot learning works. To harness this commonality, we introduce 'Where2Explore', an affordance learning framework that effectively explores novel categories with minimal interactions on a limited number of instances. Our framework explicitly estimates the geometric similarity across different categories, identifying local areas that differ from shapes in the training categories for efficient exploration while concurrently transferring affordance knowledge to similar parts of the objects. Extensive experiments in simulated and real-world environments demonstrate our framework's capacity for efficient few-shot exploration and generalization.
A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization
Abstract
This paper proposes a novel fuzzy cascaded Proportional-Derivative (PD) controller for under-actuated single-link flexible joint manipulators. The original flexible joint system is treated as two coupled $2^{nd}$-order sub-systems. The proposed controller is composed of two cascaded PD controllers and two fuzzy logic regulators (FLRs). The first (virtual) PD controller generates the desired control input that stabilizes the first $2^{nd}$-order sub-system. By solving the resulting equation with the coupling terms treated as design variables, the reference signal for the second sub-system is generated. Then, through a simple compensation design together with the second PD controller, the cascaded PD controller is derived. To further improve performance, two FLRs are implemented that adaptively tune the parameters of the PD controllers. Under natural assumptions, the cascaded fuzzy PD controller is proved to possess local asymptotic stability. All offline tuning processes are completed data-efficiently by Bayesian Optimization. Simulation results illustrate the stability and validity of our proposed method. Moreover, the cascaded PD controller presented here may be extended as a novel control method for other under-actuated systems, and the stability analysis offers a new perspective on the stability proofs of other fuzzy-enhanced PID controllers.
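The cascaded structure can be sketched on a toy linear flexible-joint model (parameters and gains are illustrative; the fuzzy regulators and Bayesian tuning are omitted): an outer PD loop computes the coupling torque that would stabilize the link, this torque is converted into a motor-angle reference, and an inner PD loop tracks that reference.

```python
# Toy single-link flexible joint:
#   link:  I * q''  = k * (th - q)
#   motor: J * th'' = u - k * (th - q)
I, J, k = 1.0, 0.5, 20.0
kp1, kd1 = 8.0, 4.0      # outer (virtual) PD acting on the link
kp2, kd2 = 200.0, 30.0   # inner PD tracking the motor reference
q_ref = 1.0              # desired link angle

q = dq = th = dth = 0.0
dt = 1e-3
for _ in range(20_000):  # 20 s of explicit-Euler simulation
    # outer loop: coupling torque that stabilizes the link sub-system
    tau_d = kp1 * (q_ref - q) - kd1 * dq
    # solve the coupling k*(th - q) = tau_d for the motor reference
    th_ref = q + tau_d / k
    # inner loop: PD tracking of the motor reference
    u = kp2 * (th_ref - th) - kd2 * dth
    ddq = k * (th - q) / I
    ddth = (u - k * (th - q)) / J
    q, dq = q + dt * dq, dq + dt * ddq
    th, dth = th + dt * dth, dth + dt * ddth
```

When the inner loop tracks well, the link dynamics reduce to the stable second-order system the outer PD was designed for, and the link angle settles at the setpoint; the paper's FLRs would adaptively retune the four gains online.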
Comparison of DDS, MQTT, and Zenoh in Edge-to-Edge/Cloud Communication with ROS 2
Authors: Jiaqiang Zhang, Xianjia Yu, Sier Ha, Jorge Pena Queralta, Tomi Westerlund
Abstract
With the development of IoT and edge computing, there is a need for efficient and reliable middleware to handle communication among edge devices or between the edge and the cloud. Meanwhile, ROS\,2 is increasingly used in robotic systems, but there has been no comparative study of middleware using ROS messages. In this study, we compared middleware commonly used in ROS\,2 systems, including DDS, Zenoh, and MQTT. To evaluate middleware performance in Edge-to-Edge and Edge-to-Cloud scenarios, we conducted experiments in a multi-host environment and compared the latency and throughput of each middleware with different types and sizes of ROS messages in three network setups: Ethernet, Wi-Fi, and 4G. Additionally, we deployed the different middleware on a real robot platform, TurtleBot 4, and sent commands from a host to the robot to drive a square-shaped trajectory. With an OptiTrack motion capture system, we recorded the robot trajectories and analyzed the drift error. The results show that CycloneDDS performs better over Ethernet, while Zenoh performs better over Wi-Fi and 4G. In the actual robot test, Zenoh's trajectory drift error was the smallest.
Abstract
A recent trend in deep learning has been toward training large-scale models with high parameter counts on big datasets. However, the robustness of such large-scale models in real-world settings is still a less-explored topic. In this work, we first benchmark the performance of these models under different perturbations and datasets representing real-world shifts, and highlight their degraded performance under these shifts. We then discuss how existing robustification schemes based on complete model fine-tuning may not be a scalable option for very large networks and can also cause them to forget some desired characteristics. Finally, we propose a simple and cost-effective method to solve this problem, inspired by the knowledge-transfer literature. It involves robustifying smaller models at a lower computational cost and then using them as teachers to tune a fraction of these large-scale networks, reducing the overall computational overhead. We evaluate our proposed method under various vision perturbations, including the ImageNet-C, R, S, and A datasets, as well as in transfer-learning and zero-shot evaluation setups on different datasets. Benchmark results show that our method efficiently induces robustness in these large-scale models, requires significantly less time, and preserves the transfer-learning and zero-shot properties of the original model, which none of the existing methods achieve.
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning
Authors: Yucong Zhang, Hongbin Suo, Yulong Wan, Ming Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
This paper proposes an approach to anomalous sound detection that incorporates outlier exposure and inlier modeling within a unified framework via multitask learning. While outlier-exposure-based methods can extract features efficiently, they are not robust. Inlier modeling is good at generating robust features, but the features are not very effective. Recently, serial approaches have been proposed to combine these two methods, but they still require a separate training step for normal-data modeling. To overcome these limitations, we use multitask learning to train a conformer-based encoder for outlier-aware inlier modeling. Moreover, our approach provides multi-scale scores for detecting anomalies. Experimental results on the MIMII and DCASE 2020 Task 2 datasets show that our approach outperforms state-of-the-art single-model systems and achieves results comparable to top-ranked multi-system ensembles.
A Gaussian Copula Approach to the Performance Analysis of Fluid Antenna Systems
Authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, Chan-Byoung Chae, Kin-Fai Tong, Yangyang Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper investigates the performance of a single-user fluid antenna system (FAS) by exploiting a class of elliptical copulas to describe the structure of dependency amongst the fluid antenna ports. By expressing Jakes' model in terms of the Gaussian copula, we consider two cases: (i) the general case, i.e., any arbitrary correlated fading distribution; and (ii) the specific case, i.e., correlated Nakagami-m fading. For both scenarios, we first derive analytical expressions for the cumulative distribution function (CDF) and probability density function (PDF) of the equivalent channel in terms of the multivariate normal distribution. Then, we obtain the outage probability (OP) and the delay outage rate (DOR) to analyze the performance of the FAS. By employing the popular rank correlation coefficients Spearman's $\rho$ and Kendall's $\tau$, we measure the degree of dependency in correlated arbitrary fading channels and illustrate how the Gaussian copula can be accurately connected to Jakes' model in FAS without complicated mathematical analysis. Numerical results show that increasing the fluid antenna size provides lower OP and DOR, but the system performance saturates as the number of antenna ports increases. In addition, our results indicate that FAS provides better performance compared to conventional single-fixed antenna systems even when the size of the fluid antenna is small.
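The Gaussian-copula construction of correlated Nakagami-m ports can be sketched with SciPy (the parameters are illustrative, not from the paper): correlated standard normals are mapped through the normal CDF to uniforms, then through the inverse Nakagami-m marginal, using the fact that the squared envelope is Gamma-distributed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, omega = 2.0, 1.0  # Nakagami-m shape and spread (E[r^2] = omega)
rho = 0.8            # Gaussian-copula correlation between two ports

# 1) correlated standard normals (the Gaussian-copula core)
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=100_000)

# 2) map to uniforms via the normal CDF
u = stats.norm.cdf(z)

# 3) invert the Nakagami-m marginal: power r^2 ~ Gamma(m, scale=omega/m)
g = stats.gamma.ppf(u, a=m, scale=omega / m)
r = np.sqrt(g)  # correlated Nakagami-m envelopes, one column per port
```

The marginals are exactly Nakagami-m regardless of rho, while the rank correlation between ports follows the closed-form Gaussian-copula relation (for Spearman's rho, $(6/\pi)\arcsin(\rho/2)$), which is how the copula parameter is matched to Jakes' model.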
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Authors: Zipeng Qi, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generating realistic talking faces is a complex and widely discussed task with numerous applications. In this paper, we present DiffTalker, a novel model designed to generate lifelike talking faces through audio and landmark co-driving. DiffTalker addresses the challenges associated with directly applying diffusion models to audio control, which are traditionally trained on text-image pairs. DiffTalker consists of two agent networks: a transformer-based landmarks completion network for geometric accuracy and a diffusion-based face generation network for texture details. Landmarks play a pivotal role in establishing a seamless connection between the audio and image domains, facilitating the incorporation of knowledge from pre-trained diffusion models. This innovative approach efficiently produces articulate-speaking faces. Experimental results showcase DiffTalker's superior performance in producing clear and geometrically accurate talking faces, all without the need for additional alignment between audio and image features.
Universality of underlying mechanism for successful deep learning
Abstract
An underlying mechanism for successful deep learning (DL) with a limited deep architecture and dataset, namely VGG-16 on CIFAR-10, was recently presented based on a quantitative method to measure the quality of a single filter in each layer. In this method, each filter identifies small clusters of possible output labels, with additional noise selected as labels outside the clusters. This feature is progressively sharpened with the layers, resulting in an enhanced signal-to-noise ratio (SNR) and higher accuracy. In this study, the suggested universal mechanism is verified for VGG-16 and EfficientNet-B0 trained on the CIFAR-100 and ImageNet datasets, with the following main results. First, the accuracy progressively increases with the layers, whereas the noise per filter typically progressively decreases. Second, for a given deep architecture, the maximal error rate increases approximately linearly with the number of output labels. Third, the average filter cluster size and the number of clusters per filter at the last convolutional layer adjacent to the output layer are almost independent of the number of dataset labels in the range [3, 1,000], while a high SNR is preserved. The presented DL mechanism suggests several techniques, such as applying filter's cluster connections (AFCC), to improve the computational complexity and accuracy of deep architectures, and further points to ways of simplifying pre-existing structures while maintaining their accuracy.
Optimal Control for Indoor Vertical Farms Based on Crop Growth
Authors: Annalena Daniels, Michael Fink, Marion Leibold, Dirk Wollherr, Senthold Asseng
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Vertical farming allows for year-round cultivation of a variety of crops, overcoming environmental limitations and ensuring food security. This closed and highly controlled system allows the plants to grow in optimal conditions, so that they reach maturity faster and yield more than on a conventional outdoor farm. However, one of the challenges of vertical farming is the high energy consumption. In this work, we optimize wheat growth using an optimal control approach with two objectives: first, we optimize inputs such as water, radiation, and temperature for each day of the growth cycle, and second, we optimize the duration of the plant's growth period to achieve the highest possible yield over a whole year. For this, we use a nonlinear, discrete-time hybrid model based on a simple universal crop model that we adapt to make the optimization more efficient. Using our approach, we find an optimal trade-off between used resources, net profit of the yield, and duration of a cropping period, thus increasing the annual yield of crops significantly while keeping input costs as low as possible. This work demonstrates the high potential of control theory in the discipline of vertical farming.
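The two coupled decisions described above, daily inputs and cycle length, can be illustrated with a hedged toy sketch. This is not the paper's crop model: the saturating growth function, constants, and grid search below are all invented for illustration only.

```python
import math

def cycle_yield(days, u, ymax=10.0, k=0.02):
    """Toy crop model: yield saturates in accumulated growth (days x input effect)."""
    return ymax * (1.0 - math.exp(-k * days * math.log1p(u)))

def annual_profit(days, u, price=10.0, input_cost=0.2):
    """Profit per cropping cycle times the number of cycles that fit in a year."""
    per_cycle = price * cycle_yield(days, u) - input_cost * u * days
    return (365.0 / days) * per_cycle

def best_schedule(max_days=120):
    """Grid-search the constant daily input level and the cycle length jointly."""
    return max(
        (annual_profit(d, u), d, u)
        for d in range(10, max_days + 1)
        for u in [0.5 * i for i in range(1, 21)]  # input level u in 0.5 .. 10.0
    )  # -> (annual profit, cycle length in days, daily input level)
```

The interior optimum arises from the same trade-off the abstract describes: longer cycles raise yield per cycle but reduce the number of cycles per year and accumulate input costs.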
Achieving 45% efficiency of CIGS/CdS Solar Cell by adding GaAs using optimization techniques
Authors: Satyam Bhatti, Habib Ullah Manzoor, Ahmed Zoha, Rami Ghannam
Abstract
This paper proposes an efficient three-layered p-GaAs/p-CIGS/n-CdS (PPN) solar cell, a unique architecture. Copper indium gallium selenide (CIGS)-based solar cells exhibit substantially better performance than those utilizing cadmium sulfide (CdS) alone; CIGS-based devices are also attractive for their device performance, environmentally benign nature, and reduced cost. Therefore, our paper proposes a numerical analysis of the homojunction PPN-junction GaAs solar cell structure, along with an n-ZnO front contact, simulated using the Solar Cells Capacitance Simulator (SCAPS-1D) software. Moreover, we investigated optimization techniques for evaluating the effect of the thickness and the carrier density of the PPN layers on solar cell performance. Subsequently, the paper discusses the electronic characteristics of adding a GaAs layer on top of the conventional (PN) junction, which improves parameters such as the power conversion efficiency (PCE), open-circuit voltage (VOC), fill factor (FF), and short-circuit current density (JSC) of the solar cell. The most promising results of our study show that adding the GaAs layer with an optimized thickness of 5 {\mu}m and carrier density of 1x10^20 cm^-3 yields a maximum PCE, VOC, FF, and JSC of 45.7%, 1.16 V, 89.52%, and 43.88 mA/cm2, respectively, for the proposed solar cell architecture.
A Survey of Graph Pre-processing Methods: From Algorithmic to Hardware Perspectives
Authors: Zhengyang Lv, Mingyu Yan, Xin Liu, Mengyao Dong, Xiaochun Ye, Dongrui Fan, Ninghui Sun
Abstract
Graph-related applications have experienced significant growth in academia and industry, driven by the powerful representation capabilities of graphs. However, efficiently executing these applications faces various challenges, such as load imbalance, random memory access, etc. To address these challenges, researchers have proposed various acceleration systems, including software frameworks and hardware accelerators, all of which incorporate graph pre-processing (GPP). GPP serves as a preparatory step before the formal execution of applications, involving techniques such as sampling, reordering, etc. However, GPP execution often remains overlooked, as the primary focus is directed towards enhancing graph applications themselves. This oversight is concerning, especially considering the explosive growth of real-world graph data, where GPP becomes essential and can even dominate system runtime overhead. Furthermore, GPP methods exhibit significant variations across devices and applications due to high customization. Unfortunately, no comprehensive work systematically summarizes GPP. To address this gap and foster a better understanding of GPP, we present a comprehensive survey dedicated to this area. We propose a double-level taxonomy of GPP, considering both algorithmic and hardware perspectives. Through listing relevant works, we illustrate our taxonomy and conduct a thorough analysis and summary of diverse GPP techniques. Lastly, we discuss challenges in GPP and potential future directions.
Fluid Antenna-Assisted Dirty Multiple Access Channels over Composite Fading
Authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, Chan-Byoung Chae, Kin-Fai Tong, Yangyang Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This letter investigates the application of the emerging fluid antenna (FA) technology in multiuser communication systems when side information (SI) is available at the transmitters. In particular, we consider a K-user dirty multiple access channel (DMAC) with non-causally known SI at the transmitters, where K users send independent messages to a common receiver with a FA capable of changing its location depending on the channel condition. By connecting Jakes' model to copula theory through Spearman's {\rho} rank correlation coefficient, we accurately describe the spatial correlation between the FA channels, and derive a closed-form expression for the outage probability (OP) under Fisher-Snedecor F fading. Numerical results illustrate how considering FA can improve the performance of multiuser communication systems in terms of the OP and also support a large number of users using only one FA at the common receiver in a few wavelengths of space.
Abstract
We study the algebraic complexity of annihilators of polynomial maps. In particular, when a polynomial map is `encoded by' a small algebraic circuit, we show that the coefficients of an annihilator of the map can be computed in PSPACE. Even when the underlying field is that of the reals or complex numbers, an analogous statement is true. We achieve this by using the class VPSPACE that coincides with computability of coefficients in PSPACE, over integers. As a consequence, we derive the following two conditional results. First, we show that a VP-explicit hitting set generator for all of VP would separate either VP from VNP, or non-uniform P from PSPACE. Second, in relation to algebraic natural proofs, we show that proving an algebraic natural proofs barrier would imply either VP $\neq$ VNP or DSPACE($\log^{\log^{\ast}n} n$) $\not\subset$ P.
Influence Robustness of Nodes in Multiplex Networks against Attacks
Authors: Boqian Ma, Hao Ren, Jiaojiao Jiang
Subjects: Social and Information Networks (cs.SI); Networking and Internet Architecture (cs.NI); Physics and Society (physics.soc-ph)
Abstract
Recent advances have focused mainly on the resilience of monoplex networks under attacks targeting random nodes or links, as well as their robustness against cascading attacks. However, very little research has investigated the robustness of nodes in multiplex networks against targeted attacks. In this paper, we first propose a new measure, MultiCoreRank, to calculate the global influence of nodes in a multiplex network. The measure models influence propagation on the core lattice of a multiplex network after core decomposition. Then, to study how structural features affect the influence robustness of nodes, we compare the dynamics of node influence on three types of multiplex networks: assortative, neutral, and disassortative, where assortativity is measured by the correlation coefficient of node degrees across different layers. We found that assortative networks have higher resilience against attack than neutral and disassortative networks, and that the structure of disassortative networks tends to break down more quickly under attack.
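The interlayer degree correlation used to classify a multiplex network as assortative, neutral, or disassortative can be sketched as a plain Pearson correlation of node degrees across two layers (illustrative only; the paper may use a different correlation coefficient):

```python
import math

def interlayer_assortativity(deg_a, deg_b):
    """Pearson correlation of node degrees across two layers of a multiplex
    network: > 0 assortative, ~ 0 neutral, < 0 disassortative."""
    n = len(deg_a)
    ma, mb = sum(deg_a) / n, sum(deg_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(deg_a, deg_b)) / n
    sa = math.sqrt(sum((a - ma) ** 2 for a in deg_a) / n)
    sb = math.sqrt(sum((b - mb) ** 2 for b in deg_b) / n)
    return cov / (sa * sb)
```

For example, a network whose hubs in one layer are also hubs in the other layer scores close to +1, while hubs paired with low-degree nodes score negatively.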
Adaptive Reduced Basis Trust Region Methods for Parameter Identification Problems
Authors: Michael Kartmann, Tim Keil, Mario Ohlberger, Stefan Volkwein, Barbara Kaltenbacher
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Abstract
In this contribution, we are concerned with model order reduction in the context of iterative regularization methods for the solution of inverse problems arising from parameter identification in elliptic partial differential equations. Such methods typically require a large number of forward solutions, which makes the use of the reduced basis method attractive to reduce computational complexity. However, the considered inverse problems are typically ill-posed due to their infinite-dimensional parameter space. Moreover, the infinite-dimensional parameter space makes it impossible to build and certify classical reduced-order models efficiently in a so-called "offline phase". We thus propose a new algorithm that adaptively builds a reduced parameter space in the online phase. The enrichment of the reduced parameter space is naturally inherited from the Tikhonov regularization within an iteratively regularized Gau{\ss}-Newton method. Finally, the adaptive parameter space reduction is combined with a certified reduced basis state space reduction within an adaptive error-aware trust region framework. Numerical experiments are presented to show the efficiency of the combined parameter and state space reduction for inverse parameter identification problems with distributed reaction or diffusion coefficients.
Impact of Excitation and Weighting Errors on Performance of Compact OTA Testing Systems
Abstract
This paper investigates the impact of complex excitation errors of the chamber array antenna on the accuracy of the test zone of a random line-of-sight over-the-air testing setup. First, several combinations of compact chamber arrays of lengths L and short distances D between the test zone and the chamber array, which emulate a plane wave impinging at the test zone are obtained. The chamber array is linear and uniform with 100 antenna elements, and a linear taper was applied to some of the elements to emulate a plane wave impinging at the test zone with more compact setups. A subset of L and D was chosen, providing compact over-the-air test setups that fulfilled the defined figures of merit, which assess the similarity of the obtained field distribution to that of a plane wave. The tolerance of the chosen setups to complex excitation errors of the chamber array was then investigated, concluding that these errors must be considered when defining appropriate L and D combinations. Moreover, the performance of the matched filter and zero-forcing algorithms is evaluated for errors of the device under test array weighting coefficients. A random line-of-sight over-the-air testing setup with two arrays was simulated, where one of the arrays emulated the desired signal and the other emulated the interference, observing that the errors were more significant at higher signal-to-noise ratios. Additionally, the zero-forcing algorithm was more sensitive to errors than the matched filter, which was expected since the accuracy of the former for interference suppression is critical.
WASM-MUTATE: Fast and Effective Binary Diversification for WebAssembly
Authors: Javier Cabrera-Arteaga, Nicholas Fitzgerald, Martin Monperrus, Benoit Baudry
Abstract
WebAssembly is renowned for its efficiency and security in browser environments and servers alike. Yet the burgeoning ecosystem of WebAssembly compilers and tools lacks robust software diversification systems. We introduce WASM-MUTATE, a compiler-agnostic WebAssembly diversification engine. It is engineered to fulfill the following key criteria: 1) the rapid generation of semantically equivalent yet behaviorally diverse WebAssembly variants, 2) universal applicability to any WebAssembly program regardless of the source programming language, and 3) the capability to counter high-risk security threats. Utilizing an e-graph data structure, WASM-MUTATE is both fast and effective. Our experiments reveal that WASM-MUTATE can efficiently generate tens of thousands of unique WebAssembly variants in a matter of minutes. Notably, WASM-MUTATE can protect WebAssembly binaries against timing side-channel attacks, specifically Spectre.
Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation
Authors: Thorsten Hempel, Ahmed A. Abdelrahman, Ayoub Al-Hamadi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Estimating the head pose of a person is a crucial problem for numerous applications, yet it is mainly addressed as a subtask of frontal pose prediction. We present a novel method for unconstrained end-to-end head pose estimation to tackle the challenging task of full-range-of-orientation head pose prediction. We address the issue of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data and propose a continuous 6D rotation matrix representation for efficient and robust direct regression. This allows the model to efficiently learn the full rotation appearance and to overcome the limitations of the current state of the art. Together with new accumulated training data that provides full head pose rotation data and a geodesic loss approach for stable learning, we design an advanced model that is able to predict an extended range of head orientations. An extensive evaluation on public datasets demonstrates that our method significantly outperforms other state-of-the-art methods in an efficient and robust manner, while its extended prediction range allows the expansion of the application area. We open-source our training and testing code along with our trained models: https://github.com/thohemp/6DRepNet360.
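The continuous 6D rotation representation mentioned here is commonly decoded into a rotation matrix by Gram-Schmidt orthogonalization of two 3D vectors. A minimal sketch of that standard construction (not necessarily the authors' exact code):

```python
import numpy as np

def rotation_from_6d(x):
    """Decode a 6D vector (two 3D column guesses) into a rotation matrix
    via Gram-Schmidt orthogonalization."""
    a1, a2 = np.asarray(x[:3], float), np.asarray(x[3:], float)
    b1 = a1 / np.linalg.norm(a1)          # normalize the first column
    a2 = a2 - np.dot(b1, a2) * b1         # remove the component along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)                 # right-handed third column
    return np.stack([b1, b2, b3], axis=1)
```

Because this map is continuous over its 6D input, it avoids the discontinuities that make Euler angles and quaternions awkward regression targets.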
Physics-constrained robust learning of open-form PDEs from limited and noisy data
Authors: Mengge Du, Longfeng Nie, Siyu Lou, Yuntian Chen, Dongxiao Zhang
Abstract
Unveiling the underlying governing equations of nonlinear dynamic systems remains a significant challenge, especially when observations are noisy and no prior knowledge is available. This study proposes R-DISCOVER, a framework designed to robustly uncover open-form partial differential equations (PDEs) from limited and noisy data. The framework operates through two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. PDEs with superior fits are utilized to iteratively optimize the generator via the RL method, and the best-performing PDE is selected by a parameter-free stability metric. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training. The traversal of PDE trees automates the construction of the computational graph and the embedding process without human intervention. Numerical experiments demonstrate our framework's capability to uncover governing equations from nonlinear dynamic systems with limited and highly noisy data, outperforming other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis
Authors: Mehdi Zadem, Sergio Mover, Sao Mai Nguyen
Abstract
Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing the learned representation is interpretable, transferable, and results in data-efficient learning.
AIDPS: Adaptive Intrusion Detection and Prevention System for Underwater Acoustic Sensor Networks
Authors: Soumadeep Das, Aryan Mohammadi Pasikhani, Prosanta Gope, John A. Clark, Chintan Patel, Biplab Sikdar
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Underwater Acoustic Sensor Networks (UW-ASNs) are predominantly used for underwater environments and find applications in many areas. However, a lack of security considerations, the unstable and challenging nature of the underwater environment, and the resource-constrained nature of the sensor nodes used for UW-ASNs (which makes them incapable of adopting security primitives) make the UW-ASN prone to vulnerabilities. This paper proposes an Adaptive decentralised Intrusion Detection and Prevention System called AIDPS for UW-ASNs. The proposed AIDPS can improve the security of the UW-ASNs so that they can efficiently detect underwater-related attacks (e.g., blackhole, grayhole and flooding attacks). To determine the most effective configuration of the proposed construction, we conduct a number of experiments using several state-of-the-art machine learning algorithms (e.g., Adaptive Random Forest (ARF), light gradient-boosting machine, and K-nearest neighbours) and concept drift detection algorithms (e.g., ADWIN, kdqTree, and Page-Hinkley). Our experimental results show that incremental ARF using ADWIN provides optimal performance when implemented with One-class support vector machine (SVM) anomaly-based detectors. Furthermore, our extensive evaluation results also show that the proposed scheme outperforms state-of-the-art benchmarking methods while providing a wider range of desirable features such as scalability and complexity.
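Of the concept drift detectors listed, the Page-Hinkley test is simple enough to sketch in a few lines; this is a generic textbook version with illustrative parameters, not the AIDPS implementation:

```python
class PageHinkley:
    """Minimal Page-Hinkley test: flags drift when the cumulative deviation
    from the running stream mean exceeds a threshold lam; delta tolerates
    small harmless shifts."""
    def __init__(self, delta=0.005, lam=5.0):
        self.delta, self.lam = delta, lam
        self.mean, self.n = 0.0, 0
        self.cum, self.min_cum = 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n     # running mean
        self.cum += x - self.mean - self.delta    # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.lam  # True = drift detected
```

In a streaming intrusion-detection setting, a drift signal like this is what triggers retraining or adaptation of the incremental classifier.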
Dynamic programming on bipartite tree decompositions
Authors: Lars Jaffke, Laure Morelle, Ignasi Sau, Dimitrios M. Thilikos
Abstract
We revisit a graph width parameter that we dub bipartite treewidth, along with its associated graph decomposition that we call bipartite tree decomposition. Bipartite treewidth can be seen as a common generalization of treewidth and the odd cycle transversal number. Intuitively, a bipartite tree decomposition is a tree decomposition whose bags induce almost bipartite graphs and whose adhesions contain at most one vertex from the bipartite part of any other bag, while the width of such a decomposition measures how far the bags are from being bipartite. Adapted from a tree decomposition originally defined by Demaine, Hajiaghayi, and Kawarabayashi [SODA 2010] and explicitly defined by Tazari [Th. Comp. Sci. 2012], bipartite treewidth appears to play a crucial role for solving problems related to odd-minors, which have recently attracted considerable attention. As a first step toward a theory for solving these problems efficiently, the main goal of this paper is to develop dynamic programming techniques to solve problems on graphs of small bipartite treewidth. For such graphs, we provide a number of para-NP-completeness results, FPT-algorithms, and XP-algorithms, as well as several open problems. In particular, we show that $K_t$-Subgraph-Cover, Weighted Vertex Cover/Independent Set, Odd Cycle Transversal, and Maximum Weighted Cut are $FPT$ parameterized by bipartite treewidth. We provide the following complexity dichotomy when $H$ is a 2-connected graph, for each of the $H$-Subgraph-Packing, $H$-Induced-Packing, $H$-Scattered-Packing, and $H$-Odd-Minor-Packing problems: if $H$ is bipartite, then the problem is para-NP-complete parameterized by bipartite treewidth while, if $H$ is non-bipartite, then it is solvable in XP-time. We define 1-${\cal H}$-treewidth by replacing the bipartite graph class by any class ${\cal H}$. Most of the technology developed here works for this more general parameter.
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Abstract
Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is the major challenge for deploying such models in practice, since it requires domain expertise and is extremely time-consuming. To avoid non-trivial prompt engineering, the recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens. While CoOp can achieve substantial improvements over manual prompts, its learned context generalizes poorly to unseen classes within the same dataset. In this work, we present Prompt Learning with Reparameterization Encoder (PRE), a simple and efficient method that enhances the generalization ability of the learnable prompt to unseen classes while maintaining the capacity to learn Base classes. Instead of directly optimizing the prompts, PRE employs a prompt encoder to reparameterize the input prompt embeddings, enhancing the exploration of task-specific knowledge from few-shot samples. Experiments and extensive ablation studies on 8 benchmarks demonstrate that our approach is an efficient method for prompt learning. Specifically, PRE achieves a notable enhancement of 5.60% in average accuracy on New classes and 3% in Harmonic mean compared to CoOp in the 16-shot setting, all within reasonable training time.
Communication Efficient Private Federated Learning Using Dithering
Authors: Burak Hasircioglu, Deniz Gunduz
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.
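The core mechanism, subtractive dithering with randomness shared between client and aggregator, can be sketched as follows; the step size, seeding scheme, and names are illustrative, not the paper's protocol:

```python
import random

def dithered_quantize(x, step, seed):
    """Client side: add shared uniform dither, round to the grid,
    and send only the integer index."""
    rng = random.Random(seed)            # randomness shared with the aggregator
    u = rng.uniform(-step / 2, step / 2)
    return round((x + u) / step)

def dithered_dequantize(q, step, seed):
    """Aggregator side: reconstruct the value and subtract the same dither."""
    rng = random.Random(seed)
    u = rng.uniform(-step / 2, step / 2)
    return q * step - u
```

The key property is that the reconstruction error is uniformly distributed in [-step/2, step/2] independently of the input, which is what lets the quantization noise stand in for explicitly added privacy noise.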
Decomposition of linear tensor transformations
Authors: Claudio Turchetti
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Abstract
One of the main issues in computing a tensor decomposition is how to choose the number of rank-one components, since there are no finite algorithms for determining the rank of a tensor. A commonly used approach for this purpose is to find a low-dimensional subspace by solving an optimization problem in which the number of components is assumed fixed. However, even though this algorithm is efficient and easy to implement, it often converges to poor local minima and suffers from outliers and noise. The aim of this paper is to develop a mathematical framework for exact tensor decomposition that is able to represent a tensor as the sum of a finite number of low-rank tensors. In this paper, three problems are addressed in order to derive: i) the decomposition of a non-negative self-adjoint tensor operator; ii) the decomposition of a linear tensor transformation; iii) the decomposition of a generic tensor.
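For intuition, the matrix analogue of such an exact finite decomposition is the SVD, which writes any matrix exactly as a sum of rank-1 terms; this sketch illustrates the idea only, and the paper's operator-level constructions are more general:

```python
import numpy as np

def rank_one_terms(A):
    """Exactly decompose a matrix into a finite sum of rank-1 terms via the SVD
    (the matrix analogue of the exact tensor decomposition discussed above)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)) if s[i] > 1e-12]
```

Unlike the fixed-rank optimization approach criticized in the abstract, the number of terms here falls out of the decomposition itself.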
MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems
Authors: Yu Gao, Lutong Su, Hao Liang, Yufeng Yue, Yi Yang, Mengyin Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural Radiance Fields (NeRF) employ multi-view images for 3D scene representation and have shown remarkable performance. As one of the primary sources of multi-view images, multi-camera systems encounter challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a single global camera and seldom consider scenarios with multiple cameras. Besides, some pose-robust methods still remain susceptible to suboptimal solutions when poses are poorly initialized. In this paper, we propose MC-NeRF, a method that can jointly optimize both intrinsic and extrinsic parameters for bundle-adjusting Neural Radiance Fields. Firstly, we conduct a theoretical analysis to tackle the degenerate case and coupling issue that arise from the joint optimization of intrinsic and extrinsic parameters. Secondly, based on the proposed solutions, we introduce an efficient calibration image acquisition scheme for multi-camera systems, including the design of the calibration object. Lastly, we present a global end-to-end network with a training sequence that enables the regression of intrinsic and extrinsic parameters, along with the rendering network. Moreover, since most existing datasets are designed for a unique camera, we create a new dataset that includes four different styles of multi-camera acquisition systems, allowing readers to generate custom datasets. Experiments confirm the effectiveness of our method when each image corresponds to different camera parameters. Specifically, we adopt up to 110 images with 110 different intrinsic and extrinsic parameters to achieve 3D scene representation without providing initial poses. The code and supplementary materials are available at https://in2-viaun.github.io/MC-NeRF.
SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions
Authors: Yin Lin, Bolin Ding, H. V. Jagadish, Jingren Zhou
Abstract
Before applying data analytics or machine learning to a data set, a vital step is usually the construction of an informative set of features from the data. In this paper, we present SMARTFEAT, an efficient automated feature engineering tool to assist data users, even non-experts, in constructing useful features. Leveraging the power of Foundation Models (FMs), our approach enables the creation of new features from the data, based on contextual information and open-world knowledge. To achieve this, our method incorporates an intelligent operator selector that discerns a subset of operators, effectively avoiding exhaustive combinations of original features, as is typically observed in traditional automated feature engineering tools. Moreover, we address the limitations of performing data tasks through row-level interactions with FMs, which could lead to significant delays and costs due to excessive API calls. To tackle this, we introduce a function generator that facilitates the acquisition of efficient data transformations, such as dataframe built-in methods or lambda functions, ensuring the applicability of SMARTFEAT to generate new features for large datasets. With SMARTFEAT, dataset users can efficiently search for and apply transformations to obtain new features, leading to improvements in the AUC of downstream ML classification by up to 29.8%.
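The kind of row-level transformation SMARTFEAT's function generator is described as producing can be illustrated with a hypothetical lambda feature; the column names and the BMI feature below are invented for illustration and are not from the paper:

```python
# Hypothetical example: instead of one FM API call per row, the tool emits a
# reusable transformation (here a lambda) that is applied locally to every row.
rows = [
    {"height_m": 1.75, "weight_kg": 70.0},
    {"height_m": 1.60, "weight_kg": 55.0},
]

# A generated feature definition: body-mass index from two existing columns.
bmi = lambda r: r["weight_kg"] / r["height_m"] ** 2

for r in rows:
    r["bmi"] = round(bmi(r), 1)
```

Emitting a function once and applying it to the whole dataset is what avoids the per-row API latency and cost the abstract warns about.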
A Unified Perspective on Multiple Shooting In Differential Dynamic Programming
Authors: He Li, Wenhao Yu, Tingnan Zhang, Patrick M. Wensing
Abstract
Differential Dynamic Programming (DDP) is an efficient computational tool for solving nonlinear optimal control problems. It was originally designed as a single shooting method and thus is sensitive to the initial guess supplied. This work considers the extension of DDP to multiple shooting (MS), improving its robustness to initial guesses. A novel derivation is proposed that accounts for the defect between shooting segments during the DDP backward pass, while still maintaining quadratic convergence locally. The derivation enables unifying multiple previous MS algorithms, and opens the door to many smaller algorithmic improvements. A penalty method is introduced to strategically control the step size, further improving the convergence performance. An adaptive merit function and a more reliable acceptance condition are employed for globalization. The effects of these improvements are benchmarked for trajectory optimization with a quadrotor, an acrobot, and a manipulator. MS-DDP is also demonstrated for use in Model Predictive Control (MPC) for dynamic jumping with a quadruped robot, showing its benefits over a single shooting approach.
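The defect between shooting segments that the backward pass accounts for can be sketched on a toy scalar system; the dynamics and segmentation below are illustrative, not the paper's benchmarks:

```python
def f(x, u, dt=0.1):
    """Toy scalar dynamics x' = -x + u, integrated with explicit Euler."""
    return x + dt * (-x + u)

def defects(x_guess, u_traj, steps_per_segment):
    """Multiple shooting: roll out each segment from its own guessed start
    state and measure the defect against the next segment's guessed state."""
    d, k = [], 0
    for i in range(len(x_guess) - 1):
        x = x_guess[i]
        for _ in range(steps_per_segment):
            x = f(x, u_traj[k])
            k += 1
        d.append(x_guess[i + 1] - x)  # zero at a dynamically feasible solution
    return d
```

At convergence all defects are driven to zero, recovering a single continuous trajectory; away from convergence, the extra boundary guesses are what make the method less sensitive to the initial guess than single shooting.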
Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch
Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand-designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data. This limitation stems from the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug, a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood. This provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements. Specifically, it improves image classification and clustering accuracy by 1.6% to 4.5%. When applied to biological data, DiffAug improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both vision and biological domains.
TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting
Authors: Rohan Choudhury, Kris Kitani, Laszlo A. Jeni
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing volumetric methods for 3D human pose estimation are accurate but computationally expensive and optimized for single-time-step prediction. We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatiotemporal representation, improving pose accuracy while also tracking and forecasting human pose. We significantly reduce computation compared to the state-of-the-art by recurrently computing per-person 2D pose features, fusing both spatial and temporal information into a single representation. In doing so, our model is able to use spatiotemporal context to predict more accurate human poses without sacrificing efficiency. We further use this representation to track human poses over time as well as predict future poses. Finally, we demonstrate that our model is able to generalize across datasets without scene-specific fine-tuning. TEMPO achieves 10$\%$ better MPJPE with a 33$\times$ improvement in FPS compared to TesseTrack on the challenging CMU Panoptic Studio dataset.
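MPJPE, the accuracy metric quoted above, is simply the mean per-joint Euclidean distance between predicted and ground-truth 3D joints. A minimal sketch (the 17-joint skeleton and the offset values are made up for illustration):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth joints, for arrays of shape
    (num_joints, 3), typically in millimetres."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

gt = np.zeros((17, 3))                    # toy 17-joint skeleton
pred = gt + np.array([3.0, 4.0, 0.0])     # every joint off by 5 mm
print(mpjpe(pred, gt))                    # 5.0
```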
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Authors: Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yingya Zhang, Changxin Gao, Deli Zhao, Nong Sang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, large-scale pre-trained language-image models like CLIP have shown extraordinary capabilities for understanding spatial contents, but naively transferring such models to video recognition still suffers from unsatisfactory temporal modeling capabilities. Existing methods insert tunable structures into or in parallel with the pre-trained model, which either requires back-propagation through the whole pre-trained model and is thus resource-demanding, or is limited by the temporal reasoning capability of the pre-trained structure. In this work, we present DiST, which disentangles the learning of spatial and temporal aspects of videos. Specifically, DiST uses a dual-encoder structure, where a pre-trained foundation model acts as the spatial encoder, and a lightweight network is introduced as the temporal encoder. An integration branch is inserted between the encoders to fuse spatio-temporal information. The disentangled spatial and temporal learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters. Meanwhile, we empirically show that disentangled learning with an extra network for integration benefits both spatial and temporal understanding. Extensive experiments on five benchmarks show that DiST delivers better performance than existing state-of-the-art methods by convincing margins. When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST. Code and models can be found at https://github.com/alibaba-mmai-research/DiST.
Large-Vocabulary 3D Diffusion Model with Transformer
Abstract
Creating diverse and high-quality 3D assets with an automatic generative model is highly desirable. Despite extensive efforts on 3D generation, most existing works focus on the generation of a single category or a few categories. In this paper, we introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model. Notably, there are three major challenges for this large-vocabulary 3D generation: a) the need for expressive yet efficient 3D representation; b) large diversity in geometry and texture across categories; c) complexity in the appearances of real-world objects. To this end, we propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, to handle these challenges via three aspects. 1) Considering efficiency and robustness, we adopt a revised triplane representation and improve the fitting speed and accuracy. 2) To handle the drastic variations in geometry and texture, we regard the features of all 3D objects as a combination of generalized 3D knowledge and specialized 3D features. To extract generalized 3D knowledge from diverse categories, we propose a novel 3D-aware transformer with shared cross-plane attention. It learns the cross-plane relations across different planes and aggregates the generalized 3D knowledge with specialized 3D features. 3) We devise a 3D-aware encoder/decoder to enhance the generalized 3D knowledge in the encoded triplanes for handling categories with complex appearances. Extensive experiments on ShapeNet and OmniObject3D (over 200 diverse real-world categories) convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance with large diversity, rich semantics, and high quality.
Keyword: faster
Euclidean and non-Euclidean Trajectory Optimization Approaches for Quadrotor Racing
Authors: Thomas Fork, Francesco Borrelli
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
We present two approaches to compute raceline trajectories for quadrotors by solving an optimal control problem. The approaches involve expressing quadrotor pose in either a Euclidean or non-Euclidean frame of reference and are both based on collocation. Both approaches compute trajectories over 100x faster than published methods. Additionally, both compute trajectories with faster lap times and show improved numerical convergence. In the last part of the paper we devise a novel method to compute racelines in dense obstacle fields using the non-Euclidean approach.
$\texttt{NePhi}$: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
Authors: Lin Tian, Soumyadip Sengupta, Hastings Greer, Raúl San José Estépar, Marc Niethammer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work proposes $\texttt{NePhi}$, a neural deformation model which results in approximately diffeomorphic transformations. In contrast to the predominant voxel-based approaches, $\texttt{NePhi}$ represents deformations functionally which allows for memory-efficient training and inference. This is of particular importance for large volumetric registrations. Further, while medical image registration approaches representing transformation maps via multi-layer perceptrons have been proposed, $\texttt{NePhi}$ facilitates both pairwise optimization-based registration $\textit{as well as}$ learning-based registration via predicted or optimized global and local latent codes. Lastly, as deformation regularity is a highly desirable property for most medical image registration tasks, $\texttt{NePhi}$ makes use of gradient inverse consistency regularization which empirically results in approximately diffeomorphic transformations. We show the performance of $\texttt{NePhi}$ on two 2D synthetic datasets as well as on real 3D lung registration. Our results show that $\texttt{NePhi}$ can achieve similar accuracies as voxel-based representations in a single-resolution registration setting while using less memory and allowing for faster instance-optimization.
Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains
Abstract
Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However, such methods are applicable to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits the sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model's applicability, a Taylor series expansion is used in Reel to approximate the nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant-factor approximation between the projected loss function and the original one with a poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality to the non-compressed models.
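The value/frequency decomposition plus random projection can be illustrated on a toy signal. The sizes, seed, and Gaussian Johnson-Lindenstrauss map below are our own illustration of the general idea, not Reel's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy PDE-style update: a smooth low-frequency wave (sparse in the
# Fourier domain) plus a few sharp "interfacial" spikes (sparse in the
# value domain).
n = 4096
t = np.arange(n)
smooth = np.cos(2 * np.pi * 3 * t / n)   # only Fourier bins 3 and n-3
spikes = np.zeros(n)
spikes[[100, 2000, 3000]] = 5.0          # three nonzero values
update = smooth + spikes

# Compress to 1% of the original dimension with a Gaussian random
# projection; such maps preserve norms (and inner products with the
# learned model) up to small distortion.
k = n // 100
P = rng.normal(size=(k, n)) / np.sqrt(k)
compressed = P @ update
distortion = np.linalg.norm(compressed) / np.linalg.norm(update)
```

Learning then proceeds against the 40-dimensional `compressed` vector instead of the 4096-dimensional `update`.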
Stable In-hand Manipulation with Finger Specific Multi-agent Shadow Reward
Abstract
Deep Reinforcement Learning (DRL) has shown its capability to handle the high degrees of freedom in control and the complex object interactions of multi-finger dexterous in-hand manipulation tasks. Current DRL approaches prefer sparse rewards to dense rewards for ease of training, but they lack behavior constraints during the manipulation process, leading to aggressive and unstable policies that are insufficient for safety-critical in-hand manipulation tasks. Dense rewards can regulate the policy to learn stable manipulation behaviors with continuous reward constraints but are hard to define empirically and slow to converge to an optimum. This work proposes the Finger-specific Multi-agent Shadow Reward (FMSR) method to determine the stable manipulation constraints in the form of a dense reward based on the state-action occupancy measure, a general utility of DRL that is approximated during the learning process. Information Sharing (IS) across neighboring agents enables consensus training to accelerate the convergence. The methods are evaluated in two in-hand manipulation tasks on the Shadow Hand. The results show FMSR+IS converges faster in training, achieving a higher task success rate and better manipulation stability than a conventional dense reward. The comparison indicates that FMSR+IS, even with the behavior constraint, achieves a comparable success rate and much better manipulation stability than the policy trained with a sparse reward.
Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation
Abstract
Simulation-to-Real-World Transfer allows affordable and fast training of learning-based robots for manipulation tasks using Deep Reinforcement Learning methods. Currently, Sim2Real uses Asymmetric Actor-Critic approaches to reduce the rich idealized features in simulation to the accessible ones in the real world. However, the feature reduction from the simulation to the real world is conducted through an empirically defined one-step curtailment. A small feature reduction does not sufficiently remove the actor's features, which may still cause difficulty in setting up the physical system, while a large feature reduction may cause difficulty and inefficiency in training. To address this issue, we propose Curriculum-based Sensing Reduction to enable the actor to start with the same rich feature space as the critic and then shed the hard-to-extract features step by step for higher training performance and better adaptation to the real-world feature space. The reduced features are replaced with random signals from a Deep Random Generator to remove the dependency between the output and the removed features and avoid creating new dependencies. The methods are evaluated on the Allegro robot hand in a real-world in-hand manipulation task. The results show that our methods train faster and achieve higher task performance than baselines, and can solve real-world tasks when selected tactile features are reduced.
Judging a video by its bitstream cover
Authors: Yuxing Han, Yunan Ding, Jiangtao Wen, Chen Ye Gan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially in an age where an immense volume of video content is constantly being generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover, these methods often suffer from performance degradation in low-quality videos. We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for decompression. We validate our approach using a custom-built data set comprising over 29,000 YouTube video clips, totaling 6,000 hours and spanning 11 distinct categories. Our preliminary evaluations indicate precision, accuracy, and recall rates well over 80%. The algorithm operates approximately 15,000 times faster than real-time for 30fps videos, outperforming the traditional Dynamic Time Warping (DTW) algorithm by six orders of magnitude.
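To make the contrast with pixel-level pipelines concrete, here is a purely illustrative sketch of what "features from the undecoded bitstream" could look like, starting from byte-level statistics. This is our own toy example, not the paper's actual feature set:

```python
import numpy as np

def bitstream_features(bitstream: bytes, dim: int = 256) -> np.ndarray:
    """Hypothetical feature vector for a compressed video bitstream:
    a normalized byte histogram, computed without decoding any pixels.
    (Illustrative only; not the paper's method.)"""
    counts = np.bincount(np.frombuffer(bitstream, dtype=np.uint8),
                         minlength=dim)
    return counts / max(len(bitstream), 1)

feat = bitstream_features(bytes([0, 0, 255, 128]))
```

Such a vector could feed any off-the-shelf classifier; the point is that no decompression step appears anywhere in the pipeline.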
HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods
Abstract
Talking Face Generation (TFG) aims to reconstruct facial movements, producing natural lip motion from audio and the facial features potentially connected to it. Existing TFG methods have made significant advancements in producing natural and realistic images. However, most work rarely takes visual quality into consideration. It is challenging to ensure lip synchronization while avoiding visual quality degradation in cross-modal generation methods. To address this issue, we propose a universal High-Definition Teeth Restoration Network, dubbed HDTR-Net, for arbitrary TFG methods. HDTR-Net can enhance teeth regions at an extremely fast speed while maintaining synchronization and temporal consistency. In particular, we propose a Fine-Grained Feature Fusion (FGFF) module to effectively capture fine texture feature information around the teeth and surrounding regions, and use these features to refine the feature map and enhance the clarity of the teeth. Extensive experiments show that our method can be adapted to arbitrary TFG methods without degrading lip synchronization or frame coherence. Another advantage of HDTR-Net is its real-time generation ability: even when performing high-definition restoration for talking face video synthesis, its inference speed is $300\%$ faster than the current state-of-the-art super-resolution-based face restoration.
Adaptive approximation of monotone functions
Authors: Pierre Gaillard (Thoth), Sébastien Gerchinovitz (IMT), Étienne de Montbrun (TSE-R)
Abstract
We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\mathcal{X}$. For any function~$f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge on the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.
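The box principle behind GreedyBox can be sketched in a few lines. The code below is our own simplified rendering for the $L^1$ (Lebesgue) case, not the paper's algorithm: consecutive queries of a non-decreasing $f$ trap it inside axis-aligned boxes whose total area bounds the approximation error, and the greedy step always splits the largest box.

```python
def approx_monotone(f, eps, a=0.0, b=1.0):
    """Greedy box refinement for a non-decreasing f on [a, b]
    (a simplified sketch, not the paper's GreedyBox): queries at
    x_i < x_{i+1} trap f in the box
    [x_i, x_{i+1}] x [f(x_i), f(x_{i+1})], so the total box area
    bounds the L^1 error of the piecewise-constant approximation."""
    xs, ys = [a, b], [f(a), f(b)]

    def area(i):
        return (xs[i + 1] - xs[i]) * (ys[i + 1] - ys[i])

    while sum(area(i) for i in range(len(xs) - 1)) > eps:
        i = max(range(len(xs) - 1), key=area)   # split the largest box
        mid = 0.5 * (xs[i] + xs[i + 1])
        xs.insert(i + 1, mid)
        ys.insert(i + 1, f(mid))

    # Piecewise-constant approximation: value at the left query point.
    return lambda x: ys[max(j for j in range(len(xs)) if xs[j] <= x)]

fhat = approx_monotone(lambda x: x ** 2, eps=1e-2)
```

Since $f$ is non-decreasing, $f(x) - f(x_i) \le f(x_{i+1}) - f(x_i)$ on each cell, so the loop's exit condition guarantees an $L^1$ error of at most `eps`.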
Optimal Control for Indoor Vertical Farms Based on Crop Growth
Authors: Annalena Daniels, Michael Fink, Marion Leibold, Dirk Wollherr, Senthold Asseng
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
Vertical farming allows for year-round cultivation of a variety of crops, overcoming environmental limitations and ensuring food security. This closed and highly controlled system allows the plants to grow in optimal conditions, so that they reach maturity faster and yield more than on a conventional outdoor farm. However, one of the challenges of vertical farming is the high energy consumption. In this work, we optimize wheat growth using an optimal control approach with two objectives: first, we optimize inputs such as water, radiation, and temperature for each day of the growth cycle, and second, we optimize the duration of the plant's growth period to achieve the highest possible yield over a whole year. For this, we use a nonlinear, discrete-time hybrid model based on a simple universal crop model that we adapt to make the optimization more efficient. Using our approach, we find an optimal trade-off between used resources, net profit of the yield, and duration of a cropping period, thus increasing the annual yield of crops significantly while keeping input costs as low as possible. This work demonstrates the high potential of control theory in the discipline of vertical farming.
Learning Quasi-Static 3D Models of Markerless Deformable Linear Objects for Bimanual Robotic Manipulation
Authors: Piotr Kicki, Michał Bidziński, Krzysztof Walas
Abstract
The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and challenging task that is important in many practical applications. Classical model-based approaches to this problem require an accurate model that captures how robot motions affect the deformation of the DLO. Nowadays, data-driven models offer the best tradeoff between quality and computation time. This paper analyzes several learning-based 3D models of the DLO and proposes a new one based on the Transformer architecture that achieves superior accuracy, even on DLOs of different lengths, thanks to the proposed scaling method. Moreover, we introduce a data augmentation technique, which improves the prediction performance of almost all considered DLO data-driven models. Thanks to this technique, even a simple Multilayer Perceptron (MLP) achieves close to state-of-the-art performance while being significantly faster to evaluate. In the experiments, we compare the performance of the learning-based 3D models of the DLO on several challenging datasets quantitatively and demonstrate their applicability in the task of shaping a DLO.
What Matters to Enhance Traffic Rule Compliance of Imitation Learning for Automated Driving
Authors: Hongkuan Zhou, Aifen Sui, Wei Cao, Letian Shi
Abstract
More research attention has recently been given to end-to-end autonomous driving technologies, where the entire driving pipeline is replaced with a single neural network, because of their simpler structure and faster inference time. Despite largely reducing the number of components in the driving pipeline, the simplicity of this appealing approach also leads to interpretability problems and safety issues (arXiv:2003.06404). The trained policy is not always compliant with the traffic rules, and it is also hard to discover the reason for misbehavior because of the lack of intermediate outputs. Meanwhile, sensors are also critical to the security and feasibility of autonomous driving, as they perceive the surrounding environment under complex driving scenarios. In this paper, we propose P-CSG, a novel penalty-based imitation learning approach with cross-semantics-generation sensor fusion technologies to increase the overall performance of end-to-end autonomous driving. We conducted an assessment of our model's performance using the Town 05 Long benchmark, achieving an impressive driving score improvement of over 15%. Furthermore, we conducted robustness evaluations against adversarial attacks like FGSM and Dot attacks, revealing a substantial increase in robustness compared to baseline models. More detailed information, such as code-based resources, ablation studies, and videos can be found at https://hk-zh.github.io/p-csg-plus.
Keyword: mobile
Client-side Gradient Inversion Against Federated Learning from Poisoning
Authors: Jiaheng Wei, Yanjun Zhang, Leo Yu Zhang, Chao Chen, Shirui Pan, Kok-Leong Ong, Jun Zhang, Yang Xiang
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Federated Learning (FL) enables distributed participants (e.g., mobile devices) to train a global model without sharing data directly with a central server. Recent studies have revealed that FL is vulnerable to the gradient inversion attack (GIA), which aims to reconstruct the original training samples and poses a high risk to the privacy of clients in FL. However, most existing GIAs necessitate control over the server and rely on strong prior knowledge, including batch normalization and data distribution information. In this work, we propose Client-side poisoning Gradient Inversion (CGI), a novel attack method that can be launched from clients. For the first time, we show the feasibility of a client-side adversary with limited knowledge being able to recover the training samples from the aggregated global model. We take a distinct approach in which the adversary utilizes a malicious model that amplifies the loss of a specific targeted class of interest. When honest clients employ the poisoned global model, the gradients of samples belonging to the targeted class are magnified, making them the dominant factor in the aggregated update. This enables the adversary to effectively reconstruct the private input belonging to other clients using the aggregated update. In addition, CGI is able to remain stealthy against Byzantine-robust aggregation rules (AGRs). By optimizing malicious updates and blending benign updates with a malicious replacement vector, our method remains undetected by these defense mechanisms. To evaluate the performance of CGI, we conduct experiments on various benchmark datasets, considering representative Byzantine-robust AGRs, and exploring diverse FL settings with different levels of adversary knowledge about the data. Our results demonstrate that CGI consistently and successfully extracts training input in all tested scenarios.
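The amplification intuition (a magnified per-class gradient dominating the average) can be shown numerically. This is our own toy sketch with invented sizes and scaling, not CGI's actual procedure:

```python
import numpy as np

# If the poisoned model magnifies the gradient of one targeted sample's
# class, that gradient dominates the aggregate the adversary observes.
rng = np.random.default_rng(1)
grads = rng.normal(size=(100, 8))     # per-sample benign gradients (toy)
target = 200.0 * grads[0]             # magnified targeted-class gradient
aggregated = (target + grads[1:].sum(axis=0)) / 100.0

# Cosine similarity: the aggregate is almost parallel to the target,
# so it leaks the targeted sample's gradient direction.
cos = aggregated @ grads[0] / (
    np.linalg.norm(aggregated) * np.linalg.norm(grads[0]))
```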
A Tutorial on Environment-Aware Communications via Channel Knowledge Map for 6G
Authors: Yong Zeng, Junting Chen, Jie Xu, Di Wu, Xiaoli Xu, Shi Jin, Xiqi Gao, David Gesbert, Shuguang Cui, Rui Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Sixth-generation (6G) mobile communication networks are expected to have dense infrastructures, large-dimensional channels, cost-effective hardware, diversified positioning methods, and enhanced intelligence. Such trends bring both new challenges and opportunities for the practical design of 6G. On one hand, acquiring channel state information (CSI) in real time for all wireless links becomes quite challenging in 6G. On the other hand, there would be numerous data sources in 6G containing high-quality location-tagged channel data, making it possible to better learn the local wireless environment. By exploiting such new opportunities and for tackling the CSI acquisition challenge, there is a promising paradigm shift from the conventional environment-unaware communications to the new environment-aware communications based on the novel approach of channel knowledge map (CKM). This article aims to provide a comprehensive tutorial overview on environment-aware communications enabled by CKM to fully harness its benefits for 6G. First, the basic concept of CKM is presented, and a comparison of CKM with various existing channel inference techniques is discussed. Next, the main techniques for CKM construction are discussed, including both the model-free and model-assisted approaches. Furthermore, a general framework is presented for the utilization of CKM to achieve environment-aware communications, followed by some typical CKM-aided communication scenarios. Finally, important open problems in CKM research are highlighted and potential solutions are discussed to inspire future work.
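The environment-aware shift can be caricatured in a few lines: instead of estimating CSI from fresh pilots, look up channel knowledge previously learned at nearby tagged locations. The nearest-neighbor map below is our own toy sketch, with invented numbers, not a construction from the tutorial:

```python
import math

# A channel knowledge map as a dictionary from location-tagged
# measurements to channel knowledge (here, a toy gain in dB).
ckm = {
    (0.0, 0.0): -60.0,
    (10.0, 0.0): -72.5,
    (0.0, 10.0): -68.0,
}

def predict_gain(loc, ckm):
    """Infer channel gain at a new location from the nearest mapped
    location, with no pilot-based CSI acquisition."""
    nearest = min(ckm, key=lambda p: math.dist(p, loc))
    return ckm[nearest]
```

Real CKM construction would interpolate or use model-assisted prediction rather than a raw nearest-neighbor lookup, but the query pattern is the same: location in, channel knowledge out.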
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
Authors: Chi-en Amy Tai, Matthew Keller, Saeejith Nair, Yuhao Chen, Yifan Wu, Olivia Markham, Krish Parmar, Pengcheng Xi, Heather Keller, Sharon Kirkpatrick, Alexander Wong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating, as malnutrition has been directly linked to decreased quality of life. However, self-reporting methods such as food diaries suffer from substantial bias. Other conventional dietary assessment techniques and emerging alternative approaches such as mobile applications incur high time costs and may necessitate trained personnel. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities and food annotations hinders the accuracy and realism of such methods. To address this limitation, we introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 photorealistic synthetic 2D food images with associated dietary information and multimodal annotations (including depth images, instance masks, and semantic masks). Additionally, we collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism. Leveraging these novel datasets, we develop and benchmark NutritionVerse, an empirical study of various dietary intake estimation approaches, including indirect segmentation-based and direct prediction networks. We further fine-tune models pretrained on synthetic data with real images to provide insights into the fusion of synthetic and real data. Finally, we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) on https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to accelerate machine learning for dietary sensing.
Usability Evaluation of Spoken Humanoid Embodied Conversational Agents in Mobile Serious Games
Authors: Danai Korre, Judy Robertson
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Multimedia (cs.MM)
Abstract
This paper presents an empirical investigation of the extent to which spoken Humanoid Embodied Conversational Agents (HECAs) can foster usability in mobile serious game (MSG) applications. The aim of the research is to assess the impact of multiple agents and illusion of humanness on the quality of the interaction. The experiment investigates two styles of agent presentation: an agent of high human-likeness (HECA) and an agent of low human-likeness (text). The purpose of the experiment is to assess whether and how agents of high human-likeness can evoke the illusion of humanness and affect usability. Agents of high human-likeness were designed by following the ECA design model that is a proposed guide for ECA development. The results of the experiment with 90 participants show that users prefer to interact with the HECAs. The difference between the two versions is statistically significant with a large effect size (d=1.01), with many of the participants justifying their choice by saying that the human-like characteristics of the HECA made the version more appealing. This research provides key information on the potential effect of HECAs on serious games, which can provide insight into the design of future mobile serious games.
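For reference, the reported effect size is Cohen's d, the standardized mean difference with a pooled standard deviation. A small sketch with invented usability scores (not the study's measurements):

```python
import math

def cohens_d(a, b):
    """Cohen's d: difference of sample means divided by the pooled
    standard deviation of the two groups."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    pooled = math.sqrt(((len(a) - 1) * va + (len(b) - 1) * vb)
                       / (len(a) + len(b) - 2))
    return (ma - mb) / pooled

# Hypothetical scores for the HECA and text conditions.
d = cohens_d([4.2, 4.5, 4.8, 4.1], [3.1, 3.4, 3.0, 3.5])
```

By convention, d around 0.8 or above is considered a large effect, so d=1.01 indicates a substantial preference for the HECA condition.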
Keyword: pruning
There are no results.
Keyword: diffusion
Diffusion models for audio semantic communication
Authors: Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello
Abstract
Directly sending audio signals from a transmitter to a receiver across a noisy channel may consume considerable bandwidth and is prone to errors when trying to recover the transmitted bits. On the contrary, the recent semantic communication approach proposes to send the semantics and then regenerate semantically consistent content at the receiver without exactly recovering the bitstream. In this paper, we propose a generative audio semantic communication framework that frames the communication problem as an inverse problem and is therefore robust to different corruptions. Our method transmits lower-dimensional representations of the audio signal and of the associated semantics to the receiver, which generates the corresponding signal with a particular focus on its meaning (i.e., the semantics) thanks to the conditional diffusion model at its core. During the generation process, the diffusion model restores the received information from multiple degradations at the same time, including corruption noise and missing parts caused by the transmission over the noisy channel. We show that our framework outperforms competitors in a real-world scenario and under different channel conditions. Visit the project page to listen to samples and access the code: https://ispamm.github.io/diffusion-audio-semantic-communication/.
Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement
Authors: Chenghao Li, Dake Chen, Yuke Zhang, Peter A. Beerel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Abstract
While diffusion models demonstrate a remarkable capability for generating high-quality images, their tendency to 'replicate' training data raises privacy concerns. Although recent research suggests that this replication may stem from the insufficient generalization of training data captions and duplication of training images, effective mitigation strategies remain elusive. To address this gap, our paper first introduces a generality score that measures caption generality and employs a large language model (LLM) to generalize training captions. Subsequently, we leverage generalized captions and propose a novel dual fusion enhancement approach to mitigate the replication of diffusion models. Our empirical results demonstrate that our proposed methods can significantly reduce replication by 43.5% compared to the original diffusion model while maintaining the diversity and quality of generations.
Unbiased Face Synthesis With Diffusion Models: Are We There Yet?
Authors: Harrison Rosenberg, Shimaa Ahmed, Guruprasad V Ramesh, Ramya Korlakai Vinayak, Kassem Fawaz
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We apply our framework to faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
AudioSR: Versatile Audio Super-resolution at Scale
Authors: Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Abstract
Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, speech) and the specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal within the bandwidth range of 2kHz to 16kHz to a high-resolution audio signal at 24kHz bandwidth with a sampling rate of 48kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong results achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can act as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen. Our code and demo are available at https://audioldm.github.io/audiosr.
A posteriori error analysis of a positivity preserving scheme for the power-law diffusion Keller-Segel model
Abstract
We study a finite volume scheme approximating a parabolic-elliptic Keller-Segel system with power law diffusion with exponent $\gamma \in [1,3]$ and periodic boundary conditions. We derive conditional a posteriori bounds for the error measured in the $L^\infty(0,T;H^1(\Omega))$ norm for the chemoattractant and by a quasi-norm-like quantity for the density. These results are based on stability estimates and suitable conforming reconstructions of the numerical solution. We perform numerical experiments showing that our error bounds are linear in mesh width and elucidating the behaviour of the error estimator under changes of $\gamma$.
Beta quantile regression for robust estimation of uncertainty in the presence of outliers
Abstract
Quantile Regression (QR) can be used to estimate aleatoric uncertainty in deep neural networks and to generate prediction intervals. Quantifying uncertainty is particularly important in critical applications such as clinical diagnosis, where a realistic assessment of uncertainty is essential in determining disease status and planning the appropriate treatment. The most common application of quantile regression models is in cases where the parametric likelihood cannot be specified. Although quantile regression is quite robust to outlier response observations, it can be sensitive to outlier covariate observations (features). Outlier features can compromise the performance of deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, potentially leading to misleading conclusions. To address this problem, we propose a robust solution for quantile regression that incorporates concepts from robust divergence. We compare the performance of our proposed method with (i) least trimmed quantile regression and (ii) robust regression based on the regularization of case-specific parameters, on a simple real dataset in the presence of outliers; these methods have not previously been applied in a deep learning framework. We also demonstrate the applicability of the proposed method by applying it to a medical imaging translation task using diffusion models.
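Quantile regression models are typically trained with the pinball (quantile) loss; the abstract does not spell out the robust-divergence variant, so the sketch below shows only the standard pinball loss as background, with illustrative names.

```python
def pinball_loss(y_true, y_pred, tau):
    """Standard pinball (quantile) loss at quantile level tau:
    under-predictions are weighted tau and over-predictions (1 - tau),
    so minimizing it targets the tau-quantile of the response."""
    losses = []
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        losses.append(max(tau * diff, (tau - 1.0) * diff))
    return sum(losses) / len(losses)
```

For tau = 0.5 this reduces to half the mean absolute error, recovering median regression.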
Semantic Adversarial Attacks via Diffusion Models
Authors: Chenan Wang, Jinhao Duan, Chaowei Xiao, Edward Kim, Matthew Stamm, Kaidi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Traditional adversarial attacks concentrate on manipulating clean examples in the pixel space by adding adversarial perturbations. By contrast, semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features, which are more feasible in the real world. In this paper, we propose a framework to quickly generate semantic adversarial attacks by leveraging recent diffusion models, since semantic information is included in the latent space of well-trained diffusion models. This framework has two variants: 1) the Semantic Transformation (ST) approach fine-tunes the latent space of the generated image and/or the diffusion model itself; 2) the Latent Masking (LM) approach masks the latent space with another target image and local backpropagation-based interpretation methods. Additionally, the ST approach can be applied in either white-box or black-box settings. Extensive experiments are conducted on the CelebA-HQ and AFHQ datasets, and our framework demonstrates great fidelity, generalizability, and transferability compared to other baselines. Our approaches achieve approximately 100% attack success rate in multiple settings, with a best FID of 36.61. Code is available at https://github.com/steven202/semantic_adv_via_dm.
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
Abstract
A key challenge with procedure planning in instructional videos lies in how to handle a large decision space consisting of a multitude of action types that belong to various tasks. To understand real-world video content, an AI agent must proficiently discern these action types (e.g., pour milk, pour water, open lid, close lid, etc.) based on brief visual observation. Moreover, it must adeptly capture the intricate semantic relation of the action types and task goals, along with the variable action sequences. Recently, notable progress has been made via the integration of diffusion models and visual representation learning to address the challenge. However, existing models employ rudimentary mechanisms to utilize task information to manage the decision space. To overcome this limitation, we introduce a simple yet effective enhancement - a masked diffusion model. The introduced mask acts akin to a task-oriented attention filter, enabling the diffusion/denoising process to concentrate on a subset of action types. Furthermore, to bolster the accuracy of task classification, we harness more potent visual representation learning techniques. In particular, we learn a joint visual-text embedding, where a text embedding is generated by prompting a pre-trained vision-language model to focus on human actions. We evaluate the method on three public datasets and achieve state-of-the-art performance on multiple metrics. Code is available at https://github.com/ffzzy840304/Masked-PDPP.
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Authors: Zipeng Qi, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generating realistic talking faces is a complex and widely discussed task with numerous applications. In this paper, we present DiffTalker, a novel model designed to generate lifelike talking faces through audio and landmark co-driving. DiffTalker addresses the challenges associated with directly applying diffusion models to audio control, as such models are traditionally trained on text-image pairs. DiffTalker consists of two agent networks: a transformer-based landmark completion network for geometric accuracy and a diffusion-based face generation network for texture details. Landmarks play a pivotal role in establishing a seamless connection between the audio and image domains, facilitating the incorporation of knowledge from pre-trained diffusion models. This innovative approach efficiently produces articulate talking faces. Experimental results showcase DiffTalker's superior performance in producing clear and geometrically accurate talking faces, all without the need for additional alignment between audio and image features.
Adaptive Reduced Basis Trust Region Methods for Parameter Identification Problems
Authors: Michael Kartmann, Tim Keil, Mario Ohlberger, Stefan Volkwein, Barbara Kaltenbacher
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Abstract
In this contribution, we are concerned with model order reduction in the context of iterative regularization methods for the solution of inverse problems arising from parameter identification in elliptic partial differential equations. Such methods typically require a large number of forward solutions, which makes the use of the reduced basis method attractive to reduce computational complexity. However, the considered inverse problems are typically ill-posed due to their infinite-dimensional parameter space. Moreover, the infinite-dimensional parameter space makes it impossible to build and certify classical reduced-order models efficiently in a so-called "offline phase". We thus propose a new algorithm that adaptively builds a reduced parameter space in the online phase. The enrichment of the reduced parameter space is naturally inherited from the Tikhonov regularization within an iteratively regularized Gauß-Newton method. Finally, the adaptive parameter space reduction is combined with a certified reduced basis state space reduction within an adaptive error-aware trust region framework. Numerical experiments are presented to show the efficiency of the combined parameter and state space reduction for inverse parameter identification problems with distributed reaction or diffusion coefficients.
Beta Diffusion
Abstract
We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.
Generative Image Dynamics
Authors: Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present an approach to modeling an image-space prior on scene dynamics. Our prior is learned from a collection of motion trajectories extracted from real video sequences containing natural, oscillating motion such as trees, flowers, candles, and clothes blowing in the wind. Given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a per-pixel long-term motion representation in the Fourier domain, which we call a neural stochastic motion texture. This representation can be converted into dense motion trajectories that span an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping dynamic videos, or allowing users to realistically interact with objects in real pictures.
Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch
Authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan Z. Li, Yang You
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand-designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data. This limitation stems from the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug, a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Uniquely, unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood. This provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements. Specifically, it improves image classification and clustering accuracy by 1.6%~4.5%. When applied to biological data, DiffAug improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both vision and biological domains.
Large-Vocabulary 3D Diffusion Model with Transformer
Abstract
Creating diverse and high-quality 3D assets with an automatic generative model is highly desirable. Despite extensive efforts on 3D generation, most existing works focus on the generation of a single category or a few categories. In this paper, we introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model. Notably, there are three major challenges for this large-vocabulary 3D generation: a) the need for expressive yet efficient 3D representation; b) large diversity in geometry and texture across categories; c) complexity in the appearances of real-world objects. To this end, we propose a novel triplane-based 3D-aware Diffusion model with TransFormer, DiffTF, for handling challenges via three aspects. 1) Considering efficiency and robustness, we adopt a revised triplane representation and improve the fitting speed and accuracy. 2) To handle the drastic variations in geometry and texture, we regard the features of all 3D objects as a combination of generalized 3D knowledge and specialized 3D features. To extract generalized 3D knowledge from diverse categories, we propose a novel 3D-aware transformer with shared cross-plane attention. It learns the cross-plane relations across different planes and aggregates the generalized 3D knowledge with specialized 3D features. 3) In addition, we devise the 3D-aware encoder/decoder to enhance the generalized 3D knowledge in the encoded triplanes for handling categories with complex appearances. Extensive experiments on ShapeNet and OmniObject3D (over 200 diverse real-world categories) convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance with large diversity, rich semantics, and high quality.
Keyword: adaptive
Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition
Abstract
Chlorophyll concentration reflects the nutritional status and algal blooms of water bodies well, and is an important indicator for evaluating water quality. Predicting the trend of chlorophyll concentration is of great significance to environmental protection and aquaculture. However, there is a complex and indistinguishable nonlinear relationship between the many factors affecting chlorophyll concentration. In order to effectively mine the nonlinear features contained in the data, this paper proposes an adaptive graph-temporal convolutional network prediction model with series decomposition (AGTCNSD). Firstly, the original sequence is decomposed into a trend component and a periodic component by the moving average method. Secondly, the water quality parameter data are modeled based on a graph convolutional neural network, and a parameter embedding matrix is defined; the idea of matrix decomposition is used to assign weight parameters to each node. The adaptive graph convolution learns the relationships between different water quality parameters, updates the state information of each parameter, and improves the model's ability to learn the relationships between nodes. Finally, temporal dependence is captured by temporal convolution to achieve multi-step prediction of chlorophyll concentration. The validity of the model is verified using water quality data from the coastal city of Beihai. The results show that the prediction performance of this method is better than that of other methods, and it can serve as a scientific resource for environmental management decision-making.
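The first step above, splitting the series into a trend and a periodic component with a moving average, can be sketched as follows (a simplified illustration with truncated windows at the edges and illustrative names; the paper's exact decomposition may differ):

```python
def decompose(series, window):
    """Split a series into a centered moving-average trend and the
    residual (periodic) component; windows are truncated at the edges
    so that trend + periodic reconstructs the series exactly."""
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    periodic = [x - t for x, t in zip(series, trend)]
    return trend, periodic
```

By construction the two components sum back to the original series, so all variation not captured by the smoothed trend is routed to the periodic component.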
Small error algorithms for tropical group testing
Authors: Vivekanand Paligadu, Oliver Johnson, Matthew Aldridge
Subjects: Information Theory (cs.IT); Probability (math.PR)
Abstract
We consider a version of the classical group testing problem motivated by PCR testing for COVID-19. In the so-called tropical group testing model, the outcome of a test is the lowest cycle threshold (Ct) level of the individuals pooled within it, rather than a simple binary indicator variable. We introduce the tropical counterparts of three classical non-adaptive algorithms (COMP, DD and SCOMP), and analyse their behaviour through both simulations and bounds on error probabilities. By comparing the results of the tropical and classical algorithms, we gain insight into the extra information provided by learning the outcomes (Ct levels) of the tests. We show that in a limiting regime the tropical COMP algorithm requires as many tests as its classical counterpart, but that for sufficiently dense problems tropical DD can recover more information with fewer tests, and can be viewed as essentially optimal in certain regimes.
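The tropical outcome model and a COMP-style decoder can be sketched as below. This is a simplified illustration of the setting described above (a negative test reads as an infinite Ct level), not the paper's exact tropical COMP/DD/SCOMP algorithms, and all names are illustrative:

```python
import math

def run_tests(pools, ct_levels):
    """Tropical test outcome: the lowest Ct level among the pooled
    defectives (ct_levels maps defective item -> Ct), or +inf if the
    pool contains no defective (a negative test)."""
    return [min((ct_levels.get(i, math.inf) for i in pool), default=math.inf)
            for pool in pools]

def comp_decode(pools, outcomes, n_items):
    """COMP-style decoder: any item appearing in a negative
    (infinite-Ct) test is definitely non-defective; declare
    every remaining item defective."""
    cleared = set()
    for pool, y in zip(pools, outcomes):
        if math.isinf(y):
            cleared.update(pool)
    return [i for i in range(n_items) if i not in cleared]
```

As with classical COMP, the decoder can produce false positives but never false negatives; the tropical outcomes additionally carry the Ct levels that DD/SCOMP-style refinements can exploit.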
Domain-adaptive Graph Attention-supervised Network for Cross-network Edge Classification
Authors: Xiao Shen, Mengqiu Shao, Shirui Pan, Laurence T. Yang, Xi Zhou
Abstract
Graph neural networks (GNNs) have shown great ability in modeling graphs; however, their performance can significantly degrade when there are noisy edges connecting nodes from different classes. To alleviate the negative effect of noisy edges on neighborhood aggregation, some recent GNNs propose to predict the label agreement between node pairs within a single network. However, predicting the label agreement of edges across different networks has not yet been investigated. Our work makes a pioneering attempt to study the novel problem of cross-network homophilous and heterophilous edge classification (CNHHEC), and proposes a novel domain-adaptive graph attention-supervised network (DGASN) to effectively tackle it. Firstly, DGASN adopts multi-head GAT as the GNN encoder, which jointly trains node embeddings and edge embeddings via the node classification and edge classification losses. As a result, label-discriminative embeddings can be obtained to distinguish homophilous edges from heterophilous edges. In addition, DGASN applies direct supervision on graph attention learning based on the observed edge labels from the source network, thus lowering the negative effects of heterophilous edges while enlarging the positive effects of homophilous edges during neighborhood aggregation. To facilitate knowledge transfer across networks, DGASN employs adversarial domain adaptation to mitigate domain divergence. Extensive experiments on real-world benchmark datasets demonstrate that the proposed DGASN achieves state-of-the-art performance in CNHHEC.
Deep Reinforcement Learning-based Scheduling in Edge and Fog Computing Environments
Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Edge/fog computing, as a distributed computing paradigm, satisfies the low-latency requirements of an ever-increasing number of IoT applications and has become the mainstream computing paradigm behind them. However, because a large number of IoT applications require execution on edge/fog resources, the servers may become overloaded, which can disrupt the edge/fog servers and also negatively affect IoT applications' response time. Moreover, many IoT applications are composed of dependent components, incurring extra constraints for their execution. Besides, edge/fog computing environments and IoT applications are inherently dynamic and stochastic. Thus, efficient and adaptive scheduling of IoT applications in heterogeneous edge/fog computing environments is of paramount importance. However, the limited computational resources on edge/fog servers impose an extra burden for applying optimal but computationally demanding techniques. To overcome these challenges, we propose a Deep Reinforcement Learning-based IoT application Scheduling algorithm, called DRLIS, to adaptively and efficiently optimize the response time of heterogeneous IoT applications and balance the load of the edge/fog servers. We implemented DRLIS as a practical scheduler in the FogBus2 function-as-a-service framework for creating an edge-fog-cloud integrated serverless computing environment. Results obtained from extensive experiments show that DRLIS significantly reduces the execution cost of IoT applications by up to 55%, 37%, and 50% in terms of load balancing, response time, and weighted cost, respectively, compared with metaheuristic algorithms and other reinforcement learning techniques.
JSMNet Improving Indoor Point Cloud Semantic and Instance Segmentation through Self-Attention and Multiscale
Authors: Shuochen Xu, Zhenxin Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The semantic understanding of indoor 3D point cloud data is crucial for a range of subsequent applications, including indoor service robots, navigation systems, and digital twin engineering. Global features are crucial for achieving high-quality semantic and instance segmentation of indoor point clouds, as they provide essential long-range context information. To this end, we propose JSMNet, which combines a multi-layer network with a global feature self-attention module to jointly segment three-dimensional point cloud semantics and instances. To better express the characteristics of indoor targets, we have designed a multi-resolution feature adaptive fusion module that takes into account the differences in point cloud density caused by varying scanner distances from the target. Additionally, we propose a framework for joint semantic and instance segmentation by integrating semantic and instance features to achieve superior results. We conduct experiments on S3DIS, which is a large three-dimensional indoor point cloud dataset. Our proposed method is compared against other methods, and the results show that it outperforms existing methods in semantic and instance segmentation and provides better results in target local area segmentation. Specifically, our proposed method outperforms PointNet (Qi et al., 2017a) by 16.0% and 26.3% in terms of semantic segmentation mIoU in S3DIS (Area 5) and instance segmentation mPre, respectively. Additionally, it surpasses ASIS (Wang et al., 2019) by 6.0% and 4.6%, respectively, as well as JSPNet (Chen et al., 2022) by a margin of 3.3% for semantic segmentation mIoU and a slight improvement of 0.3% for instance segmentation mPre.
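The semantic segmentation gains above are reported in mean intersection-over-union (mIoU). For reference, a minimal sketch of that metric over flattened label lists (illustrative only, not the authors' evaluation code):

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes, skipping classes
    absent from both the prediction and the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

Averaging per-class IoU rather than per-point accuracy keeps rare classes from being swamped by dominant ones, which is why it is the standard report on S3DIS.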
A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization
Abstract
This paper proposes a novel fuzzy cascaded Proportional-Derivative (PD) controller for under-actuated single-link flexible joint manipulators. The original flexible joint system is considered as two coupled $2^{nd}$-order sub-systems. The proposed controller is composed of two cascaded PD controllers and two fuzzy logic regulators (FLRs). The first (virtual) PD controller is used to generate the desired control input that stabilizes the first $2^{nd}$-order sub-system. Solving this equation with the coupling terms treated as design variables, the reference signal is generated for the second sub-system. Then, through a simple compensation design, together with the second PD controller, the cascaded PD controller is derived. To further improve performance, two FLRs are implemented that adaptively tune the parameters of the PD controllers. Under natural assumptions, the cascaded fuzzy PD controller is proved to be locally asymptotically stable. All offline tuning is completed data-efficiently by Bayesian Optimization. Simulation results illustrate the stability and validity of the proposed method. Moreover, the idea of the cascaded PD controller presented here may be extended as a novel control method for other under-actuated systems, and the stability analysis offers a new perspective on the stability proofs of other fuzzy-enhanced PID controllers.
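The PD building block of such a cascade can be sketched as follows; the coupling compensation and the fuzzy logic regulators are omitted, and the class name, gains, and wiring are illustrative assumptions rather than the paper's design:

```python
class PD:
    """Discrete-time proportional-derivative controller."""
    def __init__(self, kp, kd):
        self.kp, self.kd = kp, kd
        self.prev_err = None

    def step(self, err, dt):
        # Finite-difference derivative term; zero on the first step.
        d = 0.0 if self.prev_err is None else (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.kd * d

# Cascade wiring: the outer (virtual) PD turns the tracking error of
# the first sub-system into a reference for the inner PD loop.
outer, inner = PD(kp=4.0, kd=2.0), PD(kp=8.0, kd=1.0)

def cascaded_control(setpoint, y_outer, y_inner, dt):
    ref_inner = outer.step(setpoint - y_outer, dt)
    return inner.step(ref_inner - y_inner, dt)
```

In the paper, the reference for the second sub-system comes from solving the coupled dynamics rather than a direct PD output, and the FLRs retune kp/kd online.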
A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing
Abstract
Remote sensing images are essential for many earth science applications, but their quality can be degraded due to limitations in sensor technology and complex imaging environments. To address this, various remote sensing image deblurring methods have been developed to restore sharp, high-quality images from degraded observational data. However, most traditional model-based deblurring methods usually require predefined hand-crafted prior assumptions, which are difficult to devise for complex applications, and most deep learning-based deblurring methods are designed as black boxes, lacking transparency and interpretability. In this work, we propose a novel blind deblurring learning framework based on alternating iterations of shrinkage thresholds, alternately updating the blur kernel and the image, with a theoretical foundation for the network design. Additionally, we propose a learnable blur kernel proximal mapping module to improve blur kernel estimation in the kernel domain. We then propose a deep proximal mapping module in the image domain, which combines a generalized shrinkage threshold operator with a multi-scale prior feature extraction block. This module also introduces an attention mechanism to adaptively adjust the importance of the prior, thus avoiding the drawbacks of hand-crafted image prior terms. The resulting multi-scale generalized shrinkage threshold network (MGSTNet) is designed to specifically focus on learning deep geometric prior features to enhance image restoration. Experiments demonstrate the superiority of our MGSTNet framework on remote sensing image datasets compared to existing deblurring methods.
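As background for the generalized shrinkage threshold above, the classical soft-thresholding operator (the proximal map of the L1 norm) that such networks generalize can be sketched as:

```python
def soft_threshold(x, lam):
    """Classical soft-thresholding: shrink |x| by lam and zero out
    anything with magnitude below lam (proximal operator of lam*|x|)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0
```

MGSTNet replaces the fixed threshold lam with learned, multi-scale parameters inside an unrolled iteration; this fixed-threshold version only illustrates the underlying operator.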
Adaptive approximation of monotone functions
Authors: Pierre Gaillard (Thoth), Sébastien Gerchinovitz (IMT), Étienne de Montbrun (TSE-R)
Abstract
We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\mathcal{X}$. For any function $f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge of the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone and non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.
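A greedy adaptive scheme in the spirit of GreedyBox can be sketched as follows; this simplified variant assumes the uniform measure with $p = 1$ and uses a loose per-interval bound (rise times width), so it is an illustration of the idea rather than the paper's algorithm:

```python
import heapq

def greedy_monotone_approx(f, a, b, budget):
    """Greedy adaptive approximation of a non-decreasing f on [a, b]:
    repeatedly query the midpoint of the interval whose worst-case
    L1 error bound (rise * width) is largest, and return the final
    global error bound after `budget` extra queries."""
    fa, fb = f(a), f(b)
    # Max-heap via negated bounds: (-bound, x1, x2, f(x1), f(x2)).
    heap = [(-(fb - fa) * (b - a), a, b, fa, fb)]
    for _ in range(budget):
        _, x1, x2, f1, f2 = heapq.heappop(heap)
        m = 0.5 * (x1 + x2)
        fm = f(m)
        heapq.heappush(heap, (-(fm - f1) * (m - x1), x1, m, f1, fm))
        heapq.heappush(heap, (-(f2 - fm) * (x2 - m), m, x2, fm, f2))
    return sum(-item[0] for item in heap)
```

Each query splits the interval currently dominating the error bound, so the bound adapts to where $f$ rises fastest, which is the function-dependent behavior the paper formalizes.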
Adaptive Reduced Basis Trust Region Methods for Parameter Identification Problems
Authors: Michael Kartmann, Tim Keil, Mario Ohlberger, Stefan Volkwein, Barbara Kaltenbacher
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Abstract
In this contribution, we are concerned with model order reduction in the context of iterative regularization methods for the solution of inverse problems arising from parameter identification in elliptic partial differential equations. Such methods typically require a large number of forward solutions, which makes the use of the reduced basis method attractive to reduce computational complexity. However, the considered inverse problems are typically ill-posed due to their infinite-dimensional parameter space. Moreover, the infinite-dimensional parameter space makes it impossible to build and certify classical reduced-order models efficiently in a so-called "offline phase". We thus propose a new algorithm that adaptively builds a reduced parameter space in the online phase. The enrichment of the reduced parameter space is naturally inherited from the Tikhonov regularization within an iteratively regularized Gauß-Newton method. Finally, the adaptive parameter space reduction is combined with a certified reduced basis state space reduction within an adaptive error-aware trust region framework. Numerical experiments are presented to show the efficiency of the combined parameter and state space reduction for inverse parameter identification problems with distributed reaction or diffusion coefficients.
Frequency-adaptive control of a three-phase single-stage grid-connected photovoltaic system under grid voltage sags
Authors: Alexis B. Rey-Boué, N.F. Guerrero-Rodríguez, Johannes Stöckl, Thomas I. Strasser
Abstract
The low-voltage ride-through service is provided in this paper according to the voltage profile described in the IEC 61400-21 European standard when short-duration voltage sags occur, and some instantaneous reactive power is delivered to the grid in accordance with the Spanish grid code. The mandatory limitation of the amplitude of the three-phase inverter currents to their nominal value is achieved with a novel control strategy, in which a certain amount of instantaneous constant active power can also be delivered to the grid during small or moderate voltage sags. A multiple second-order generalized integrator frequency-locked loop synchronization algorithm is employed to estimate the system frequency without harmonic distortion, as well as to output the positive- and negative-sequence αβ quantities of the three-phase grid voltages when balanced and unbalanced voltage sags occur, in a frequency-adaptive scheme. The current control is carried out in the stationary reference frame, which guarantees the cancellation of harmonic distortion in the utility grid currents using a harmonic compensation structure, together with the implementation of a constant active power control that protects the DC-link capacitor from thermal stress by avoiding large harmonic distortion at twice the fundamental frequency in the DC-link voltage. A case study of a three-phase single-stage grid-connected PV system with a maximum apparent power of about 500 kVA is tested, first with several MATLAB/Simulink simulations and then with controller hardware-in-the-loop (CHIL) experiments for several types of voltage sags, in order to provide the final validation of the control algorithms.
Aerial Manipulator Force Control Using Control Barrier Functions
Authors: Dimitris Chaikalis, Vinicius Goncalves, Anthony Tzes, Farshad Khorrami
Abstract
This article studies the problem of applying normal forces on a surface, using an underactuated aerial vehicle equipped with a dexterous robotic arm. A force-motion high-level controller is designed based on a Lyapunov function encompassing alignment and exerted force errors. This controller is coupled with a Control Barrier Function constraint under an optimization scheme using Quadratic Programming. This aims to enforce a prescribed relationship between the approaching motion for the end-effector and its alignment with the surface, thus ensuring safe operation. An adaptive low-level controller is devised for the aerial vehicle, capable of tracking velocity commands generated by the high-level controller. Simulations are presented to demonstrate the force exertion stability and safety of the controller in cases of large disturbances.
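For a single affine CBF constraint, the Quadratic Programming safety filter mentioned above admits a closed-form solution, which can be sketched as follows; the names and the one-constraint simplification are our own illustration, not the article's full controller.

```python
import numpy as np

def cbf_qp_filter(u_des, a, b):
    """Minimally modify the nominal command so a CBF constraint holds:
    argmin_u ||u - u_des||^2  subject to  a @ u >= b.
    With one affine constraint the QP solves in closed form."""
    slack = b - a @ u_des
    if slack <= 0.0:
        return u_des                      # constraint inactive: keep nominal input
    return u_des + (slack / (a @ a)) * a  # project onto the constraint boundary

# nominal command violates the constraint u[1] >= 0.5, so it gets projected
u_safe = cbf_qp_filter(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.5)
```

In the article's setting, `u_des` would come from the Lyapunov-based force-motion controller and the constraint would encode the prescribed approach/alignment relationship; a general QP solver replaces the closed form once several constraints are active.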
Imitation Learning-based Visual Servoing for Tracking Moving Objects
Authors: Rocco Felici, Matteo Saveriano, Loris Roveda, Antonio Paolillo
Abstract
In everyday collaboration tasks between human operators and robots, the former need simple ways to program new skills, while the latter must show adaptive capabilities to cope with environmental changes. The joint use of visual servoing and imitation learning allows us to pursue the objective of realizing friendly robotic interfaces that (i) are able to adapt to the environment thanks to the use of visual perception and (ii) avoid explicit programming thanks to the emulation of previous demonstrations. This work exploits imitation learning within the visual servoing paradigm to address the specific problem of tracking moving objects. In particular, we show that the compensation term required to realize the tracking controller can be inferred from data, avoiding the explicit implementation of estimators or observers. The effectiveness of the proposed method has been validated through simulations with a robotic manipulator.
AIDPS: Adaptive Intrusion Detection and Prevention System for Underwater Acoustic Sensor Networks
Authors: Soumadeep Das, Aryan Mohammadi Pasikhani, Prosanta Gope, John A. Clark, Chintan Patel, Biplab Sikdar
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Underwater Acoustic Sensor Networks (UW-ASNs) are predominantly used for underwater environments and find applications in many areas. However, a lack of security considerations, the unstable and challenging nature of the underwater environment, and the resource-constrained sensor nodes used in UW-ASNs (which prevent them from adopting standard security primitives) leave UW-ASNs vulnerable to attacks. This paper proposes an Adaptive decentralised Intrusion Detection and Prevention System called AIDPS for UW-ASNs. The proposed AIDPS can improve the security of UW-ASNs so that they can efficiently detect underwater-related attacks (e.g., blackhole, grayhole and flooding attacks). To determine the most effective configuration of the proposed construction, we conduct a number of experiments using several state-of-the-art machine learning algorithms (e.g., Adaptive Random Forest (ARF), light gradient-boosting machine, and K-nearest neighbours) and concept drift detection algorithms (e.g., ADWIN, kdqTree, and Page-Hinkley). Our experimental results show that incremental ARF using ADWIN provides optimal performance when implemented with one-class support vector machine (SVM) anomaly-based detectors. Furthermore, our extensive evaluation results also show that the proposed scheme outperforms state-of-the-art benchmarking methods while providing a wider range of desirable features such as scalability and low complexity.
A Multi-In and Multi-Out Dendritic Neuron Model and its Optimization
Abstract
Artificial neural networks (ANNs), inspired by the interconnection of real neurons, have achieved unprecedented success in various fields such as computer vision and natural language processing. Recently, a novel mathematical ANN model, known as the dendritic neuron model (DNM), has been proposed to address nonlinear problems by more accurately reflecting the structure of real neurons. However, the single-output design limits its capability to handle multi-output tasks, significantly restricting its applications. In this paper, we propose a novel multi-in and multi-out dendritic neuron model (MODN) to tackle multi-output tasks. Our core idea is to introduce a filtering matrix to the soma layer that adaptively selects the desired dendrites to regress each output. Because such a matrix is designed to be learnable, MODN can explore the relationship between each dendrite and output to provide a better solution to downstream tasks. We also add a telodendron layer to MODN to better simulate real neuron behavior. Importantly, MODN is a more general and unified framework that can be naturally specialized to the DNM by customizing the filtering matrix. To explore the optimization of MODN, we investigate both heuristic and gradient-based optimizers and introduce a 2-step training method for MODN. Extensive experiments on 11 datasets covering both binary and multi-class classification tasks demonstrate the effectiveness of MODN with respect to accuracy, convergence, and generality.
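The filtering-matrix idea can be sketched with plain NumPy; the shapes, nonlinearities, and names below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_dendrites, n_outputs = 5, 4, 8, 3

X = rng.normal(size=(n_samples, n_features))
W = rng.normal(size=(n_features, n_dendrites))  # synaptic weights, one column per dendrite
dendrites = np.tanh(X @ W)                      # dendritic activations, shape (n_samples, n_dendrites)

# learnable filtering matrix: column k softly selects which dendrites feed output k
M = rng.normal(size=(n_dendrites, n_outputs))
gates = 1.0 / (1.0 + np.exp(-M))                # sigmoid gates in (0, 1)
Y = dendrites @ gates                           # multi-output soma responses, shape (n_samples, n_outputs)
```

Fixing `gates` to a single all-ones column recovers a single-output DNM-style soma, mirroring the paper's point that MODN specializes to the DNM by customizing the filtering matrix.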
VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning
Abstract
We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using Offline Reinforcement Learning (RL). Our method trains a novel RL policy from unlabeled data collected in real outdoor vegetation. This policy uses height and intensity-based cost maps derived from 3D LiDAR point clouds, a goal cost map, and processed proprioception data as state inputs, and learns the physical and geometric properties of the surrounding vegetation such as height, density, and solidity/stiffness for navigation. Instead of using end-to-end policy actions, the fully trained RL policy's Q network is used to evaluate dynamically feasible robot actions generated from a novel adaptive planner capable of navigating through dense narrow passages and preventing entrapment in vegetation such as tall grass and bushes. We demonstrate our method's capabilities on a legged robot in complex outdoor vegetation. We observe an improvement in success rates, a decrease in average power consumption, and a decrease in normalized trajectory length compared to both existing end-to-end offline RL and outdoor navigation methods.
A Unified Perspective on Multiple Shooting In Differential Dynamic Programming
Authors: He Li, Wenhao Yu, Tingnan Zhang, Patrick M. Wensing
Abstract
Differential Dynamic Programming (DDP) is an efficient computational tool for solving nonlinear optimal control problems. It was originally designed as a single shooting method and thus is sensitive to the initial guess supplied. This work considers the extension of DDP to multiple shooting (MS), improving its robustness to initial guesses. A novel derivation is proposed that accounts for the defect between shooting segments during the DDP backward pass, while still maintaining quadratic convergence locally. The derivation enables unifying multiple previous MS algorithms, and opens the door to many smaller algorithmic improvements. A penalty method is introduced to strategically control the step size, further improving the convergence performance. An adaptive merit function and a more reliable acceptance condition are employed for globalization. The effects of these improvements are benchmarked for trajectory optimization with a quadrotor, an acrobot, and a manipulator. MS-DDP is also demonstrated for use in Model Predictive Control (MPC) for dynamic jumping with a quadruped robot, showing its benefits over a single shooting approach.
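The central multiple-shooting quantity, the defect between consecutive shooting segments, can be illustrated on a toy system; the dynamics and names here are our own assumptions, not the paper's formulation.

```python
def f(x, u):
    # toy scalar discrete-time dynamics, for illustration only
    return 0.9 * x + u

def segment_defects(starts, controls):
    """Roll each shooting segment out from its own start state and
    measure the gap (defect) to the next segment's start state.
    MS-DDP drives these defects to zero during its backward/forward passes."""
    defects = []
    for i in range(len(starts) - 1):
        x = starts[i]
        for u in controls[i]:
            x = f(x, u)
        defects.append(x - starts[i + 1])
    return defects

# a consistent initialization (single-shooting rollout) produces zero defects;
# an arbitrary initial guess for `starts` generally would not
controls = [[0.1, 0.1], [0.2, 0.0]]
starts = [1.0]
for seg in controls:
    x = starts[-1]
    for u in seg:
        x = f(x, u)
    starts.append(x)
```

Because the segment start states are free variables, a multiple-shooting solver can be warm-started from a rough state trajectory rather than a dynamically consistent rollout, which is the robustness benefit the abstract describes.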
Keyword: quantization
Communication Efficient Private Federated Learning Using Dithering
Authors: Burak Hasircioglu, Deniz Gunduz
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full-precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full-precision gradient method.
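The core mechanism can be sketched as follows; the step size `delta` and the reconstruction rule are the textbook subtractive-dithering scheme, not necessarily the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(7)  # stands in for a seed shared between client and aggregator

def client_quantize(x, delta, dither):
    # client: add the shared dither, snap to the lattice, transmit integer indices only
    return np.round((x + dither) / delta)

def aggregator_dequantize(q, delta, dither):
    # aggregator: rescale and subtract the same shared dither
    return q * delta - dither

delta = 0.25
grads = rng.normal(size=1000)
dither = rng.uniform(-delta / 2, delta / 2, size=grads.shape)
recon = aggregator_dequantize(client_quantize(grads, delta, dither), delta, dither)
err = recon - grads  # uniform on [-delta/2, delta/2], independent of the data
```

Because the reconstruction error is exactly uniform and independent of the gradients, and sums of many such independent errors approach Gaussian noise, the quantization itself plays the role of the noise the aggregator would otherwise inject, while only low-bit indices cross the channel.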
Keyword: efficient
Bringing PDEs to JAX with forward and reverse modes automatic differentiation
VertiSync: A Traffic Management Policy with Maximum Throughput for On-Demand Urban Air Mobility Networks
Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis
Digital 3D Smocking Design
Geometric Gait Optimization for Inertia-Dominated Systems With Nonzero Net Momentum
Language-Conditioned Observation Models for Visual Object Search
$\texttt{NePhi}$: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains
Energy-Constrained Active Exploration Under Incremental-Resolution Symbolic Perception
International Competition on Graph Counting Algorithms 2023
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
DebCSE: Rethinking Unsupervised Contrastive Sentence Embedding Learning in the Debiasing Perspective
Multi-Grade Deep Learning for Partial Differential Equations with Applications to the Burgers Equation
Deep Reinforcement Learning-based Scheduling in Edge and Fog Computing Environments
M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec
Learning Tube-Certified Control Using Robust Contraction Metrics
Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects
A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization
Comparison of DDS, MQTT, and Zenoh in Edge-to-Edge/Cloud Communication with ROS 2
Efficiently Robustify Pre-trained Models
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning
A Gaussian Copula Approach to the Performance Analysis of Fluid Antenna Systems
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Universality of underlying mechanism for successful deep learning
Optimal Control for Indoor Vertical Farms Based on Crop Growth
Achieving 45% efficiency of CIGS/CdS Solar Cell by adding GaAs using optimization techniques
A Survey of Graph Pre-processing Methods: From Algorithmic to Hardware Perspectives
Fluid Antenna-Assisted Dirty Multiple Access Channels over Composite Fading
On Annihilators of Explicit Polynomial Maps
Influence Robustness of Nodes in Multiplex Networks against Attacks
Adaptive Reduced Basis Trust Region Methods for Parameter Identification Problems
Impact of Excitation and Weighting Errors on Performance of Compact OTA Testing Systems
WASM-MUTATE: Fast and Effective Binary Diversification for WebAssembly
Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation
Physics-constrained robust learning of open-form PDEs from limited and noisy data
Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis
AIDPS: Adaptive Intrusion Detection and Prevention System for Underwater Acoustic Sensor Networks
Dynamic programming on bipartite tree decompositions
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Communication Efficient Private Federated Learning Using Dithering
Decomposition of linear tensor transformations
MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems
SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions
A Unified Perspective on Multiple Shooting In Differential Dynamic Programming
Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch
TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Large-Vocabulary 3D Diffusion Model with Transformer
Keyword: faster
Euclidean and non-Euclidean Trajectory Optimization Approaches for Quadrotor Racing
$\texttt{NePhi}$: Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains
Stable In-hand Manipulation with Finger Specific Multi-agent Shadow Reward
Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation
Judging a video by its bitstream cover
HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods
Adaptive approximation of monotone functions
Optimal Control for Indoor Vertical Farms Based on Crop Growth
Learning Quasi-Static 3D Models of Markerless Deformable Linear Objects for Bimanual Robotic Manipulation
What Matters to Enhance Traffic Rule Compliance of Imitation Learning for Automated Driving
Keyword: mobile
Client-side Gradient Inversion Against Federated Learning from Poisoning
A Tutorial on Environment-Aware Communications via Channel Knowledge Map for 6G
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
Usability Evaluation of Spoken Humanoid Embodied Conversational Agents in Mobile Serious Games
Keyword: pruning
There is no result
Keyword: diffusion
Diffusion models for audio semantic communication
Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement
Unbiased Face Synthesis With Diffusion Models: Are We There Yet?
AudioSR: Versatile Audio Super-resolution at Scale
A posteriori error analysis of a positivity preserving scheme for the power-law diffusion Keller-Segel model
Beta quantile regression for robust estimation of uncertainty in the presence of outliers
Semantic Adversarial Attacks via Diffusion Models
Masked Diffusion with Task-awareness for Procedure Planning in Instructional Videos
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Adaptive Reduced Basis Trust Region Methods for Parameter Identification Problems
Beta Diffusion
Generative Image Dynamics
Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch
Large-Vocabulary 3D Diffusion Model with Transformer
Keyword: adaptive
Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition
Small error algorithms for tropical group testing
Domain-adaptive Graph Attention-supervised Network for Cross-network Edge Classification
Deep Reinforcement Learning-based Scheduling in Edge and Fog Computing Environments
JSMNet Improving Indoor Point Cloud Semantic and Instance Segmentation through Self-Attention and Multiscale
A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization
A Multi-scale Generalized Shrinkage Threshold Network for Image Blind Deblurring in Remote Sensing
Adaptive approximation of monotone functions
Adaptive Reduced Basis Trust Region Methods for Parameter Identification Problems
Frequency-adaptive control of a three-phase single-stage grid-connected photovoltaic system under grid voltage sags
Aerial Manipulator Force Control Using Control Barrier Functions
Imitation Learning-based Visual Servoing for Tracking Moving Objects
AIDPS: Adaptive Intrusion Detection and Prevention System for Underwater Acoustic Sensor Networks
A Multi-In and Multi-Out Dendritic Neuron Model and its Optimization
VAPOR: Holonomic Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning
A Unified Perspective on Multiple Shooting In Differential Dynamic Programming
Keyword: quantization
Communication Efficient Private Federated Learning Using Dithering