Abstract
Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software and hardware are providing vast amounts of data for various complex phenomena, ranging from those in the atmospheric environment to large-scale porous formations and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse high-dimensional, multiscale, and multiphysics phenomena that contain elements of stochasticity, and to generate large volumes of numerical data for them in heterogeneous systems. The difficulty, however, is that the governing equations for such phenomena are often not known. A prime example is flow, transport, and deformation processes in macroscopically heterogeneous materials and geomedia. In other cases, the governing equations are only partially known, in the sense that they either contain various coefficients that must be evaluated from data, or require constitutive relations, such as the relationship between the stress tensor and the velocity gradients for non-Newtonian fluids in the momentum conservation equation, before they can be used for modeling. Several classes of approaches are emerging to address such problems, based on machine learning, symbolic regression, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation, and stochastic optimization and analysis, or a combination of two or more of these. This Perspective describes the latest developments in this highly important area and discusses possible future directions.
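Of the approaches listed, sparse identification of nonlinear dynamics (SINDy) is compact enough to sketch. The block below is a minimal illustration, assuming NumPy, a hand-chosen library of candidate terms, and noise-free derivatives; the helper name `stlsq` and the `threshold` value are illustrative choices, not taken from this Perspective. It regresses the time derivative on the candidate library and repeatedly prunes small coefficients, recovering dx/dt = -2x from synthetic data.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares, the sparse regression step of SINDy.

    theta: (n_samples, n_features) candidate terms evaluated on the data
    dxdt:  (n_samples, n_states) time derivatives of the states
    Returns a (n_features, n_states) sparse coefficient matrix.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold            # coefficients to prune
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]                    # surviving terms for state k
            if big.any():
                # Refit state k using only the surviving candidate terms.
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Synthetic, noise-free data from dx/dt = -2x.
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = (-2.0 * x).reshape(-1, 1)
theta = np.column_stack([np.ones_like(x), x, x**2])  # candidate terms: 1, x, x^2
xi = stlsq(theta, dxdt)
print(np.round(xi.ravel(), 3))
```

With real measurements the derivatives would have to be estimated from noisy data, which is where most of the practical difficulty lies.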
Scalable Data Point Valuation in Decentralized Learning
Authors: Konstantin D. Pandl, Chun-Yin Huang, Ivan Beschastnikh, Xiaoxiao Li, Scott Thiebes, Ali Sunyaev
Abstract
Existing research on data valuation in federated and swarm learning focuses on valuing client contributions and works best when data across clients is independent and identically distributed (IID). In practice, data is rarely IID. We develop an approach called DDVal for decentralized data valuation that can value individual data points in federated and swarm learning. DDVal is based on sharing deep features and approximating Shapley values through a k-nearest neighbor approximation method. This enables novel applications, for example, simultaneously rewarding institutions and individuals for providing data to a decentralized machine learning task. Valuing data points with DDVal also allows hierarchical conclusions to be drawn about the contributions of institutions, and we empirically show that DDVal estimates institutional contributions more accurately than existing Shapley value approximation methods for federated learning. Specifically, it reaches a cosine similarity in approximating Shapley values of 99.969% for both IID and non-IID data distributions across institutions, compared with 99.301% and 97.250% for the best state-of-the-art methods. DDVal scales with the number of data points instead of the number of clients and has log-linear complexity, which scales more favorably than existing approaches with exponential complexity. We show that DDVal is especially efficient in data distribution scenarios with many clients that have few data points, for example, more than 16 clients with 8,000 data points each. By integrating DDVal into a decentralized system, we show that it is suitable not only for centralized federated learning but also for decentralized swarm learning, which aligns well with research on emerging internet technologies such as web3 that reward users for providing data to algorithms.
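To illustrate why a k-nearest-neighbor approximation yields log-linear cost, the sketch below implements the exact Shapley-value recursion for an unweighted k-NN classifier evaluated on a single test point, as described by Jia et al. (2019). We assume DDVal builds on a recursion of this kind over shared deep features; its exact variant, distance metric, and aggregation may differ, and the function name `knn_shapley` is hypothetical. Sorting by distance dominates, so each test point costs O(N log N).

```python
def knn_shapley(train_x, train_y, test_x, test_y, k):
    """Exact Shapley values of training points for an unweighted k-NN classifier
    on one test point (recursion of Jia et al., 2019)."""
    n = len(train_x)
    # Rank training points by squared Euclidean distance to the test point.
    dist = [sum((a - b) ** 2 for a, b in zip(x, test_x)) for x in train_x]
    order = sorted(range(n), key=lambda i: dist[i])
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]

    values = [0.0] * n
    s = match[n - 1] / n                    # value of the farthest point
    values[order[n - 1]] = s
    for i in range(n - 2, -1, -1):          # walk from far to near
        rank = i + 1                        # 1-based distance rank
        s = s + (match[i] - match[i + 1]) / k * min(k, rank) / rank
        values[order[i]] = s
    return values

vals = knn_shapley([[0.0], [1.0], [2.0], [3.0]], [1, 0, 1, 1], [0.0], 1, k=2)
print(vals)  # [0.25, -0.25, 0.25, 0.25]
```

The mislabeled second neighbour receives a negative value, and the values sum to the k-NN utility on the full set (0.5 here). With several test points the per-point values are averaged over the test set; summing a client's values is one plausible (assumed) route to the institution-level view the abstract describes.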
FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework
Authors: Dongyue Guo, Zheng Zhang, Jianwei Zhang, Yi Lin
Abstract
Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC) that can help air traffic controllers manage airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP in an autoregressive manner, which is prone to error accumulation and low efficiency. In this paper, a novel framework called FlightBERT++ is proposed to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) address the limitations of the binary encoding (BE) representation in the FlightBERT framework. Specifically, the proposed framework is implemented as a generalized Encoder-Decoder architecture, in which the encoder learns temporal-spatial patterns from historical observations and the decoder predicts the flight status for future time steps. Compared with the conventional architecture, an extra horizon-aware contexts generator (HACG) is specifically designed to incorporate prior horizon information, which enables multi-horizon non-autoregressive prediction. Additionally, a differential prediction strategy is designed that accounts for both the stationarity of the differential sequence and the high-bit errors of the BE representation. Moreover, a Bit-wise Weighted Binary Cross Entropy loss function is proposed to optimize the framework, which further constrains the high-bit errors of the predictions. Finally, the proposed framework is validated on a real-world flight trajectory dataset. The experimental results show that the proposed framework outperforms the competitive baselines.
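The idea behind a bit-wise weighted loss can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: values are binary-encoded most significant bit (MSB) first, and the linear weight schedule (8, 7, ..., 1) is hypothetical, since the abstract does not give FlightBERT++'s weights. It shows the intended effect: a confident error on the MSB costs more than the same error on the least significant bit (LSB).

```python
import math

def to_bits(value, n_bits):
    """Binary-encode a non-negative integer, most significant bit first."""
    return [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def bitwise_weighted_bce(probs, bits, weights, eps=1e-7):
    """Binary cross-entropy per bit, combined as a weighted mean over bit positions."""
    loss = 0.0
    for p, y, w in zip(probs, bits, weights):
        p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
        loss += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss / sum(weights)

n_bits = 8
weights = [n_bits - i for i in range(n_bits)]  # hypothetical schedule: 8, 7, ..., 1
target = to_bits(100, n_bits)                  # [0, 1, 1, 0, 0, 1, 0, 0]
good = [0.99 if b else 0.01 for b in target]   # confident, correct prediction

msb_wrong = good.copy()
msb_wrong[0] = 1.0 - msb_wrong[0]              # confidently flip the MSB
lsb_wrong = good.copy()
lsb_wrong[-1] = 1.0 - lsb_wrong[-1]            # confidently flip the LSB

for name, probs in [("correct", good), ("MSB error", msb_wrong), ("LSB error", lsb_wrong)]:
    print(name, round(bitwise_weighted_bce(probs, target, weights), 3))
```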
Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots
Authors: Lee Milburn, Juan Gamba, Miguel Fernandes, Claudio Semini
Abstract
The VINUM project seeks to address the shortage of skilled labor in modern vineyards by introducing a cutting-edge mobile robotic solution. Leveraging the capabilities of the quadruped robot HyQReal, equipped with an arm and vision sensors, the system offers autonomous navigation and winter pruning of grapevines, reducing the need for human intervention. At the heart of this approach lies an architecture that enables the robot to navigate vineyards, identify grapevines, and approach them precisely for pruning. A state machine drives the process, switching between stages to ensure efficient task completion. The system's performance was assessed experimentally, focusing on waypoint precision and on optimizing the robot's workspace for single-plant operations. Results indicate that the architecture is highly reliable, with a mean error of 21.5 cm and a standard deviation of 17.6 cm for HyQReal. However, grapevine detection accuracy must improve for optimal performance. This work is based on a computer-vision-based navigation method for quadruped robots in vineyards, opening up new possibilities for selective task automation. The system's architecture works well in ideal weather conditions, generating and arriving at precise waypoints that maximize the attached robotic arm's workspace. This work is an extension of our short paper presented at the Italian Conference on Robotics and Intelligent Machines (I-RIM).
Stars Are All You Need: A Distantly Supervised Pyramid Network for Document-Level End-to-End Sentiment Analysis
Abstract
In this paper, we propose document-level end-to-end sentiment analysis to efficiently understand aspect and review sentiment expressed in online reviews in a unified manner. In particular, we assume that star rating labels are a "coarse-grained synthesis" of aspect ratings across the review. We propose a Distantly Supervised Pyramid Network (DSPN) to efficiently perform Aspect-Category Detection, Aspect-Category Sentiment Analysis, and Rating Prediction using only document star rating labels for training. By performing these three related sentiment subtasks end to end, DSPN can extract aspects mentioned in the review, identify the corresponding sentiments, and predict the star rating labels. We evaluate DSPN on multi-aspect review datasets in English and Chinese and find that, with only star rating labels for supervision, DSPN performs comparably to a variety of benchmark models. We also demonstrate the interpretability of DSPN's outputs on reviews to show the pyramid structure inherent in document-level end-to-end sentiment analysis.
Cross-view Action Recognition via Contrastive View-invariant Representation
Authors: Yuexi Zhang, Dan Luo, Balaji Sundareshan, Octavia Camps, Mario Sznaier
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Cross-view action recognition (CVAR) seeks to recognize a human action when observed from a previously unseen viewpoint. This is a challenging problem since the appearance of an action changes significantly with the viewpoint. Applications of CVAR include surveillance and monitoring of assisted living facilities, where it is not practical or feasible to collect large amounts of training data when adding a new camera. We present a simple yet efficient CVAR framework to learn invariant features from RGB videos, 3D skeleton data, or both. The proposed approach outperforms the current state of the art, achieving similar levels of performance across input modalities: 99.4% (RGB) and 99.9% (3D skeletons); 99.4% (RGB) and 99.9% (3D skeletons); 97.3% (RGB) and 99.2% (3D skeletons); and 84.4% (RGB) for the N-UCLA, NTU-RGB+D 60, NTU-RGB+D 120, and UWA3DII datasets, respectively.
Connectivity Queries under Vertex Failures: Not Optimal, but Practical
Abstract
We revisit once more the problem of designing an oracle for answering connectivity queries in undirected graphs in the presence of vertex failures. Specifically, given an undirected graph $G$ with $n$ vertices and $m$ edges and an integer $d_{\star}\ll n$, the goal is to preprocess the graph to construct a data structure $\mathcal{D}$ such that, given a set of vertices $F$ with $|F|=d\leq d_{\star}$, we can derive an oracle from $\mathcal{D}$ that can efficiently answer queries of the form "is $x$ connected with $y$ in $G\setminus F$?". Very recently, Long and Saranurak (FOCS 2022) provided a solution to this problem that is almost optimal with respect to the preprocessing time, the space usage, the update time, and the query time. However, their solution is highly complicated and appears very difficult to implement efficiently. Furthermore, it does not settle the complexity of the problem in the regime where $d_{\star}$ is a constant. Here, we provide a much simpler solution to this problem that uses only textbook data structures. Our algorithm is deterministic; it has preprocessing time and space complexity $O(d_{\star}m\log n)$, update time $O(d^4 \log n)$, and query time $O(d)$. These bounds compare very well with the previous best, especially considering the simplicity of our approach. In fact, if we assume that $d_{\star}$ is a constant ($d_{\star}\geq 4$), then our algorithm improves on the state of the art in every respect except space. Nevertheless, even our space usage in this case is almost linear. Finally, the data structure that we provide is flexible with respect to $d_{\star}$: it can be adapted to increases and decreases in time and space that are almost proportional to the change in $d_{\star}$ and the size of the graph.
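The query semantics can be pinned down with a brute-force reference that does no preprocessing and runs a breadth-first search in $G\setminus F$ for every query, in $O(n+m)$ time. This is not the paper's oracle; `connected_avoiding` is a hypothetical helper of the kind one might use to test a faster implementation, and returning `False` when $x$ or $y$ has itself failed is our assumption.

```python
from collections import deque

def connected_avoiding(adj, failed, x, y):
    """Is x connected with y in G - F?  Brute-force BFS, O(n + m) per query.

    adj:    dict mapping each vertex to its neighbours (undirected graph)
    failed: set F of failed vertices
    """
    if x in failed or y in failed:
        return False                  # assumption: failed endpoints are unreachable
    seen = {x}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return True
        for v in adj[u]:
            if v not in failed and v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# A 4-cycle 1-2-3-4-1.
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(connected_avoiding(cycle, {2}, 1, 3))     # True: the route 1-4-3 survives
print(connected_avoiding(cycle, {2, 4}, 1, 3))  # False: both routes are cut
```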
Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems
Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters
Abstract
Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While much progress has been made for deterministic interacting systems, modeling is much more challenging for stochastic systems, in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow, since they rely on Monte Carlo sampling, or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios.
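Because the predictive distribution is a Gaussian mixture, its overall mean and variance follow deterministically from the component moments, without sampling. The one-dimensional sketch below is a hypothetical helper, not the paper's moment matching rules for propagating components through the graph network; it simply applies the law of total variance.

```python
def mixture_moments(weights, means, variances):
    """Overall mean and variance of a 1-D Gaussian mixture.

    mean     = sum_i w_i * mu_i
    variance = sum_i w_i * (var_i + mu_i**2) - mean**2   (law of total variance)
    """
    total = sum(weights)
    w = [wi / total for wi in weights]               # normalise mixture weights
    mean = sum(wi * mu for wi, mu in zip(w, means))
    second = sum(wi * (var + mu * mu) for wi, var, mu in zip(w, variances, means))
    return mean, second - mean * mean

print(mixture_moments([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))  # (0.0, 2.0)
```

The symmetric two-mode example has unit-variance components but an overall variance of 2, showing how the spread between modes inflates the variance that a single unimodal Gaussian would have to absorb.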
Fairly Allocating Goods and (Terrible) Chores
Authors: Hadi Hosseini, Aghaheybat Mammadov, Tomasz Wąs
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
We study the fair allocation of mixtures of indivisible goods and chores under lexicographic preferences, a subdomain of additive preferences. A prominent fairness notion for allocating indivisible items is envy-freeness up to any item (EFX). Yet, its existence and computation have remained a notable open problem. By identifying a class of instances with "terrible chores", we show that determining the existence of an EFX allocation is NP-complete. This result immediately implies the intractability of EFX under additive preferences. Nonetheless, we propose a natural subclass of lexicographic preferences for which an EFX and Pareto optimal (PO) allocation is guaranteed to exist and can be computed efficiently for any mixed instance. Focusing on two weaker fairness notions, we investigate finding EF1 and PO allocations for special instances with terrible chores, and show that MMS and PO allocations can be computed efficiently for any mixed instance with lexicographic preferences.
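One common formalization of lexicographic preferences over mixed bundles lets the most important item on which two bundles differ decide the comparison. The sketch below assumes that formalization; `lex_prefers` is a hypothetical helper, and the paper's precise definitions may differ. In the example, a chore ranked above every good (one way to read "terrible chore") makes a bundle containing it worse than the empty bundle, no matter which goods come with it.

```python
def lex_prefers(ranking, goods, bundle_a, bundle_b):
    """True if an agent with lexicographic preferences strictly prefers bundle_a.

    ranking: items from most to least important for this agent
    goods:   items this agent considers goods; every other item is a chore
    The most important item in the symmetric difference decides: the better
    bundle contains it if it is a good and lacks it if it is a chore.
    """
    diff = set(bundle_a) ^ set(bundle_b)
    for item in ranking:
        if item in diff:
            in_a = item in bundle_a
            return in_a if item in goods else not in_a
    return False  # identical bundles: no strict preference

ranking = ["c1", "g1", "g2", "c2"]   # chore c1 is ranked above every good
goods = {"g1", "g2"}
print(lex_prefers(ranking, goods, set(), {"c1", "g1", "g2"}))  # True
print(lex_prefers(ranking, goods, {"g1", "c2"}, {"g2"}))       # True
```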
Characterizing Compositionality of LQR from the Categorical Perspective
Authors: Baike She, Tyler Hanks, James Fairbanks, Matthew Hale
Abstract
Composing systems is a fundamental concept in modern control systems, yet it remains challenging to formally analyze how controllers designed for individual subsystems can differ from controllers designed for the composition of those subsystems. To address this challenge, we propose a novel approach to composing control systems based on resource sharing machines, a concept from applied category theory. We use resource sharing machines to investigate the differences between (i) the linear-quadratic regulator (LQR) designed directly for a composite system and (ii) the LQR that is attained through the composition of LQRs designed for each subsystem. We first establish novel formalisms to compose LQR control designs using resource sharing machines. Then we develop new sufficient conditions to guarantee that the LQR designed for a composite system is equal to the LQR attained through composition of LQRs for its subsystems. In addition, we reduce the developed condition to that of checking the controllability and observability of a certain linear, time-invariant system, which provides a simple, computationally efficient procedure for evaluating the equivalence of controllers for composed systems.
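The abstract reduces the equivalence condition to checking the controllability and observability of a certain linear time-invariant system, without stating here which system that is. The sketch below therefore shows only the standard Kalman rank tests such a check would use, assuming NumPy and matrices $A$, $B$, $C$ of compatible shapes; the composite system itself is not reconstructed.

```python
import numpy as np

def controllable(a, b):
    """Kalman rank test: rank [B, AB, ..., A^(n-1) B] equals n."""
    n = a.shape[0]
    blocks = [b]
    for _ in range(n - 1):
        blocks.append(a @ blocks[-1])
    return bool(np.linalg.matrix_rank(np.hstack(blocks)) == n)

def observable(a, c):
    """Observability of (A, C) is controllability of the dual pair (A^T, C^T)."""
    return controllable(a.T, c.T)

# Double integrator: state (position, velocity), force input.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(controllable(A, B))                      # True
print(observable(A, np.array([[1.0, 0.0]])))   # True: measuring position suffices
print(observable(A, np.array([[0.0, 1.0]])))   # False: velocity alone misses position
```

For large or ill-conditioned systems, numerical rank decisions become tolerance-sensitive, so a production check might prefer a staircase or PBH-style test.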
Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems
Abstract
As computing hardware becomes more specialized, designing environmentally sustainable computing systems requires accounting for both hardware and software parameters. Our goal is to design low carbon computing systems while maintaining a competitive level of performance and operational efficiency. Despite previous carbon modeling efforts for computing systems, there is a distinct lack of holistic design strategies to simultaneously optimize for carbon, performance, power and energy. In this work, we take a data-driven approach to characterize the carbon impact (quantified in units of CO2e) of various artificial intelligence (AI) and extended reality (XR) production-level hardware and application use-cases. We propose a holistic design exploration framework to optimize and design for carbon-efficient computing systems and hardware. Our framework identifies significant opportunities for carbon efficiency improvements in application-specific and general purpose hardware design and optimization. Using our framework, we demonstrate 10$\times$ carbon efficiency improvement for specialized AI and XR accelerators (quantified by a key metric, tCDP: the product of total CO2e and total application execution time), up to 21% total life cycle carbon savings for existing general-purpose hardware and applications due to hardware over-provisioning, and up to 7.86$\times$ carbon efficiency improvement using advanced 3D integration techniques for resource-constrained XR systems.
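tCDP, as defined in the abstract, is the product of total CO2e and total application execution time. The sketch below assumes that total CO2e is embodied carbon (already amortized to this workload by the caller) plus operational carbon from energy times grid carbon intensity; that split, the helper name `tcdp`, and all input values are illustrative assumptions, not the paper's model. Lower tCDP is better.

```python
def tcdp(embodied_kg, avg_power_w, exec_time_s, grid_kg_per_kwh):
    """Total carbon-delay product (kg CO2e * s) = total CO2e * execution time.

    Assumes total CO2e = embodied (pre-amortized to this run) + operational,
    with operational = energy in kWh * grid carbon intensity.
    """
    energy_kwh = avg_power_w * exec_time_s / 3.6e6    # joules -> kWh
    operational_kg = energy_kwh * grid_kg_per_kwh
    return (embodied_kg + operational_kg) * exec_time_s

# Hypothetical inputs: 0.5 kg amortized embodied carbon, 10 W for one hour, 0.4 kg/kWh.
print(round(tcdp(0.5, 10.0, 3600.0, 0.4), 1))   # 1814.4
```

Because execution time multiplies the total, a faster design can win on tCDP even when it carries more embodied carbon, which is the trade-off the metric is meant to expose.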
Bio-Inspired Simple Neural Network for Low-Light Image Restoration: A Minimalist Approach
Authors: Junjie Ye, Jilin Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
In this study, we explore the potential of using a straightforward neural network inspired by the retina model to efficiently restore low-light images. The retina model imitates the neurophysiological principles and dynamics of various optical neurons. Our proposed neural network model reduces the computational overhead compared to traditional signal-processing models while achieving results similar to complex deep learning models from a subjective perceptual perspective. By directly simulating retinal neuron functionalities with neural networks, we not only avoid manual parameter optimization but also lay the groundwork for constructing artificial versions of specific neurobiological organizations.
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Authors: Daochen Zha, Louis Feng, Liang Luo, Bhargav Bhushanam, Zirui Liu, Yusuo Hu, Jade Nie, Yuzhen Huang, Yuandong Tian, Arun Kejariwal, Xia Hu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR); Performance (cs.PF)
Abstract
Sharding a large machine learning model across multiple devices to balance the costs is important in distributed training. This is challenging because partitioning is NP-hard, and estimating the costs accurately and efficiently is difficult. In this work, we explore a "pre-train, and search" paradigm for efficient sharding. The idea is to pre-train a universal and once-for-all neural network to predict the costs of all the possible shards, which serves as an efficient sharding simulator. Built upon this pre-trained cost model, we then perform an online search to identify the best sharding plans given any specific sharding task. We instantiate this idea in deep learning recommendation models (DLRMs) and propose NeuroShard for embedding table sharding. NeuroShard pre-trains neural cost models on augmented tables to cover various sharding scenarios. Then it identifies the best column-wise and table-wise sharding plans with beam search and greedy grid search, respectively. Experiments show that NeuroShard significantly and consistently outperforms the state-of-the-art on the benchmark sharding dataset, achieving up to 23.8% improvement. When deployed in an ultra-large production DLRM with multi-terabyte embedding tables, NeuroShard achieves 11.6% improvement in embedding costs over the state-of-the-art, which translates to 6.6% end-to-end training throughput improvement. To facilitate future research on the "pre-train, and search" paradigm in ML for Systems, we open-source our code at https://github.com/daochenzha/neuroshard
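Once a cost model is available, a table-wise sharding plan can be searched for greedily. The sketch below is a deliberately simpler stand-in for NeuroShard's greedy grid search: a largest-cost-first heuristic that assumes per-table costs simply add up on each device. That additivity assumption is exactly what a learned cost model is meant to avoid, so treat `greedy_table_sharding` (a hypothetical name) as an illustration of the search step only.

```python
import heapq

def greedy_table_sharding(table_costs, n_devices):
    """Assign each table to one device, largest predicted cost first,
    always to the currently least-loaded device (assumes additive costs).

    Returns (assignment per table, summed cost per device).
    """
    heap = [(0.0, d) for d in range(n_devices)]      # (current load, device id)
    heapq.heapify(heap)
    assignment = [None] * len(table_costs)
    loads = [0.0] * n_devices
    for t in sorted(range(len(table_costs)), key=lambda i: -table_costs[i]):
        load, d = heapq.heappop(heap)                # least-loaded device
        assignment[t] = d
        loads[d] = load + table_costs[t]
        heapq.heappush(heap, (loads[d], d))
    return assignment, loads

assignment, loads = greedy_table_sharding([5.0, 4.0, 3.0, 3.0, 3.0], n_devices=2)
print(assignment, loads)  # [0, 1, 1, 0, 1] [8.0, 10.0]
```

Here the greedy plan ends at loads of 8 and 10 although a balanced 9/9 split ({5, 4} and {3, 3, 3}) exists, a small reminder that greedy plans are not optimal in general.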
Prediction of Performance and Power Consumption of GPGPU Applications
Authors: Gargi Alavani, Santonu Sarkar
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Abstract
Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing for achieving exascale performance. The main goal of GPU application developers is to tune their code extensively to obtain optimal performance, making efficient use of the different resources available. While extracting optimal performance of applications on an HPC infrastructure, developers should also ensure that the applications use as little energy as possible, given the massive power consumption of data centres and HPC servers. This thesis presents two models that developers can use to analyse the energy dissipation of CUDA kernels. The first model predicts a CUDA kernel's execution time. Here, PTX code is statically analysed to extract instruction features, control flow, and data dependence. We propose two scheduling algorithm approaches that satisfy the performance and hardware constraints. The second model is a static-analysis-based power predictor built using machine learning techniques. The features used to build the model are derived from static analysis of PTX code. These features are chosen to understand the relationship between GPU power consumption and program features, which can aid developers in building energy-efficient, sustainable applications. The dataset used to validate both models includes kernels from different benchmark suites, with varying sizes, nature (e.g., compute-bound, memory-bound), and complexity (e.g., control divergence, memory access patterns). We also present a tool that demonstrates, in practice, the effectiveness and ease of use of the two models as design assistance tools for GPUs.
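Static PTX features of the kind described above can start from a simple instruction histogram. The sketch below is a hypothetical feature extractor, not the thesis's feature set: it counts base opcodes in PTX text (skipping directives, braces, labels, and comments) and tallies global-memory loads and stores, one plausible signal of memory-bound behaviour.

```python
import re
from collections import Counter

# Optional predicate guard ("@%p1" or "@!%p1"), then the base opcode and its dotted suffixes.
INSTR = re.compile(r"^(?:@!?%\w+\s+)?([a-z][a-z0-9]*)((?:\.[a-z0-9]+)*)")

def ptx_opcode_histogram(ptx_text):
    """Count base opcodes (e.g. 'ld', 'fma') plus a 'global_mem' tally of ld/st.global."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.split("//")[0].strip()          # drop line comments
        if not line or line.startswith((".", "{", "}", "(", ")")) or line.endswith(":"):
            continue                               # directives, braces, labels
        m = INSTR.match(line)
        if not m:
            continue
        opcode, suffixes = m.group(1), m.group(2)
        counts[opcode] += 1
        if opcode in ("ld", "st") and ".global" in suffixes:
            counts["global_mem"] += 1
    return counts

sample = """
.visible .entry scale(
    .param .u64 scale_param_0
)
{
    ld.param.u64    %rd1, [scale_param_0];
    @%p1 bra        $L__BB0_2;
    ld.global.f32   %f1, [%rd1];
    fma.rn.f32      %f2, %f1, %f1, %f1;
    st.global.f32   [%rd1], %f2;
$L__BB0_2:
    ret;
}
"""
print(dict(ptx_opcode_histogram(sample)))
```

A real feature pipeline would also need control-flow and dependence information, which a flat histogram cannot capture.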
Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey
Authors: Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, Dacheng Tao
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Image and Video Processing (eess.IV)
Abstract
With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep learning (DL) have demonstrated their strong abilities in various areas, including language, vision, remote sensing (RS), and agrifood systems applications. However, the overall impact of AI on agrifood systems remains unclear. In this paper, we thoroughly review how AI techniques can transform agrifood systems and contribute to the modern agrifood industry. Firstly, we summarize the data methods used in agrifood systems, including acquisition, storage, and processing techniques. Secondly, we present a progress review of AI methods in agrifood systems, specifically in agriculture, animal husbandry, and fishery, covering topics such as agrifood classification, growth monitoring, yield prediction, and quality assessment. Furthermore, we highlight potential challenges and promising research opportunities for transforming modern agrifood systems with AI. We hope this survey offers an overall picture to newcomers in the field and serves as a starting point for their further research.
Hybrid Active-Passive IRS Assisted Energy-Efficient Wireless Communication
Abstract
Deploying active reflecting elements at the intelligent reflecting surface (IRS) increases signal amplification capability but incurs higher power consumption. Therefore, it remains a challenging and open problem to determine the optimal number of active/passive elements for maximizing energy efficiency (EE). To answer this question, we consider a hybrid active-passive IRS (H-IRS) assisted wireless communication system, where the H-IRS consists of both active and passive reflecting elements. Specifically, we study the optimization of the number of active/passive elements at the H-IRS to maximize EE. To this end, we first derive the closed-form expression for a near-optimal solution under the line-of-sight (LoS) channel case and obtain its optimal solution under the Rayleigh fading channel case. Then, an efficient algorithm is employed to obtain a high-quality sub-optimal solution for the EE maximization under the general Rician channel case. Simulation results demonstrate the effectiveness of the H-IRS for maximizing EE under different Rician factors and IRS locations.
Illicit item detection in X-ray images for security applications
Authors: Georgios Batsis, Ioannis Mademlis, Georgios Th. Papadopoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Automated detection of contraband items in X-ray images can significantly increase public safety by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours make it a Big Data analysis task. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, e.g., fast, single-stage, anchor-based object detectors. This paper proposes a two-fold improvement of such algorithms for the X-ray analysis domain, introducing two complementary novelties. First, more efficient anchors are obtained by hierarchically clustering the sizes of the ground-truth bounding boxes in the training set; thus, the resulting anchors follow a natural hierarchy aligned with the semantic structure of the data. Second, the default Non-Maximum Suppression (NMS) algorithm at the end of the object detection pipeline is modified to better handle occluded objects and to reduce the number of false predictions, by inserting the Efficient Intersection over Union (E-IoU) metric into the Weighted Cluster NMS method. E-IoU provides more discriminative geometrical correlations between the candidate bounding boxes/Regions-of-Interest (RoIs). The proposed method is implemented on a common single-stage object detector (YOLOv5), and its experimental evaluation on a relevant public dataset indicates significant accuracy gains over both the baseline and competing approaches. This highlights the potential of Big Data analysis in enhancing public safety.
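The E-IoU measure and its role in suppression can be sketched as follows. This assumes the standard Efficient IoU definition from the EIoU bounding-box regression loss (IoU minus normalized center-distance, width, and height penalties) and boxes with positive area. For brevity it plugs E-IoU into plain greedy NMS, not into the Weighted Cluster NMS method the paper modifies, and the helper names are hypothetical.

```python
def eiou(a, b):
    """Efficient IoU between boxes (x1, y1, x2, y2) with positive area.

    EIoU = IoU - d_center^2 / c^2 - (w_a - w_b)^2 / C_w^2 - (h_a - h_b)^2 / C_h^2,
    where C_w, C_h are the width and height of the smallest enclosing box
    and c^2 = C_w^2 + C_h^2 is its squared diagonal.
    """
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    iou = inter / (wa * ha + wb * hb - inter)
    cw = max(a[2], b[2]) - min(a[0], b[0])            # enclosing box width
    ch = max(a[3], b[3]) - min(a[1], b[1])            # enclosing box height
    dx = (a[0] + a[2] - b[0] - b[2]) / 2              # center offsets
    dy = (a[1] + a[3] - b[1] - b[3]) / 2
    return (iou - (dx * dx + dy * dy) / (cw * cw + ch * ch)
            - (wa - wb) ** 2 / (cw * cw) - (ha - hb) ** 2 / (ch * ch))

def greedy_nms(boxes, scores, threshold=0.5):
    """Plain greedy NMS using EIoU as the overlap measure; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(eiou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 heavily
```

Because EIoU subtracts geometric penalties from IoU, two boxes with the same IoU but misaligned centers or shapes score lower and are less likely to suppress each other, which is the extra discrimination the abstract attributes to E-IoU.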
Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer
Authors: Rami Hamdi, Ahmed Ben Said, Emna Baccour, Aiman Erbad, Amr Mohamed, Mounir Hamdi, Mohsen Guizani
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Remote monitoring systems analyze the environment dynamics in different smart industrial applications, such as occupational health and safety, and environmental monitoring. Specifically, in industrial Internet of Things (IoT) systems, the huge number of devices and the expected performance put pressure on computational, network, and device energy resources. Distributed training of Machine and Deep Learning (ML/DL) models for intelligent industrial IoT applications is very challenging for resource-limited devices over heterogeneous wireless networks (HetNets). Hierarchical Federated Learning (HFL) performs training at multiple layers by offloading tasks to nearby Multi-Access Edge Computing (MEC) units. In this paper, we propose a novel energy-efficient HFL framework enabled by Wireless Energy Transfer (WET) and designed for heterogeneous networks with massive Multiple-Input Multiple-Output (MIMO) wireless backhaul. Our energy-efficiency approach is formulated as a Mixed-Integer Non-Linear Programming (MINLP) problem, in which we optimize the HFL device association and manage the wirelessly transmitted energy. However, due to its high complexity, we design a Heuristic Resource Management Algorithm, namely H2RMA, that respects energy, channel quality, and accuracy constraints while having low computational complexity. We also reduce the energy consumption of the network using an efficient device scheduling scheme. Finally, we investigate device mobility and its impact on HFL performance. Our extensive experiments confirm the high performance of the proposed resource management approach in HFL over HetNets, in terms of training loss and grid energy costs.
Putting collective intelligence to the enforcement of the Digital Services Act
Abstract
While highlighting the many ways to build strong cooperation between regulators and civil society organisations (CSOs), this report focuses on making concrete recommendations for the design of an efficient and influential expert group with the European Commission. The creation of an expert group is rooted in Article 64 and Recital 137 of the DSA, which require the Commission to develop Union expertise and capabilities. Once established, the experts in this group will be able to bring evidence-based information and specific expertise on the protection of fundamental rights and the safety of users online directly to the Commission. By instituting an expert group, the Commission will not only benefit from valuable expert knowledge but will also demonstrate its willingness to put in place an efficient enforcement system based on collective intelligence. Aside from the establishment of an expert group, other additional mechanisms will also help the DSA's enforcement to thrive. Civil society organisations should, for instance, consider organising regular crowdsourcing events to deep-dive into and analyse the data published by entities covered by the transparency obligations. As it has done in the past, the Commission can sponsor these events and be a direct beneficiary of their results. Another way for civil society organisations to bring information to the regulators is through legal action, including by lodging complaints with them.
"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
Authors: Zhixi Cai, Shreya Ghosh, Tom Gedeon, Abhinav Dhall, Kalin Stefanov, Munawar Hayat
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes, because available benchmark datasets contain mostly visual-only modifications. However, a sophisticated deepfake may include small segments of audio or audio-visual manipulations that can completely change the meaning of the content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual, and audio-visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture that efficiently captures multimodal manipulations. We further improve the baseline method into BA-TFD+ by replacing the backbone with a Multiscale Vision Transformer and guiding the training process with contrastive, frame classification, boundary matching, and multimodal boundary matching loss functions. The quantitative analysis demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks using several benchmark datasets, including our newly proposed dataset. The dataset, models, and code are available at https://github.com/ControlNet/LAV-DF.
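Temporal forgery localization is commonly scored with average precision at temporal IoU (tIoU) thresholds. As a minimal sketch, assuming segments are (start, end) pairs in seconds and using the hypothetical helper name `temporal_iou`, the overlap between a predicted and a ground-truth fake segment can be computed as:

```python
def temporal_iou(pred, truth):
    """IoU of two time segments given as (start, end) with end > start."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((1.0, 3.0), (2.0, 4.0)))  # about 0.333: 1 s overlap over a 3 s union
```

A prediction would then count as a hit at threshold τ when its tIoU with an unmatched ground-truth segment is at least τ; short manipulated segments make high thresholds demanding, since a small boundary error removes a large fraction of the overlap.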
Computing paths of large rank in planar frameworks deterministically
Authors: Fedor V. Fomin, Petr A. Golovach, Tuukka Korhonen, Giannos Stamoulis
Abstract
A framework consists of an undirected graph $G$ and a matroid $M$ whose elements correspond to the vertices of $G$. Recently, Fomin et al. [SODA 2023] and Eiben et al. [arXiv 2023] developed parameterized algorithms for computing paths of rank $k$ in frameworks. More precisely, for vertices $s$ and $t$ of $G$, and an integer $k$, they gave FPT algorithms parameterized by $k$ deciding whether there is an $(s,t)$-path in $G$ whose vertex set contains a subset of elements of $M$ of rank $k$. These algorithms are based on the Schwartz-Zippel lemma for polynomial identity testing and are thus randomized, so the existence of a deterministic FPT algorithm for this problem remains open. We present the first deterministic FPT algorithm that solves the problem in frameworks whose underlying graph $G$ is planar. While the running time of our algorithm is worse than the running times of the recent randomized algorithms, our algorithm works on more general classes of matroids. In particular, this is the first FPT algorithm for the case when matroid $M$ is represented over the rationals. Our main technical contribution is the nontrivial adaptation of the classic irrelevant vertex technique to frameworks to reduce the given instance to one of bounded treewidth. This allows us to employ the toolbox of representative sets to design a dynamic programming procedure solving the problem efficiently on instances of bounded treewidth.
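For a matroid represented over the rationals, each vertex of $G$ carries a vector, and the rank of a vertex set is the rank of the matrix formed by those vectors. The sketch below evaluates that rank exactly, with `fractions.Fraction` Gaussian elimination, for one candidate $(s,t)$-path. The representation, the path, and the function name are hypothetical; the code only scores a given path and does not search for one, so it is not the paper's algorithm.

```python
from fractions import Fraction


def rank_over_rationals(vectors):
    """Rank of a list of rational vectors, computed by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    rank = 0
    for col in range(len(rows[0])):
        # Find a pivot at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every row below the pivot.
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


# Hypothetical representation: each vertex of G is mapped to a vector in Q^3.
representation = {
    "s": (1, 0, 0),
    "a": (Fraction(1, 2), Fraction(1, 2), 0),
    "b": (2, 0, 0),  # parallel to s, so it adds no rank
    "t": (0, 0, 3),
}
path = ["s", "a", "b", "t"]
print(rank_over_rationals([representation[v] for v in path]))  # 3
```

Exact fractions avoid the floating-point rank errors that plain `float` elimination can produce on nearly dependent vectors.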
Approximating Long Cycle Above Dirac's Guarantee
Authors: Fedor F. Fomin, Petr A. Golovach, Danil Sagunov, Kirill Simonov
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Abstract
Parameterization above (or below) a guarantee is a successful concept in parameterized algorithms. The idea is that many computational problems admit ``natural'' guarantees, raising the algorithmic question of whether a better solution (above the guarantee) can be obtained efficiently. The above-guarantee paradigm has led to several exciting discoveries in the areas of parameterized algorithms and kernelization. We argue that this paradigm could bring fresh perspectives to well-studied problems in approximation algorithms. Our example is the longest cycle problem. One of the oldest results in extremal combinatorics is the celebrated Dirac's theorem from 1952. Dirac's theorem provides the following guarantee on the length of the longest cycle: for every 2-connected $n$-vertex graph $G$ with minimum degree $\delta(G) \leq n/2$, the length $L$ of a longest cycle is at least $2\delta(G)$. Thus, the ``essential'' part in finding the longest cycle is in approximating the ``offset'' $k = L - 2\delta(G)$. The main result of this paper is the above-guarantee approximation theorem for $k$. Informally, the theorem says that approximating the offset $k$ is not harder than approximating the total length $L$ of a cycle. In other words, for any (reasonably well-behaved) function $f$, a polynomial-time algorithm constructing a cycle of length $f(L)$ in an undirected graph with a cycle of length $L$ yields a polynomial-time algorithm constructing a cycle of length $2\delta(G)+\Omega(f(k))$.
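The quantities in Dirac's guarantee can be computed directly on a toy graph. The sketch below finds $\delta(G)$, a longest cycle length $L$ by exhaustive search, and the offset $k = L - 2\delta(G)$ for a hypothetical 2-connected graph (a hexagon with one chord). The brute-force search is exponential and only meant for tiny examples; it is not the paper's approximation algorithm.

```python
def longest_cycle_length(adj):
    """Number of vertices on a longest cycle, by exhaustive DFS (tiny graphs only)."""
    best = 0

    def dfs(start, v, visited, length):
        nonlocal best
        for w in adj[v]:
            if w == start and length >= 3:
                best = max(best, length)  # closing edge back to the start vertex
            elif w not in visited:
                visited.add(w)
                dfs(start, w, visited, length + 1)
                visited.remove(w)

    for s in adj:
        dfs(s, s, {s}, 1)
    return best


# Hypothetical 2-connected graph: a 6-cycle plus the chord (0, 3).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

delta = min(len(nbrs) for nbrs in adj.values())  # minimum degree, here 2 <= n/2 = 3
L = longest_cycle_length(adj)
k = L - 2 * delta                                 # the above-guarantee offset
print(delta, L, k)  # 2 6 2
```

Here Dirac's bound only promises a cycle of length 4, so the offset $k = 2$ measures how much the actual longest cycle exceeds the guarantee.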
Deep Learning-Based Multiband Signal Fusion for 3-D SAR Super-Resolution
Authors: Josiah Smith, Murat Torlak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
Three-dimensional (3-D) synthetic aperture radar (SAR) is widely used in many security and industrial applications requiring high-resolution imaging of concealed or occluded objects. The ability to resolve intricate 3-D targets is essential to the performance of such applications and depends directly on system bandwidth. However, because high-bandwidth systems face several prohibitive hurdles, an alternative solution is to operate multiple radars at distinct frequency bands and fuse the multiband signals. Current multiband signal fusion methods assume a simple target model and a small number of point reflectors, an assumption that does not hold for realistic security screening and industrial imaging scenarios, in which the target effectively consists of a large number of reflectors. To the best of our knowledge, this study presents the first use of deep learning for multiband signal fusion. The proposed network, called kR-Net, employs a hybrid, dual-domain complex-valued convolutional neural network (CV-CNN) to fuse multiband signals and impute the missing samples in the frequency gaps between subbands. By exploiting the relationships in both the wavenumber domain and wavenumber spectral domain, the proposed framework overcomes the drawbacks of existing multiband imaging techniques for realistic scenarios at a fraction of the computation time of existing multiband fusion algorithms. Our method achieves high-resolution imaging of intricate targets previously impossible using conventional techniques and enables finer resolution capacity for concealed weapon detection and occluded object classification using multiband signaling without requiring more advanced hardware. Furthermore, a fully integrated multiband imaging system is developed using commercially available millimeter-wave (mmWave) radars for efficient multiband imaging.
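The dependence of resolution on bandwidth can be made concrete with the standard range-resolution relation $\delta_r = c/(2B)$. The sketch below compares a single subband with the full span that multiband fusion aims to recover once the frequency gap is filled. The band edges are hypothetical values chosen for illustration, not the configuration reported in the paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def range_resolution(f_start_hz, f_stop_hz):
    """Standard range resolution c / (2B) for a sweep from f_start to f_stop."""
    bandwidth = f_stop_hz - f_start_hz
    return SPEED_OF_LIGHT / (2.0 * bandwidth)


# Hypothetical subbands: 60-64 GHz and 77-81 GHz.
low_band = (60e9, 64e9)
high_band = (77e9, 81e9)

single = range_resolution(*low_band)
# If fusion successfully imputes the 64-77 GHz gap, the effective sweep spans 60-81 GHz.
fused = range_resolution(low_band[0], high_band[1])
print(f"single subband: {single * 1e3:.1f} mm, fused span: {fused * 1e3:.1f} mm")
# single subband: 37.5 mm, fused span: 7.1 mm
```

The roughly fivefold gain holds only if the gap samples are imputed accurately, which is the task the network addresses.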
Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model
Authors: Di Wang, Jing Zhang, Bo Du, Dacheng Tao, Liangpei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, due to the difficulties and high costs associated with annotating Remote Sensing (RS) images, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS surpasses existing high-resolution RS segmentation datasets in size by several orders of magnitude, and provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. We hope it will facilitate research in RS segmentation, particularly in large-model pre-training.
Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique
Authors: Josiah Smith, Shiva Thiagarajan, Richard Willis, Yiorgos Makris, Murat Torlak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
In this paper, we investigate novel data collection and training techniques for improving the classification accuracy of non-moving (static) hand gestures using a convolutional neural network (CNN) and frequency-modulated continuous-wave (FMCW) millimeter-wave (mmWave) radars. Recently, non-contact hand pose and static gesture recognition have received considerable attention in many applications, ranging from human-computer interaction (HCI) and augmented/virtual reality (AR/VR) to therapeutic range-of-motion applications in medicine. While most current solutions rely on optical or depth cameras, these methods require ideal lighting and temperature conditions. mmWave radar devices have recently emerged as a promising alternative, offering low-cost system-on-chip sensors whose output signals contain precise spatial information even in non-ideal imaging conditions. Additionally, deep convolutional neural networks have been employed extensively in image recognition, learning feature extraction and classification simultaneously. However, little work has been done on static gesture recognition using mmWave radars and CNNs, owing to the difficulty of extracting meaningful features from the radar return signal, and the results are inferior to those of dynamic gesture classification. This article presents an efficient data collection approach and a novel technique for deep CNN training that introduces ``sterile'' images, which aid in distinguishing features among the static gestures and thereby improve classification accuracy. Applying the proposed data collection and training methods increases the classification rate of static hand gestures from $85\%$ to $93\%$ for range profiles and from $90\%$ to $95\%$ for range-angle profiles.
Approximate Evaluation of Quantitative Second Order Queries
Authors: Jan Dreier, Robert Ganian, Thekla Hamm
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO)
Abstract
Courcelle's theorem and its adaptations to cliquewidth have shaped the field of exact parameterized algorithms and are widely considered the archetype of algorithmic meta-theorems. In the past decade, there has been growing interest in developing parameterized approximation algorithms for problems which are not captured by Courcelle's theorem and, in particular, are considered not fixed-parameter tractable under the associated widths. We develop a generalization of Courcelle's theorem that yields efficient approximation schemes for any problem that can be captured by an expanded logic we call Blocked CMSO, capable of making logical statements about the sizes of set variables via so-called weight comparisons. The logic controls weight comparisons via the quantifier-alternation depth of the involved variables, allowing full comparisons for zero-alternation variables and limited comparisons for one-alternation variables. We show that the developed framework threads the very needle of tractability: on one hand it can describe a broad range of approximable problems, while on the other hand we show that the restrictions of our logic cannot be relaxed under well-established complexity assumptions. The running time of our approximation scheme is polynomial in $1/\varepsilon$, allowing us to fully interpolate between faster approximate algorithms and slower exact algorithms. This provides a unified framework to explain the tractability landscape of graph problems parameterized by treewidth and cliquewidth, as well as classical non-graph problems such as Subset Sum and Knapsack.
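The trade-off between approximation quality and running time described above is familiar from the classical profit-scaling FPTAS for 0/1 Knapsack, one of the non-graph problems named in the abstract. The sketch below implements that textbook scheme, not the Blocked CMSO framework: profits are scaled down by $K = \varepsilon p_{\max}/n$, a dynamic program over scaled profit finds the lightest item set for each scaled value, and the returned profit is at least $(1-\varepsilon)$ times optimal in time polynomial in $n$ and $1/\varepsilon$. The function name and instance are hypothetical; weights are assumed positive.

```python
import math


def knapsack_fptas(profits, weights, capacity, eps):
    """Return (chosen item indices, total profit) with profit >= (1 - eps) * OPT."""
    n = len(profits)
    items = [i for i in range(n) if weights[i] <= capacity]  # items that can never fit are dropped
    if not items:
        return [], 0
    p_max = max(profits[i] for i in items)
    if p_max == 0:
        return [], 0
    scale = eps * p_max / n
    scaled = {i: math.floor(profits[i] / scale) for i in items}
    total = sum(scaled.values())
    INF = float("inf")
    # min_weight[q]: least weight reaching scaled profit exactly q; choice[q]: the items used.
    min_weight = [0] + [INF] * total
    choice = [[] for _ in range(total + 1)]
    for i in items:
        # Descending q so each item is used at most once (0/1 knapsack).
        for q in range(total, scaled[i] - 1, -1):
            cand = min_weight[q - scaled[i]] + weights[i]
            if cand < min_weight[q]:
                min_weight[q] = cand
                choice[q] = choice[q - scaled[i]] + [i]
    best_q = max(q for q in range(total + 1) if min_weight[q] <= capacity)
    chosen = choice[best_q]
    return chosen, sum(profits[i] for i in chosen)


print(knapsack_fptas([60, 100, 120], [10, 20, 30], capacity=50, eps=0.1))  # ([1, 2], 220)
```

Shrinking `eps` enlarges the scaled profit range and hence the table, which is exactly the interpolation from fast approximate to slow exact computation.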
A survey of modularized backstepping control design approaches to nonlinear ODE systems
Authors: Zhengru Ren
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Abstract
Backstepping is a mature and powerful Lyapunov-based design approach for a specific class of systems. Over three decades of development, innovative theories and practices have extended backstepping to stabilization and tracking problems for nonlinear systems of growing complexity. The main attractions of backstepping-like approaches are their recursive design process and modularized design. A nonlinear system can be decomposed into a group of simple subproblems, which are then solved by sequentially superposing the corresponding design for each subproblem. To handle these complexities, backstepping designs are commonly combined with adaptive control and robust control. This survey reviews the milestone theoretical achievements, among thousands of publications, that have made state-feedback backstepping design for complex ODE systems systematic and modularized. Several selected elegant methods are reviewed, starting from the general designs and then covering finite-time control to enhance the convergence rate, fuzzy logic systems and neural networks to estimate system unknowns, the Nussbaum function to handle unknown control coefficients, barrier Lyapunov functions to address state constraints, and the hyperbolic tangent function as applied in robust designs. The associated assumptions, Lyapunov function candidates, inequalities, and key points of the deductions are reviewed. The nonlinearities and complexities considered include state constraints, disturbances, input nonlinearities, time-delay effects, pure-feedback systems, event-triggered systems, and stochastic systems. The survey focuses on stand-alone systems rather than networked systems.
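To illustrate the recursive, modular design, the sketch below applies textbook two-step backstepping to a hypothetical strict-feedback system $\dot x_1 = x_1^2 + x_2$, $\dot x_2 = u$. Step 1 treats $x_2$ as a virtual control $\alpha(x_1) = -k_1 x_1 - x_1^2$ for $V_1 = x_1^2/2$. Step 2 defines $z_2 = x_2 - \alpha$ and chooses $u = \dot\alpha - x_1 - k_2 z_2$, which gives $\dot V_2 = -k_1 x_1^2 - k_2 z_2^2$ for $V_2 = V_1 + z_2^2/2$. The plant, gains, initial state, and forward-Euler simulation are illustrative assumptions, not a design taken from the survey.

```python
def backstepping_control(x1, x2, k1=2.0, k2=2.0):
    """Control law for x1' = x1**2 + x2, x2' = u obtained by two-step backstepping."""
    alpha = -k1 * x1 - x1**2               # step 1: virtual control for the x1-subsystem
    z2 = x2 - alpha                        # step 2: error between x2 and its virtual control
    x1_dot = x1**2 + x2
    alpha_dot = (-k1 - 2.0 * x1) * x1_dot  # time derivative of alpha along trajectories
    return alpha_dot - x1 - k2 * z2        # cancels the cross term x1*z2 in V2'


def simulate(x1=1.0, x2=0.0, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of the closed loop; returns the final state."""
    for _ in range(int(t_end / dt)):
        u = backstepping_control(x1, x2)
        x1, x2 = x1 + dt * (x1**2 + x2), x2 + dt * u
    return x1, x2


print(simulate())  # both states decay towards the origin
```

In the error coordinates $(x_1, z_2)$ the closed loop is linear with eigenvalues $-2 \pm i$ for these gains, so the state shrinks roughly like $e^{-2t}$; higher-order systems repeat step 2 once per additional integrator.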
A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution
Abstract
In this paper, we develop a novel super-resolution algorithm for near-field synthetic-aperture radar (SAR) under irregular scanning geometries. As fifth-generation (5G) millimeter-wave (mmWave) devices are becoming increasingly affordable and available, high-resolution SAR imaging is feasible for end-user applications and non-laboratory environments. Emerging applications such as freehand imaging, wherein a handheld radar is scanned throughout space by a user, unmanned aerial vehicle (UAV) imaging, and automotive SAR face several unique challenges for high-resolution imaging. First, recovering a SAR image requires knowledge of the array positions throughout the scan. While recent work has introduced camera-based positioning systems capable of adequately estimating the position, recovering the image efficiently is a requirement to enable edge and Internet of Things (IoT) technologies. Efficient algorithms for non-cooperative near-field SAR sampling have been explored in recent work, but they suffer from image defocusing under position estimation error and can only produce medium-fidelity images. In this paper, we introduce a mobile-friendly vision transformer (ViT) architecture to address position estimation error and perform SAR image super-resolution (SR) under irregular sampling geometries. The proposed algorithm, Mobile-SRViT, is the first to employ a ViT approach for SAR image enhancement and is validated in simulation and via empirical studies.
Rethinking the Encoding of Satellite Image Time Series
Authors: Xin Cai, Yaxin Bi, Peter Nicholl, Roy Sterritt
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Representation learning of Satellite Image Time Series (SITS) presents unique challenges, such as the prohibitive computational burden caused by high spatiotemporal resolutions, irregular acquisition times, and complex spatiotemporal interactions, which have led to highly specialized neural network architectures for SITS analysis. Despite the promising results achieved by some pioneering work, we argue that satisfactory representation learning paradigms have not yet been established for SITS analysis, leaving SITS an isolated island to which transferring successful paradigms or the latest advances from Computer Vision (CV) is arduous. In this paper, we develop a unique perspective of SITS processing as a direct set prediction problem, inspired by the recent trend of adopting query-based transformer decoders to streamline the object detection or image segmentation pipeline, and we further propose to decompose the representation learning process of SITS into three explicit steps: collect--update--distribute, which is computationally efficient and suits irregularly sampled and asynchronous temporal observations. Facilitated by this reformulation and the proposed feature extraction framework, our models, pre-trained on pixel-set format input and then fine-tuned on downstream dense prediction tasks by simply appending a commonly used segmentation network, attain new state-of-the-art (SoTA) results on the PASTIS dataset compared with bespoke neural architectures such as U-TAE. Furthermore, the clear conceptual and practical separation between temporal and spatial components in the panoptic segmentation pipeline of SITS allows us to leverage recent advances in CV, such as the universal segmentation architecture Mask2Former, resulting in an 8.8-point increase in panoptic quality (PQ) over the best score reported so far.
Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging
Authors: Christos Vasileiou, Josiah W. Smith, Shiva Thiagarajan, Matthew Nigh, Yiorgos Makris, Murat Torlak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
In this paper, we introduce an innovative super-resolution approach for emerging modes of near-field synthetic aperture radar (SAR) imaging. Recent research extends convolutional neural network (CNN) architectures from the optical to the electromagnetic domain to achieve super-resolution on images generated from radar signaling. Specifically, near-field SAR imaging, a method for generating high-resolution images by scanning a radar across space to create a synthetic aperture, is of interest due to its high-fidelity spatial sensing capability, low-cost devices, and large application space. Since SAR imaging requires large aperture sizes to achieve high resolution, super-resolution algorithms are valuable for many applications. Freehand smartphone SAR, an emerging sensing modality, requires irregular SAR apertures in the near field and computation on mobile devices. Efficiently producing high-resolution SAR images from irregularly sampled data collected by the freehand motion of a smartphone is a challenging task. We propose a novel CNN architecture that achieves SAR image super-resolution for mobile applications by employing state-of-the-art SAR processing and deep learning techniques. The proposed algorithm is verified via simulation and an empirical study. Our algorithm demonstrates high-efficiency and high-resolution radar imaging for near-field scenarios with irregular scanning geometries.
Heterogeneous GNN-RL Based Task Offloading for UAV-aided Smart Agriculture
Authors: Turgay Pamuklu, Aisha Syed, W. Sean Kennedy, Melike Erol-Kantarci
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Having unmanned aerial vehicles (UAVs) with edge computing capability hover over smart farmlands supports Internet of Things (IoT) devices with low processing capacity and power to accomplish their deadline-sensitive tasks efficiently and economically. In this work, we propose a graph neural network-based reinforcement learning solution to optimize the task offloading from these IoT devices to the UAVs. We conduct evaluations to show that our approach reduces task deadline violations while also increasing the mission time of the UAVs by optimizing their battery usage. Moreover, the proposed solution has increased robustness to network topology changes and is able to adapt to extreme cases, such as the failure of a UAV.
Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning
Authors: Zhen Wei, Pascal Fua, Michaël Bauerheim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Fluid Dynamics (physics.flu-dyn)
Abstract
We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization. Both models are optimized to parameterize via deep geometric learning to embed human prior knowledge into learned geometric patterns, eliminating the need for further handcrafting. The Latent Space Model (LSM) learns a low-dimensional latent representation of an object from a dataset of various geometries, while the Direct Mapping Model (DMM) builds parameterization on the fly using only one geometry of interest. We also devise a novel regularization loss that efficiently integrates volumetric mesh deformation into the parameterization model. The models directly manipulate the high-dimensional mesh data by moving vertices. LSM and DMM are fully differentiable, enabling gradient-based, end-to-end pipeline design and plug-and-play deployment of surrogate models or adjoint solvers. We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.
On the Channel Correlation in Reconfigurable Intelligent Surface-Aided System
Authors: Kuang-Hao (Stanley) Liu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This work explores the correlation between channels in reconfigurable intelligent surface (RIS)-aided communication systems. In this type of system, an RIS made up of many passive elements with adjustable phases reflects the transmitter's signal to the receiver. Since the transmitter-RIS link may be shared by multiple receivers, the cascade channels of two receivers may experience correlated fading, which can degrade system performance. Using the mean correlation coefficient as a metric, we analyze the correlation between two cascade channels and derive an accurate closed-form approximation. We also consider the extreme case of an infinitely large number of RIS elements and obtain a convergence result. The accuracy of our analysis is validated by simulation results, which offer insights into the correlation characteristics of RIS-aided fading channels.
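The effect of a shared transmitter-RIS link can be reproduced with a short Monte Carlo experiment. The sketch below assumes independent Rayleigh fading (CN(0, 1)) on every link, a fixed RIS configuration with all phase shifts set to zero, and measures correlation as the Pearson coefficient between the two receivers' cascade-channel power gains. These modelling choices and the metric are assumptions for illustration and may differ from the paper's mean correlation coefficient. Under these particular assumptions a short moment calculation gives a correlation of $1/(N+2)$ for $N$ elements, which the simulation can be compared against.

```python
import numpy as np


def cascade_power_correlation(n_elements, n_trials=100_000, seed=0):
    """Pearson correlation of |h1|^2 and |h2|^2 when both cascade channels share the Tx-RIS link."""
    rng = np.random.default_rng(seed)

    def rayleigh(shape):
        # Circularly symmetric complex Gaussian CN(0, 1) samples.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    g = rayleigh((n_trials, n_elements))   # shared transmitter-RIS link
    r1 = rayleigh((n_trials, n_elements))  # RIS-receiver 1 link
    r2 = rayleigh((n_trials, n_elements))  # RIS-receiver 2 link
    h1 = np.sum(g * r1, axis=1)            # cascade channels with all RIS phases set to zero
    h2 = np.sum(g * r2, axis=1)
    return np.corrcoef(np.abs(h1) ** 2, np.abs(h2) ** 2)[0, 1]


for n in (2, 8, 16):
    print(n, round(cascade_power_correlation(n), 3), round(1 / (n + 2), 3))
```

The shrinking correlation with $N$ under this toy model is consistent with the idea of a limiting result as the number of elements grows, although the paper's own limit is not reproduced here.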
An identification method for oscillators with response-dependent inertia
Authors: Yuval Harduf (1), Eyal Setter (1), Izhak Bucher (1) ((1) Technion Israel Institute of Technology, Faculty of mechanical engineering)
Abstract
This paper is concerned with identifying the instantaneous modal parameters of oscillatory systems with response-dependent inertia (mass, inductance, or equivalent) based on their measured dynamics. An identification method is proposed, which is a variation of the "FORCEVIB" method. The method utilizes the analytic signal representation and the properties of the Hilbert transform to obtain an analytic relationship that links a system's natural frequency and damping coefficient to its response and excitation signals. The proposed method is validated by comparing the identification results to the asymptotic solution of a simple system with response-dependent inertia and is then demonstrated, numerically and experimentally, for other, more complicated, nonlinear systems.
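The analytic-signal idea behind such methods can be illustrated on a free-decay response. The sketch below builds the analytic signal $x + jH[x]$ with an FFT-based Hilbert transform (zero-padded to limit wrap-around), then fits the slope of the unwrapped phase to obtain the damped frequency and the slope of the log-envelope to obtain the decay rate $\zeta\omega_n$. It is a generic free-vibration illustration on a linear oscillator with assumed parameters; it does not use the excitation signal and is not the authors' FORCEVIB variant for response-dependent inertia.

```python
import numpy as np


def analytic_signal(x, pad_factor=4):
    """Analytic signal x + j*H[x] via the FFT, zero-padded to reduce circular wrap-around."""
    n = len(x)
    n_fft = pad_factor * n
    spectrum = np.fft.fft(x, n_fft)
    weights = np.zeros(n_fft)
    weights[0] = 1.0                         # keep DC
    if n_fft % 2 == 0:
        weights[1:n_fft // 2] = 2.0          # double positive frequencies
        weights[n_fft // 2] = 1.0            # keep Nyquist
    else:
        weights[1:(n_fft + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)[:n]  # negative frequencies are zeroed


# Assumed linear oscillator in free decay: natural frequency 5 Hz, damping ratio 0.02.
fs, f_n, zeta = 1000.0, 5.0, 0.02
t = np.arange(0.0, 6.0, 1.0 / fs)
w_n = 2 * np.pi * f_n
w_d = w_n * np.sqrt(1 - zeta**2)
x = np.exp(-zeta * w_n * t) * np.sin(w_d * t)

z = analytic_signal(x)
mid = slice(len(t) // 4, 3 * len(t) // 4)  # discard samples near the record ends
# Phase slope gives the damped frequency; log-envelope slope gives the decay rate zeta*w_n.
freq_slope = np.polyfit(t[mid], np.unwrap(np.angle(z[mid])), 1)[0]
decay_slope = -np.polyfit(t[mid], np.log(np.abs(z[mid])), 1)[0]
w_n_est = np.hypot(freq_slope, decay_slope)  # since w_n^2 = w_d^2 + (zeta*w_n)^2
f_est = w_n_est / (2 * np.pi)
zeta_est = decay_slope / w_n_est
print(f_est, zeta_est)
```

Fitting slopes over many cycles, rather than differentiating sample by sample, averages out the small ripple that the Hilbert transform of a decaying signal introduces.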
Learning-Augmented Online TSP on Rings, Trees, Flowers and (almost) Everywhere Else
Abstract
We study the Online Traveling Salesperson Problem (OLTSP) with predictions. In OLTSP, a sequence of initially unknown requests arrives over time at points (locations) of a metric space. The goal is, starting from a particular point of the metric space (the origin), to serve all these requests while minimizing the total time spent. The server moves with unit speed or is "waiting" (zero speed) at some location. We consider two variants: in the open variant, the goal is achieved when the last request is served. In the closed one, the server additionally has to return to the origin. We adopt a prediction model, introduced for OLTSP on the line, in which the predictions correspond to the locations of the requests and extend it to more general metric spaces. We first propose an oracle-based algorithmic framework, inspired by previous work. This framework allows us to design online algorithms for general metric spaces that provide competitive ratio guarantees which, given perfect predictions, beat the best possible classical guarantee (consistency). Moreover, they degrade gracefully along with the increase in error (smoothness), but always within a constant factor of the best known competitive ratio in the classical case (robustness). Having reduced the problem to designing suitable efficient oracles, we describe how to achieve this for general metric spaces as well as specific metric spaces (rings, trees and flowers), the resulting algorithms being tractable in the latter case. The consistency guarantees of our algorithms are tight in almost all cases, and their smoothness guarantees suffer only a linear dependence on the error, which we show is necessary. Finally, we provide robustness guarantees improving previous results.
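To make the two objectives concrete, the sketch below scores a fixed service order on the real line, the metric on which the prediction model was introduced. The server moves at unit speed, waits at a request's location if it arrives before the release time, and the open variant ends at the last service while the closed variant adds the return to the origin. It only evaluates a given order offline; it is not the learning-augmented online algorithm, and the request list and function name are hypothetical.

```python
def completion_time(requests, order, origin=0.0, closed=False):
    """Time to serve requests (location, release_time) in the given order on the real line."""
    time, pos = 0.0, origin
    for idx in order:
        location, release = requests[idx]
        # Travel at unit speed, then wait at the location until the request is released.
        time = max(time + abs(location - pos), release)
        pos = location
    if closed:
        time += abs(pos - origin)  # closed variant: the server must return to the origin
    return time


requests = [(3.0, 0.0), (-2.0, 4.0), (1.0, 9.0)]
print(completion_time(requests, [0, 1, 2]))               # open variant: 11.0
print(completion_time(requests, [0, 1, 2], closed=True))  # closed variant: 12.0
```

In the example the server reaches the third location at time 10 but must wait until its release at time 9 has passed, so here no waiting occurs; the first request is served at time 3 and the second at time 8.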
Evanescent Plane Wave Approximation of Helmholtz Solutions in Spherical Domains
Abstract
The recent results presented in arXiv:2202.05608 have led to significant developments in achieving stable approximations of Helmholtz solutions by plane wave superposition. The study shows that the numerical instability and ill-conditioning inherent in plane wave-based Trefftz methods can be effectively overcome with regularization techniques, provided there exist accurate approximations in the form of expansions with bounded coefficients. Whenever the target solution contains high Fourier modes, propagative plane waves fail to yield stable approximations due to the exponential growth of the expansion coefficients. Conversely, evanescent plane waves, whose modal content covers high Fourier regimes, are able to provide both accurate and stable results. The developed numerical approach, which involves constructing evanescent plane wave approximation sets by sampling the parametric domain according to a probability density function, results in substantial improvements when compared to conventional propagative plane wave schemes. The present work extends this research to the three-dimensional setting, confirming the achieved results and introducing new ones. By generalizing the 3D Jacobi-Anger identity to complex-valued directions, we show that any Helmholtz solution in a ball can be represented as a continuous superposition of evanescent plane waves. This representation extends the classical Herglotz one and provides a relevant stability result that cannot be achieved with the use of propagative waves alone. The proposed numerical recipes have been tailored for the 3D setting and extended with new sampling strategies involving extremal systems of points. These methods are tested by numerical experiments, showing the desired accuracy and bounded-coefficient stability, in line with the two-dimensional case.
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Authors: Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami
Abstract
Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient, as the improvement in performance diminishes with an increasing number of experts. We hypothesize that this parameter inefficiency is a result of all experts having equal capacity, which may not adequately meet the varying complexity requirements of different tokens or tasks; e.g., in a multilingual setting, languages with different resource levels might require different capacities. In light of this, we propose Stratified Mixture of Experts (SMoE) models, which feature a stratified structure and can assign dynamic capacity to different tokens. We demonstrate the effectiveness of SMoE on two multilingual machine translation benchmarks, where it outperforms multiple state-of-the-art MoE models. On a diverse 15-language dataset, SMoE improves the translation quality over vanilla MoE by +0.93 BLEU points on average. Additionally, SMoE is parameter-efficient, matching vanilla MoE performance with around 50\% fewer parameters.
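Sparse activation in MoE layers is typically implemented by routing each token to only its top-$k$ experts. The sketch below shows that generic top-$k$ routing in NumPy, with softmax gates renormalised over the selected experts. It illustrates the vanilla MoE baseline, not SMoE's stratified structure or dynamic capacity, and all shapes, names, and the linear experts are hypothetical.

```python
import numpy as np


def moe_layer(tokens, gate_weights, experts, k=2):
    """Route each token to its top-k experts and mix their outputs with renormalised gates.

    tokens: (n_tokens, d); gate_weights: (d, n_experts); experts: callables mapping (d,) -> (d,).
    """
    logits = tokens @ gate_weights                 # (n_tokens, n_experts) router scores
    top_k = np.argsort(logits, axis=1)[:, -k:]     # indices of the k largest logits per token
    output = np.zeros_like(tokens)
    for t, token in enumerate(tokens):
        chosen = top_k[t]
        scores = np.exp(logits[t, chosen] - logits[t, chosen].max())
        gates = scores / scores.sum()              # softmax over the selected experts only
        for gate, e in zip(gates, chosen):
            output[t] += gate * experts[e](token)  # only k experts are evaluated per token
    return output


rng = np.random.default_rng(0)
d, n_experts = 4, 8
expert_mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in expert_mats]  # hypothetical linear experts
tokens = rng.standard_normal((3, d))
gate_weights = rng.standard_normal((d, n_experts))
print(moe_layer(tokens, gate_weights, experts, k=2).shape)  # (3, 4)
```

Compute per token grows with `k`, not with the number of experts, which is why adding experts raises parameter count without raising per-token cost; every expert here has the same capacity, the property SMoE relaxes.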
Experiences with Remote Examination Formats in Light of GPT-4
Authors: Felix Dobslaw, Peter Bergh
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)
Abstract
Sudden access to the rapidly improving large language model GPT by OpenAI forces educational institutions worldwide to revisit their exam procedures. In the pre-GPT era, we successfully applied oral and open-book home exams for two courses in the third year of our predominantly remote Software Engineering BSc program. In this paper, we ask whether our current open-book exams are still viable or whether a move back to legally compliant but less scalable oral exams is the only workable alternative. We further compare work-effort estimates between oral and open-book exams and report on differences in throughput and grade distribution over eight years to better understand the impact of examination format on the outcome. Testing GPT-4 on the most recent open-book exams showed that our current Artificial Intelligence and Reactive Programming exams are not GPT-4-proof. Three potential weaknesses of GPT are outlined. We also found that grade distributions have largely been unaffected by the examination format, which leaves open a move to oral examinations should it become necessary. Throughput was higher for open-book exam course instances (73% vs 64%), as were fail rates (12% vs 7%), with teacher workload increasing even for smaller classes. We also report on our experience regarding effort. Oral examinations are efficient for smaller groups but come with caveats regarding intensity and stress.
Abstract
Data in many real-world applications are often accumulated over time, like a stream. In contrast to conventional machine learning studies that focus on learning from a given training data set, learning from data streams cannot ignore the fact that the incoming data stream can be potentially endless, with overwhelming size and unknown changes, and it is impractical to assume sufficient computational and storage resources to handle all received data in time. Thus, the generalization performance of learning from data streams depends not only on how much data has been received, but also on how much data can be exploited well and in time, given resource and speed constraints, in addition to the capability of the learning algorithm and the complexity of the problem. To this end, in this article we introduce the notion of machine learning throughput, define Stream Efficient Learning, and present a preliminary theoretical framework.
LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
Abstract
We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.
Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference
Authors: Ivone Amorim, Eva Maia, Pedro Barbosa, Isabel Praça
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
The use of Neural Networks (NNs) for sensitive data processing is becoming increasingly popular, raising concerns about data privacy and security. Homomorphic Encryption (HE) has the potential to be used as a solution to preserve data privacy in NNs. This study provides a comprehensive analysis of the use of HE for NN training and classification, focusing on the techniques and strategies used to enhance data privacy and security. The current state of the art in HE for NNs is analysed, and the challenges and limitations that need to be addressed to make it a reliable and efficient approach for privacy preservation are identified. Also, the different categories of HE schemes and their suitability for NNs are discussed, as well as the techniques used to optimize the accuracy and efficiency of encrypted models. The review reveals that HE has the potential to provide strong data privacy guarantees for NNs, but several challenges need to be addressed, such as limited support for advanced NN operations, scalability issues, and performance trade-offs.
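To show what computing on encrypted data looks like, the sketch below uses the Paillier cryptosystem, an additively (partially) homomorphic scheme, to evaluate a linear neuron $w \cdot x$ on an encrypted input with plaintext integer weights. The tiny primes make it completely insecure; it is a toy illustration of the homomorphic property, not one of the schemes or systems covered by the review, and it handles only non-negative integers. Practical encrypted NN pipelines also need fixed-point encodings of signed values and a way to handle non-linear activations. All function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, for illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(m, n):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(c, n, private_key):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_dot(ciphertexts, weights, n):
    """Enc(sum w_i * x_i) from Enc(x_i) and plaintext non-negative integer weights w_i."""
    n2 = n * n
    acc = 1  # an encryption of 0 (with randomness r = 1)
    for c, w in zip(ciphertexts, weights):
        acc = (acc * pow(c, w, n2)) % n2  # ciphertext product = plaintext sum; power = scaling
    return acc


n, sk = keygen()
x = [3, 1, 4]  # client's private input features
w = [2, 7, 5]  # server's plaintext weights
enc_x = [encrypt(v, n) for v in x]
print(decrypt(encrypted_dot(enc_x, w, n), n, sk))  # 33 = 2*3 + 7*1 + 5*4
```

The server only ever sees ciphertexts, yet the client decrypts the correct weighted sum; the cost is large-integer modular exponentiation per weight, a small instance of the performance trade-offs the review discusses.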
Multi-dimensional Signal Recovery using Low-rank Deconvolution
Authors: David Reixach
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work we present Low-rank Deconvolution, a powerful framework for low-level feature-map learning for efficient signal representation, with application to signal recovery. Its formulation in multi-linear algebra inherits properties from convolutional sparse coding and low-rank approximation methods, since in this setting signals are decomposed into a set of filters convolved with a set of low-rank tensors. We show its advantages by learning compressed video representations and solving image inpainting problems.
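The decomposition described above can be written down on the synthesis side. This minimal sketch makes three assumptions: the signals are 2-D, the coefficient maps are rank one (the simplest low-rank case), and boundaries are circular. Learning the filters and factors, which is the actual subject of the paper, is not shown.

```python
import numpy as np


def circular_conv2d(filt, feature):
    """2-D circular convolution via the FFT; the filter is zero-padded to the map size."""
    padded = np.zeros(feature.shape)
    fh, fw = filt.shape
    padded[:fh, :fw] = filt
    return np.real(np.fft.ifft2(np.fft.fft2(padded) * np.fft.fft2(feature)))


def low_rank_synthesis(filters, us, vs):
    """x = sum_k d_k * Z_k, where each coefficient map Z_k = u_k v_k^T has rank one."""
    x = np.zeros((len(us[0]), len(vs[0])))
    for d, u, v in zip(filters, us, vs):
        x += circular_conv2d(d, np.outer(u, v))
    return x


rng = np.random.default_rng(0)
us = [rng.standard_normal(8) for _ in range(2)]
vs = [rng.standard_normal(6) for _ in range(2)]
identity = np.array([[1.0]])        # delta at (0, 0): leaves the map unchanged
shift = np.array([[0.0, 1.0]])      # delta at (0, 1): shifts columns by one
x = low_rank_synthesis([identity, shift], us, vs)
```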
EFx Budget-Feasible Allocations with High Nash Welfare
Authors: Marius Garbea, Vasilis Gkatzelis, Xizhi Tan
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
We study the problem of allocating indivisible items to budget-constrained agents, aiming to provide fairness and efficiency guarantees. Specifically, our goal is to ensure that the resulting allocation is envy-free up to any item (EFx) while minimizing the amount of inefficiency that this needs to introduce. We first show that there exist two-agent problem instances for which no EFx allocation is Pareto efficient. We, therefore, turn to approximation and use the Nash social welfare maximizing allocation as a benchmark. For two-agent instances, we provide a procedure that always returns an EFx allocation while achieving the best possible approximation of the optimal Nash social welfare that EFx allocations can achieve. For the more complicated case of three-agent instances, we provide a procedure that guarantees EFx, while achieving a constant approximation of the optimal Nash social welfare for any number of items.
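The two properties in the abstract can be checked mechanically. This minimal sketch assumes additive valuations, and the item indices, values, costs and budgets are hypothetical inputs. It uses the plain EFx test. Papers on budget-constrained agents may refine envy, for example by comparing only against the affordable part of another agent's bundle, so treat this as an approximation of the paper's definition.

```python
def bundle_value(values, i, bundle):
    return sum(values[i][g] for g in bundle)


def is_efx(alloc, values):
    """Plain additive EFx: v_i(A_i) >= v_i(A_j minus {g}) for all i != j and every g in A_j."""
    for i in range(len(alloc)):
        own = bundle_value(values, i, alloc[i])
        for j, other in enumerate(alloc):
            if i == j:
                continue
            total = bundle_value(values, i, other)
            if any(own < total - values[i][g] for g in other):
                return False
    return True


def is_budget_feasible(alloc, costs, budgets):
    return all(sum(costs[g] for g in bundle) <= budgets[i] for i, bundle in enumerate(alloc))


def nash_welfare(alloc, values):
    """Product of agents' utilities, the benchmark used for the approximation."""
    prod = 1
    for i, bundle in enumerate(alloc):
        prod *= bundle_value(values, i, bundle)
    return prod


values = [[4, 3, 1], [2, 2, 5]]
costs = [1, 1, 1]
alloc = [[0], [1, 2]]
print(is_efx(alloc, values), is_budget_feasible(alloc, costs, [1, 2]), nash_welfare(alloc, values))
# True True 28
```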
DynamicStereo: Consistent Dynamic Depth from Stereo Videos
Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a novel transformer-based architecture to estimate disparity for stereo videos. The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions. Our architecture is designed to process stereo videos efficiently through divided attention layers. We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments, which provides complementary training and evaluation data for dynamic stereo closer to real applications than existing datasets. Training with this dataset further improves the quality of predictions of our proposed DynamicStereo as well as prior methods. Finally, it acts as a benchmark for consistent stereo methods.
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Abstract
Large-scale visual language models are widely used as pre-trained models and then adapted for various downstream tasks. While humans are known to learn new tasks efficiently from a few examples, deep learning models struggle to adapt from few examples. In this work, we look into task adaptation in the low-data regime and provide a thorough study of the existing adaptation methods for generative Visual Language Models. We also show the important benefits of self-labelling, i.e. using the model's own predictions to self-improve when a larger number of unlabelled images from the same distribution is available. Our study demonstrates significant gains using our proposed task adaptation pipeline across a wide range of visual language tasks such as visual classification (ImageNet), visual captioning (COCO), detailed visual captioning (Localised Narratives) and visual question answering (VQAv2).
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Authors: Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Deploying large language models (LLMs) is challenging because they are memory-inefficient and compute-intensive for practical applications. In response, researchers train smaller task-specific models either by finetuning with human labels or by distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve performance comparable to LLMs. We introduce Distilling step-by-step, a new mechanism that (a) trains smaller models that outperform LLMs, and (b) does so using less training data than finetuning or distillation require. Our method extracts LLM rationales as additional supervision for small models within a multi-task training framework. We present three findings across 4 NLP benchmarks: First, compared to both finetuning and distillation, our mechanism achieves better performance with far fewer labeled/unlabeled training examples. Second, compared to LLMs, we achieve better performance using substantially smaller model sizes. Third, we reduce both the model size and the amount of data required to outperform LLMs; our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task.
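The multi-task idea can be sketched as a loss function. This sketch assumes the small model is trained on two outputs per input: the label and the rationale extracted from the LLM. Their cross-entropies are added using a hypothetical weight `lam`. The toy token distributions are made up, and how the two tasks are presented to the model is not shown.

```python
import math


def sequence_ce(prob_seq, target_seq):
    """Mean token-level cross-entropy; prob_seq[t] is the predicted distribution at step t."""
    return -sum(math.log(p[t]) for p, t in zip(prob_seq, target_seq)) / len(target_seq)


def step_by_step_loss(label_probs, label_tokens, rationale_probs, rationale_tokens, lam=1.0):
    """Multi-task objective: predict the label, and separately generate the LLM's rationale."""
    label_loss = sequence_ce(label_probs, label_tokens)
    rationale_loss = sequence_ce(rationale_probs, rationale_tokens)
    return label_loss + lam * rationale_loss


loss = step_by_step_loss(
    label_probs=[[0.5, 0.5]], label_tokens=[0],
    rationale_probs=[[0.25, 0.75], [1.0, 0.0]], rationale_tokens=[1, 0],
)
```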
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Abstract
Large language models (LLMs) have demonstrated remarkable abilities in representation learning for program synthesis and understanding tasks. The quality of the learned representations appears to be dictated by the neural scaling laws as a function of the number of model parameters and observations, while model performance is upper-bounded by the amount of available data and compute, which is costly. In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and (4) data distributions. Specifically, for the model architecture, we attempt to unify encoder and decoder-based models into a single prefix-LM. For learning methods, (i) causal language modeling, (ii) span corruption, and (iii) infilling are unified into a simple learning algorithm. For infill sampling, we explore the claim of a "free lunch" hypothesis. For data distributions, the effect of a mixture distribution of programming and natural languages on model performance is explored. We conduct a comprehensive series of empirical experiments on 1B LLMs, from which the failures and successes of this exploration are distilled into four lessons. We will provide a final recipe for training and release CodeGen2 models in sizes 1B, 3.7B, 7B, and 16B parameters, along with the training framework, as open-source: https://github.com/salesforce/CodeGen2.
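Infilling can be folded into ordinary causal language modeling by rearranging each training sequence. This minimal sketch shows one common prefix/suffix/middle rearrangement. The sentinel strings and the choice of a single span are hypothetical, not CodeGen2's exact format.

```python
import random

# Hypothetical sentinel strings; a real tokenizer defines its own special tokens.
PRE, SUF, MID, EOM = "<PRE>", "<SUF>", "<MID>", "<EOM>"


def to_infill_example(tokens, rng):
    """Cut a random non-empty span out of `tokens` and move it to the end,
    so that an ordinary left-to-right model learns to fill in the middle."""
    i, j = sorted(rng.sample(range(len(tokens) + 1), 2))
    prefix, middle, suffix = tokens[:i], tokens[i:j], tokens[j:]
    return [PRE] + prefix + [SUF] + suffix + [MID] + middle + [EOM]


def reassemble(example):
    """Invert the transformation: prefix + middle + suffix."""
    s, m, e = example.index(SUF), example.index(MID), example.index(EOM)
    return example[1:s] + example[m + 1:e] + example[s + 1:m]


tokens = list("def f(x): return x")
example = to_infill_example(tokens, random.Random(0))
assert reassemble(example) == tokens
```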
AG3D: Learning to Generate 3D Avatars from 2D Image Collections
Authors: Zijian Dong, Xu Chen, Jinlong Yang, Michael J. Black, Otmar Hilliges, Andreas Geiger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars from abundant unstructured 2D image collections. However, learning realistic and complete 3D appearance and geometry in this under-constrained setting remains challenging, especially in the presence of loose clothing such as dresses. In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator and integrating an efficient and flexible articulation module. To improve realism, we train our model using multiple discriminators while also integrating geometric cues in the form of predicted 2D normal maps. We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance. We validate the effectiveness of our model and the importance of each component via systematic ablation studies.
Keyword: faster
Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots
Authors: Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Over the years, much research involving mobile computational entities has been performed. From modeling actual microscopic (and smaller) robots, to modeling software processes on a network, many important problems have been studied in this context. Gathering is one such fundamental problem in this area. The problem of gathering $k$ robots, initially arbitrarily placed on the nodes of an $n$-node graph, asks that these robots coordinate and communicate in a local manner, as opposed to global, to move around the graph, find each other, and settle down on a single node as fast as possible. A more difficult problem to solve is gathering with detection, where once the robots gather, they must subsequently realize that gathering has occurred and then terminate. In this paper, we propose a deterministic approach to solve gathering with detection for any arbitrary connected graph that is faster than existing deterministic solutions for even just gathering (without the requirement of detection) for arbitrary graphs. In contrast to earlier work on gathering, it leverages the fact that there are more robots present in the system to achieve gathering with detection faster than those previous papers that focused on just gathering. The state-of-the-art solution for deterministic gathering [Ta-Shma and Zwick, TALG, 2014] takes $\tilde{O}(n^5 \log \ell)$ rounds, where $\ell$ is the smallest label among robots and $\tilde{O}$ hides a polylog factor. We design a deterministic algorithm for gathering with detection with the following trade-offs depending on how many robots are present: (i) when $k \geq \lfloor n/2 \rfloor + 1$, the algorithm takes $O(n^3)$ rounds, (ii) when $k \geq \lfloor n/3 \rfloor + 1$, the algorithm takes $O(n^4 \log n)$ rounds, and (iii) otherwise, the algorithm takes $\tilde{O}(n^5)$ rounds. The algorithm is not required to know $k$, but only $n$.
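The three regimes above amount to a simple case split on the number of robots. This minimal sketch encodes only the thresholds from the abstract, using floor division for $\lfloor\cdot\rfloor$. It says nothing about how the algorithm works.

```python
def gathering_bound(n, k):
    """Round bound stated in the abstract for k robots on an n-node graph.
    The algorithm itself only needs n; k merely determines which guarantee applies."""
    if k >= n // 2 + 1:
        return "O(n^3)"
    if k >= n // 3 + 1:
        return "O(n^4 log n)"
    return "O~(n^5)"


for k in (6, 4, 3):
    print(k, gathering_bound(10, k))
# 6 O(n^3)
# 4 O(n^4 log n)
# 3 O~(n^5)
```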
A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems
Abstract
Transformer-based models show state-of-the-art performance even for large-scale Traveling Salesman Problems (TSPs). However, they are based on fully-connected attention models and suffer from large computational complexity and GPU memory usage. We propose a lightweight CNN-Transformer model based on a CNN embedding layer and partial self-attention. Our CNN-Transformer model is able to better learn spatial features from input data using a CNN embedding layer compared with the standard Transformer models. It also removes considerable redundancy in fully connected attention models using the proposed partial self-attention. Experiments show that the proposed model outperforms other state-of-the-art Transformer-based models in terms of TSP solution quality, GPU memory usage, and inference time. Our model uses approximately 20% less GPU memory and achieves 45% faster inference than other state-of-the-art Transformer-based models. Our code is publicly available at https://github.com/cm8908/CNN_Transformer3
Approximate Evaluation of Quantitative Second Order Queries
Authors: Jan Dreier, Robert Ganian, Thekla Hamm
Subjects: Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Logic in Computer Science (cs.LO)
Abstract
Courcelle's theorem and its adaptations to cliquewidth have shaped the field of exact parameterized algorithms and are widely considered the archetype of algorithmic meta-theorems. In the past decade, there has been growing interest in developing parameterized approximation algorithms for problems which are not captured by Courcelle's theorem and, in particular, are considered not fixed-parameter tractable under the associated widths. We develop a generalization of Courcelle's theorem that yields efficient approximation schemes for any problem that can be captured by an expanded logic we call Blocked CMSO, capable of making logical statements about the sizes of set variables via so-called weight comparisons. The logic controls weight comparisons via the quantifier-alternation depth of the involved variables, allowing full comparisons for zero-alternation variables and limited comparisons for one-alternation variables. We show that the developed framework threads the very needle of tractability: on one hand it can describe a broad range of approximable problems, while on the other hand we show that the restrictions of our logic cannot be relaxed under well-established complexity assumptions. The running time of our approximation scheme is polynomial in $1/\varepsilon$, allowing us to fully interpolate between faster approximate algorithms and slower exact algorithms. This provides a unified framework to explain the tractability landscape of graph problems parameterized by treewidth and cliquewidth, as well as classical non-graph problems such as Subset Sum and Knapsack.
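Subset Sum, one of the classical problems named above, gives a concrete picture of an approximation scheme whose running time is polynomial in $1/\varepsilon$. The sketch below is the standard trimming FPTAS from textbooks, not the paper's logic-based meta-theorem. As eps shrinks, trimming disappears and the procedure approaches the exact algorithm, whose list of sums can grow exponentially.

```python
def approx_subset_sum(items, target, eps):
    """Textbook trimming FPTAS: returns a subset sum z with OPT / (1 + eps) <= z <= target.

    Keeps a sorted list of reachable sums and, after each item, drops any sum within a
    factor (1 + delta) of the last kept one, so the list length stays polynomial in 1/eps.
    """
    delta = eps / (2 * len(items))
    sums = [0]
    for x in items:
        merged = sorted(set(sums + [s + x for s in sums]))
        trimmed = [merged[0]]
        for s in merged[1:]:
            if s > trimmed[-1] * (1 + delta):
                trimmed.append(s)
        sums = [s for s in trimmed if s <= target]
    return max(sums)


print(approx_subset_sum([104, 102, 201, 101], 308, 0.4))  # 302; the optimum is 307
```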
Removing Human Bottlenecks in Bird Classification Using Camera Trap Images and Deep Learning
Authors: Carl Chalmers, Paul Fergus, Serge Wich, Steven N Longmore, Naomi Davies Walsh, Philip Stephens, Chris Sutherland, Naomi Matthews, Jens Mudde, Amira Nuseibeh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Birds are important indicators for monitoring both biodiversity and habitat health; they also play a crucial role in ecosystem management. Declines in bird populations can result in reduced ecosystem services, including seed dispersal, pollination and pest control. Accurate, long-term monitoring of birds to identify species of concern while measuring the success of conservation interventions is essential for ecologists. However, monitoring is time-consuming, costly and often difficult to manage over long durations and at meaningfully large spatial scales. Technologies such as camera traps, acoustic monitors and drones provide methods for non-invasive monitoring. There are two main problems with using camera traps for monitoring: a) cameras generate many images, making it difficult to process and analyse the data in a timely manner; and b) the high proportion of false positives hinders the processing and analysis for reporting. In this paper, we outline an approach for overcoming these issues by utilising deep learning for real-time classification of bird species and automated removal of false positives in camera trap data. Images are classified in real time using a Faster R-CNN architecture. Images are transmitted from 3G/4G cameras and processed using Graphics Processing Units (GPUs) to provide conservationists with key detection metrics, thereby removing the requirement for manual observations. Our models achieved an average sensitivity of 88.79%, a specificity of 98.16% and an accuracy of 96.71%. This demonstrates the effectiveness of using deep learning for automatic bird monitoring.
Keyword: mobile
Probabilistic Formal Modelling to Uncover and Interpret Interaction Styles
Authors: Oana Andrei, Muffy Calder, Matthew Chalmers, Alistair Morrison
Abstract
We present a study using new computational methods, based on a novel combination of machine learning for inferring admixture hidden Markov models and probabilistic model checking, to uncover interaction styles in a mobile app. These styles are then used to inform a redesign, which is implemented, deployed, and then analysed using the same methods. The data sets are logged user traces, collected over two six-month deployments of each version, involving thousands of users and segmented into different time intervals. The methods do not assume tasks or absolute metrics such as measures of engagement, but uncover the styles through unsupervised inference of clusters and analysis with probabilistic temporal logic. For both versions there was a clear distinction between the styles adopted by users during the first day/week/month of usage, and during the second and third months, a result we had not anticipated.
Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots
Authors: Lee Milburn, Juan Gamba, Miguel Fernandes, Claudio Semini
Abstract
The VINUM project seeks to address the shortage of skilled labor in modern vineyards by introducing a cutting-edge mobile robotic solution. Leveraging the capabilities of the quadruped robot HyQReal, this system, equipped with arm and vision sensors, offers autonomous navigation and winter pruning of grapevines, reducing the need for human intervention. At the heart of this approach lies an architecture that empowers the robot to easily navigate vineyards, identify grapevines with unparalleled accuracy, and approach them for pruning with precision. A state machine drives the process, deftly switching between various stages to ensure seamless and efficient task completion. The system's performance was assessed through experimentation, focusing on waypoint precision and optimizing the robot's workspace for single-plant operations. Results indicate that the architecture is highly reliable, with a mean error of 21.5 cm and a standard deviation of 17.6 cm for HyQReal. However, improvements in grapevine detection accuracy are necessary for optimal performance. This work is based on a computer-vision-based navigation method for quadruped robots in vineyards, opening up new possibilities for selective task automation. The system's architecture works well in ideal weather conditions, generating and arriving at precise waypoints that maximize the attached robotic arm's workspace. This work is an extension of our short paper presented at the Italian Conference on Robotics and Intelligent Machines (I-RIM).
A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution
Abstract
In this paper, we develop a novel super-resolution algorithm for near-field synthetic-aperture radar (SAR) under irregular scanning geometries. As fifth-generation (5G) millimeter-wave (mmWave) devices become increasingly affordable and available, high-resolution SAR imaging is feasible for end-user applications and non-laboratory environments. Emerging applications such as freehand imaging, wherein a handheld radar is scanned throughout space by a user, unmanned aerial vehicle (UAV) imaging, and automotive SAR face several unique challenges for high-resolution imaging. First, recovering a SAR image requires knowledge of the array positions throughout the scan. While recent work has introduced camera-based positioning systems capable of adequately estimating the position, running the image-recovery algorithm efficiently is a requirement to enable edge and Internet of Things (IoT) technologies. Efficient algorithms for non-cooperative near-field SAR sampling have been explored in recent work, but they suffer from image defocusing under position estimation error and can only produce medium-fidelity images. In this paper, we introduce a mobile-friendly vision transformer (ViT) architecture to address position estimation error and perform SAR image super-resolution (SR) under irregular sampling geometries. The proposed algorithm, Mobile-SRViT, is the first to employ a ViT approach for SAR image enhancement and is validated in simulation and via empirical studies.
Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging
Authors: Christos Vasileiou, Josiah W. Smith, Shiva Thiagarajan, Matthew Nigh, Yiorgos Makris, Murat Torlak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
In this paper, we introduce an innovative super-resolution approach to emerging modes of near-field synthetic aperture radar (SAR) imaging. Recent research extends convolutional neural network (CNN) architectures from the optical to the electromagnetic domain to achieve super-resolution on images generated from radar signaling. Specifically, near-field SAR imaging, a method for generating high-resolution images by scanning a radar across space to create a synthetic aperture, is of interest due to its high-fidelity spatial sensing capability, low-cost devices, and large application space. Since SAR imaging requires large aperture sizes to achieve high resolution, super-resolution algorithms are valuable for many applications. Freehand smartphone SAR, an emerging sensing modality, requires irregular SAR apertures in the near field and computation on mobile devices. Efficiently producing high-resolution SAR images from irregularly sampled data collected by freehand motion of a smartphone is a challenging task. In this paper, we propose a novel CNN architecture to achieve SAR image super-resolution for mobile applications by employing state-of-the-art SAR processing and deep learning techniques. The proposed algorithm is verified via simulation and an empirical study. Our algorithm demonstrates high-efficiency, high-resolution radar imaging for near-field scenarios with irregular scanning geometries.
Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter
Authors: Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
This paper investigates the distributed leader-follower formation control problem for multiple differentially driven mobile robots. A distributed estimator is first introduced that requires only the state information of each follower itself and its neighbors. Then, we propose a hybrid formation control method that combines bioinspired-neural-dynamics-based backstepping with sliding mode control, and prove its stability. The proposed control strategy resolves the impractical speed jump issue that exists in the conventional backstepping design. Additionally, considering system and measurement noise, the proposed control strategy not only removes the chattering issue of conventional sliding mode control but also provides smooth control input with extra robustness. After that, an adaptive sliding innovation filter is integrated with the proposed control to provide accurate state estimates that are robust to modeling uncertainties. Finally, we performed multiple simulations to demonstrate the efficiency and effectiveness of the proposed formation control strategy.
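The abstract does not spell out the neural dynamics. This minimal sketch assumes the Grossberg shunting model, which bioinspired backstepping controllers commonly use in place of the raw tracking error. A, B and D are hypothetical parameters. A step of size 10 in the error produces a bounded, gradual rise instead of a velocity jump: the output is 0.1 after the first step and settles near B e / (A + e) ≈ 0.83.

```python
def shunting_response(errors, A=2.0, B=1.0, D=1.0, dt=0.01):
    """Forward-Euler integration of dx/dt = -A x + (B - x) f(e) - (D + x) g(e),
    with f(e) = max(e, 0) and g(e) = max(-e, 0). In continuous time x stays within [-D, B]."""
    x = 0.0
    out = []
    for e in errors:
        f, g = max(e, 0.0), max(-e, 0.0)
        x += dt * (-A * x + (B - x) * f - (D + x) * g)
        out.append(x)
    return out


# A tracking error that jumps from 0 to 10 at t = 0.
response = shunting_response([10.0] * 2000)
```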
Keyword: pruning
Bicubic++: Slim, Slimmer, Slimmest -- Designing an Industry-Grade Super-Resolution Network
Authors: Bahri Batuhan Bilecen, Mustafa Ayazoglu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
We propose a real-time, lightweight single-image super-resolution (SR) network named Bicubic++. Despite using spatial dimensions of the input image across the whole network, Bicubic++ first learns quick reversible downgraded and lower-resolution features of the image in order to decrease the number of computations. We also construct a training pipeline in which we apply end-to-end global structured pruning of convolutional layers without using metrics like magnitude and gradient norms, and focus on optimizing the pruned network's PSNR on the validation set. Furthermore, we have experimentally shown that the bias terms take a considerable amount of the runtime while increasing PSNR only marginally; hence we have also applied bias removal to the convolutional layers. Our method adds ~1 dB on Bicubic upscaling PSNR for all tested SR datasets and runs in ~1.17 ms on RTX3090 and ~2.9 ms on RTX3070 for 720p inputs and 4K outputs, both in FP16 precision. Bicubic++ won NTIRE 2023 RTSR Track 2 x3 SR competition and is the fastest among all competitive methods. Being almost as fast as the standard Bicubic upsampling method, we believe that Bicubic++ can set a new industry standard.
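PSNR is a log-scaled function of mean squared error, which helps put the reported ~1 dB gain in perspective. This minimal sketch assumes 8-bit pixels (peak value 255). It ignores the colour-space and border-cropping conventions that SR benchmarks often apply.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized images given as flat sequences."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(db):
    """Factor by which MSE must shrink to raise PSNR by `db` decibels."""
    return 10 ** (-db / 10)


print(round(mse_ratio_for_gain(1.0), 3))  # 0.794, i.e. about 21% lower MSE
```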
Abstract
Lottery Ticket Hypothesis (LTH) claims the existence of a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance to the original dense network. A recent work, called UGS, extended LTH to prune graph neural networks (GNNs) for effectively accelerating GNN inference. UGS simultaneously prunes the graph adjacency matrix and the model weights using the same masking mechanism, but since the roles of the graph adjacency matrix and the weight matrices are very different, we find that their sparsifications lead to different performance characteristics. Specifically, we find that the performance of a sparsified GNN degrades significantly when the graph sparsity goes beyond a certain extent. Therefore, we propose two techniques to improve GNN performance when the graph sparsity is high. First, UGS prunes the adjacency matrix using a loss formulation which, however, does not properly involve all elements of the adjacency matrix; in contrast, we add a new auxiliary loss head to better guide the edge pruning by involving the entire adjacency matrix. Second, by regarding unfavorable graph sparsification as adversarial data perturbations, we formulate the pruning process as a min-max optimization problem to improve the robustness of lottery tickets when the graph sparsity is high. We further investigate the question: Can the "retrainable" winning ticket of a GNN also be effective for graph transfer learning? We call it the transferable graph lottery ticket (GLT) hypothesis. Extensive experiments demonstrate the superiority of our proposed sparsification method over UGS and empirically verify our transferable GLT hypothesis.
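Graph sparsification in UGS-style lottery-ticket pruning ends with a thresholding step on the adjacency mask. This minimal sketch covers that step alone and assumes precomputed per-edge scores. It does not implement the auxiliary loss head or the min-max formulation proposed here. It also prunes directed entries independently, so a symmetric graph may become asymmetric.

```python
import numpy as np


def prune_edges(adj, scores, sparsity):
    """Remove the `sparsity` fraction of existing edges with the lowest mask scores.

    adj:    dense adjacency matrix; each nonzero entry is treated as one edge.
    scores: mask scores of the same shape (hypothetical; learned in UGS-style methods).
    """
    edges = np.flatnonzero(adj)
    n_prune = int(round(sparsity * edges.size))
    order = edges[np.argsort(scores.ravel()[edges], kind="stable")]
    pruned = adj.copy()
    np.put(pruned, order[:n_prune], 0)
    return pruned


adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 1.0],
                [1.0, 1.0, 0.0]])
scores = np.array([[0.0, 0.9, 0.1],
                   [0.8, 0.0, 0.5],
                   [0.2, 0.3, 0.0]])
pruned = prune_edges(adj, scores, sparsity=0.5)
```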
Keyword: voxel
There is no result
Keyword: lidar
Direct LiDAR-Inertial Odometry and Mapping: Perceptive and Connective SLAM
Authors: Kenny Chen, Ryan Nemiroff, Brett T. Lopez
Abstract
This paper presents Direct LiDAR-Inertial Odometry and Mapping (DLIOM), a robust SLAM algorithm with an explicit focus on computational efficiency, operational reliability, and real-world efficacy. DLIOM contains several key algorithmic innovations in both the front-end and back-end subsystems to design a resilient LiDAR-inertial architecture that is perceptive to the environment and produces accurate localization and high-fidelity 3D mapping for autonomous robotic platforms. Our ideas arose from a deep investigation into modern LiDAR SLAM systems and their inability to generalize across different operating environments; we address several common algorithmic failure points by means of proactive safeguards to provide long-term operational reliability in the unstructured real world. We detail several important innovations to localization accuracy and mapping resiliency, distributed throughout a typical LiDAR SLAM pipeline, that comprehensively increase algorithmic speed, accuracy, and robustness. In addition, we discuss insights gained from our ground-up approach while implementing such a complex system for real-time state estimation on resource-constrained systems, and we experimentally show the increased performance of our method compared to the current state of the art on both public benchmark and self-collected datasets.
On procedural urban digital twin generation and visualization of large scale data
Authors: Sanjay Somanath, Vasilis Naserentin, Orfeas Eleftheriou, Daniel Sjölie, Beata Stahre Wästberg, Anders Logg
Abstract
The desired outcome for urban digital twins is an automatically generated, detailed 3D model of a building from aerial imagery, footprints, LiDAR, or a fusion of these. Such 3D models have applications in architecture, civil engineering, urban planning, construction, real estate, GIS, and many other fields. Further, visualizing large-scale data in conjunction with the generated 3D models is a recurring and resource-intensive task. However, a completely automated end-to-end workflow is complex and requires many steps to achieve a high-quality visualization. Building reconstruction methods have come a long way, from manual approaches to semi-automatic or automatic ones. The next step after reconstructing buildings is visualizing the buildings and their context. Advances in real-time rendering using game engines have enabled building reconstruction methods to be extended with procedural context generation. This paper aims to complement existing methods of 3D building generation. First, we present an in-depth literature review covering different options for procedural context generation and visualization methods, focusing on workflows and data pipelines. Next, we present a semi-automated workflow that extends the building reconstruction pipeline to include procedural context generation (terrain and vegetation) using Unreal Engine and, finally, the integration of various types of large-scale urban analysis data for visualization. We conclude with a series of challenges faced in building such pipelines and the limitations of the current approach. A complete, end-to-end solution requires robust systems for building detection, rooftop recognition, and geometry generation, as well as for importing and visualizing data in the same 3D environment.
Keyword: diffusion
DiffuSum: Generation Enhanced Extractive Summarization with Diffusion
Abstract
Extractive summarization aims to form a summary by directly extracting sentences from the source document. Existing works mostly formulate it as a sequence labeling problem by making individual sentence label predictions. This paper proposes DiffuSum, a novel paradigm for extractive summarization that directly generates the desired summary sentence representations with diffusion models and extracts sentences based on sentence representation matching. In addition, DiffuSum jointly optimizes a contrastive sentence encoder with a matching loss for sentence representation alignment and a multi-class contrastive loss for representation diversity. Experimental results show that DiffuSum achieves new state-of-the-art extractive results on CNN/DailyMail, with ROUGE scores of 44.83/22.56/40.56. Experiments on two other datasets with different summary lengths also demonstrate the effectiveness of DiffuSum. The strong performance of our framework shows the great potential of adapting generative models for extractive summarization.
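To make "extracting sentences based on sentence representation matching" concrete, the sketch below maps each generated summary representation to the most cosine-similar source sentence not yet chosen. The greedy one-to-one rule, the helper names, and the 3-dimensional toy embeddings are assumptions for illustration; DiffuSum's actual matching procedure and encoder may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def match_sentences(generated, sentences):
    """Greedily map each generated summary representation to the most similar
    not-yet-selected source sentence; returns sentence indices in summary order."""
    chosen = []
    for g in generated:
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: cosine(g, sentences[i]),
        )
        chosen.append(best)
    return chosen


# Hypothetical 3-d sentence embeddings for a 4-sentence document.
doc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical representations produced by the diffusion model.
generated = [[0.1, 0.0, 0.9], [0.6, 0.8, 0.0]]
print(match_sentences(generated, doc))  # [3, 2]
```

The extracted summary is then sentences 3 and 2 of the document, in that order.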
Multimodal Procedural Planning via Dual Text-Image Prompting
Authors: Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang
Abstract
Embodied agents have achieved prominent performance in following human instructions to complete tasks. However, the potential of providing instructions informed by texts and images to assist humans in completing tasks remains underexplored. To uncover this capability, we present the multimodal procedural planning (MPP) task, in which models are given a high-level goal and generate plans of paired text-image steps, providing more complementary and informative guidance than unimodal plans. The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities. To tackle this, we propose Text-Image Prompting (TIP), a dual-modality prompting method that jointly leverages zero-shot reasoning ability in large language models (LLMs) and compelling text-to-image generation ability from diffusion-based models. TIP improves the interaction between the two modalities using a Text-to-Image Bridge and an Image-to-Text Bridge, allowing LLMs to guide the text-grounded image plan generation and, in reverse, using the descriptions of image plans to ground the textual plan. To address the lack of relevant datasets, we collect WIKIPLAN and RECIPEPLAN as a testbed for MPP. Our results show favorable human preferences and automatic scores compared with unimodal and multimodal baselines on WIKIPLAN and RECIPEPLAN in terms of informativeness, temporal coherence, and plan accuracy. Our code and data: https://github.com/YujieLu10/MPP.
Unpaired Downscaling of Fluid Flows with Diffusion Bridges
Abstract
We present a method to downscale idealized geophysical fluid simulations using generative models based on diffusion maps. By analyzing the Fourier spectra of images drawn from different data distributions, we show how one can chain together two independent conditional diffusion models for use in domain translation. The resulting transformation is a diffusion bridge between a low-resolution and a high-resolution dataset, and it allows new high-resolution samples to be generated given specific low-resolution features. The ability to generate new samples allows any statistic of interest to be computed without additional calibration or training. Our unsupervised setup is also designed to downscale images without access to paired training data; this flexibility allows multiple source and target domains to be combined without additional training. We demonstrate that the method enhances resolution and corrects context-dependent biases in geophysical fluid simulations, including in extreme events. We anticipate that the same method can be used to downscale the output of climate simulations, including temperature and precipitation fields, without needing to train a new model for each application, providing significant computational cost savings.
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Authors: Changrong Xiao, Sean Xin Xu, Kunpeng Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs to learn the underlying alignment between images and texts. In this paper, we propose a multimodal data augmentation method that leverages a recent text-to-image model, Stable Diffusion, to expand the training set by generating high-quality image-caption pairs. Extensive experiments on the MS COCO dataset demonstrate the advantages of our approach over several benchmark methods, with a particularly significant boost when fewer training instances are available. In addition, models trained on our augmented datasets also outperform prior unpaired image captioning methods by a large margin. Finally, training efficiency and effectiveness can be further improved by intentionally filtering the generated data based on quality assessment.
The Impacts of Dimensionality, Diffusion, and Directedness on Intrinsic Cross-Model Simulation in Tile-Based Self-Assembly
Abstract
Algorithmic self-assembly occurs when disorganized components autonomously combine to form structures and, by their design and the dynamics of the system, are forced to follow the execution of algorithms. Motivated by applications in DNA nanotechnology, investigations in algorithmic tile-based self-assembly have blossomed into a mature theory with research leveraging tools from computability theory, complexity theory, information theory, and graph theory to develop a wide range of models and show that many are computationally universal, while also exposing powers and limitations of each. Beyond computational universality, the abstract Tile Assembly Model (aTAM) was shown to be intrinsically universal (IU), a strong notion of completeness where a single tile set is capable of simulating all systems within the model; however, this result required non-deterministic tile attachments. This was later confirmed necessary when it was shown that the class of directed aTAM systems is not IU. Building on these results to further investigate the impacts of other dynamics, Hader et al. examined several tile-assembly models which varied across (1) the number of dimensions used, (2) restrictions based on diffusion of tiles through space, and (3) whether each system is directed, and showed which models are IU. Such results have shed much light on the roles of various aspects of the dynamics of tile-assembly and their effects on the intrinsic universality of each model. Here we provide direct comparisons of the various models by considering intrinsic simulations between models. We show that in some cases one model is more powerful than another, and in others, pairs of models have mutually exclusive capabilities. This comparison helps to expose the impacts of these three important aspects and further helps define a hierarchy of tile-assembly models.
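For readers new to the aTAM, the basic dynamics rule behind these results is that a tile may attach at an empty site when the glues it shares with its neighbours have total strength at least the system temperature. The sketch below implements only that 2D attachment check at temperature 2. The data layout (glue labels per side, a strength table) is an illustrative assumption, and the diffusion restrictions and 3D variants studied in the paper are not modelled.

```python
from typing import Dict, Tuple

# A tile type: glue label on each side (north, east, south, west); "" = no glue.
Tile = Tuple[str, str, str, str]
OPPOSITE = {0: 2, 1: 3, 2: 0, 3: 1}  # side index facing each side
OFFSETS = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}


def binding_strength(assembly: Dict[Tuple[int, int], Tile], pos: Tuple[int, int],
                     tile: Tile, strength: Dict[str, int]) -> int:
    """Sum of strengths of glues on `tile` that match the facing glue of a neighbour."""
    total = 0
    x, y = pos
    for side, (dx, dy) in OFFSETS.items():
        neighbour = assembly.get((x + dx, y + dy))
        if neighbour is None:
            continue
        glue = tile[side]
        if glue and glue == neighbour[OPPOSITE[side]]:
            total += strength[glue]
    return total


def can_attach(assembly, pos, tile, strength, temperature=2) -> bool:
    """aTAM rule: attach at an empty site if matching glue strength >= temperature."""
    return pos not in assembly and binding_strength(assembly, pos, tile, strength) >= temperature


strength = {"a": 1, "b": 1}
assembly = {(0, 0): ("", "b", "", ""), (1, 1): ("", "", "a", "")}
cooperative = ("a", "", "", "b")  # binds north ("a") and west ("b")
single = ("", "", "", "b")        # binds west only
print(can_attach(assembly, (1, 0), cooperative, strength))  # True
print(can_attach(assembly, (1, 0), single, strength))       # False
```

The first tile attaches only through the cooperation of two strength-1 glues, which is the mechanism that makes temperature-2 aTAM systems computationally powerful.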
DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
Abstract
While 3D point cloud generation has grown considerably in recent years, there is still no effective way to enable intuitive user control in the generation process, which limits the general utility of such methods. Since an intuitive way of decomposing a shape is through its parts, we propose to tackle the task of controllable part-based point cloud generation. We introduce DiffFacto, a novel probabilistic generative model that learns the distribution of shapes with part-level control. We propose a factorization that models independent part style and part configuration distributions, and present a novel cross diffusion network that enables us to generate coherent and plausible shapes under our proposed factorization. Experiments show that our method is able to generate novel shapes with multiple axes of control. It achieves state-of-the-art part-level generation quality and generates plausible and coherent shapes, while enabling various downstream editing applications such as shape interpolation, mixing, and transformation editing. Code will be made publicly available.
Deep Graph Representation Learning and Optimization for Influence Maximization
Authors: Chen Ling, Junji Jiang, Junxiang Wang, My Thai, Lukas Xue, James Song, Meikang Qiu, Liang Zhao
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Abstract
Influence maximization (IM) is formulated as selecting a set of initial users from a social network to maximize the expected number of influenced users. Researchers have made great progress in designing various traditional methods, and their theoretical designs and performance gains are approaching a limit. In the past few years, learning-based IM methods have emerged that generalize better to unknown graphs than traditional ones. However, the development of learning-based IM methods is still limited by fundamental obstacles, including 1) the difficulty of effectively solving the objective function; 2) the difficulty of characterizing the diversified underlying diffusion patterns; and 3) the difficulty of adapting the solution under various node-centrality-constrained IM variants. To cope with the above challenges, we design a novel framework, DeepIM, to generatively characterize the latent representation of seed sets, and we propose to learn the diversified information diffusion pattern in a data-driven and end-to-end manner. Finally, we design a novel objective function to infer optimal seed sets under flexible node-centrality-based budget constraints. Extensive analyses are conducted over both synthetic and real-world datasets to demonstrate the overall performance of DeepIM. The code and data are available at: https://github.com/triplej0079/DeepIM.
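The IM objective above, the expected number of influenced users, is usually estimated by simulation. As a point of reference for the "traditional methods" mentioned, the sketch below estimates spread under the independent cascade model by Monte Carlo and selects seeds with the classic greedy rule. The independent cascade model with a uniform activation probability `p` is an assumption made here; this is a baseline illustration, not DeepIM.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node gets one chance to activate each neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
    return len(active)


def expected_spread(graph, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of influenced users."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(graph, nodes, k, **kw):
    """Classic greedy seed selection: repeatedly add the node with the largest
    estimated expected spread when added to the current seed set."""
    seeds = []
    for _ in range(k):
        best = max((v for v in nodes if v not in seeds),
                   key=lambda v: expected_spread(graph, seeds + [v], **kw))
        seeds.append(best)
    return seeds


# Directed toy network: node 0 reaches 1, 2, 3; node 4 reaches 5.
g = {0: [1, 2, 3], 4: [5]}
# p=1.0 makes every cascade deterministic, so one run suffices here.
print(greedy_im(g, [0, 1, 2, 3, 4, 5], 2, p=1.0, runs=1))  # [0, 4]
```

Each greedy step costs many simulations per candidate node, which is one reason learning-based approaches that avoid repeated simulation are attractive on large graphs.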
Keyword: dynamic
Physics-Informed and Data-Driven Discovery of Governing Equations for Complex Phenomena in Heterogeneous Media
Authors: Muhammad Sahimi
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
Abstract
Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software and hardware are providing vast amounts of data for various complex phenomena, ranging from the atmospheric environment to large-scale porous formations and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse high-dimensional, multiscale and multiphysics phenomena that contain elements of stochasticity, and to generate large volumes of numerical data for them in heterogeneous systems. The difficulty, however, is that the governing equations for such phenomena are often not known. A prime example is flow, transport, and deformation processes in macroscopically heterogeneous materials and geomedia. In other cases, the governing equations are only partially known, in the sense that they either contain various coefficients that must be evaluated based on data, or they require constitutive relations, such as the relationship between the stress tensor and the velocity gradients for non-Newtonian fluids in the momentum conservation equation, in order to be useful for modeling. Several classes of approaches are emerging to address such problems, based on machine learning, symbolic regression, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation, and stochastic optimization and analysis, or a combination of two or more of these approaches. This Perspective describes the latest developments in this highly important area and discusses possible future directions.
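One of the approaches listed, sparse identification of nonlinear dynamics (SINDy), regresses measured time derivatives onto a library of candidate terms and keeps only the few terms with significant coefficients. The sketch below implements sequentially thresholded least squares (STLSQ), the regression commonly used for this, in plain Python. The synthetic system dx/dt = -2x, the polynomial library [1, x, x^2], exact derivative data, and the threshold value are all illustrative assumptions; the Perspective itself does not prescribe this code.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(theta, y, cols):
    """Least-squares fit of y using only library columns `cols` (normal equations)."""
    ata = [[sum(row[i] * row[j] for row in theta) for j in cols] for i in cols]
    aty = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in cols]
    return solve(ata, aty)


def stlsq(theta, y, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: refit, drop small terms, repeat."""
    active = list(range(len(theta[0])))
    coef = [0.0] * len(theta[0])
    for _ in range(iters):
        fit = least_squares(theta, y, active)
        coef = [0.0] * len(theta[0])
        for i, c in zip(active, fit):
            coef[i] = c
        new_active = [i for i in active if abs(coef[i]) >= threshold]
        if new_active == active:
            break
        active = new_active
    return coef


xs = [0.1 * k for k in range(1, 21)]
dxdt = [-2.0 * x for x in xs]           # synthetic data from dx/dt = -2x
theta = [[1.0, x, x * x] for x in xs]   # candidate library: 1, x, x^2
coef = stlsq(theta, dxdt)
print(coef)  # approximately [0.0, -2.0, 0.0]
```

With noisy measured data the derivatives would have to be estimated (e.g. by finite differences or smoothing), and the threshold becomes a genuine modelling choice.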
Abstract
Most existing visual reasoning tasks, such as CLEVR in VQA, ignore an important factor, i.e., transformation. They are solely defined to test how well machines understand concepts and relations within static settings, like one image. Such state-driven visual reasoning has limitations in reflecting the ability to infer the dynamics between different states, which has been shown to be equally important for human cognition in Piaget's theory. To tackle this problem, we propose a novel transformation-driven visual reasoning (TVR) task. Given both the initial and final states, the goal is to infer the corresponding intermediate transformation. Following this definition, a new synthetic dataset named TRANCE is first constructed on the basis of CLEVR, including three levels of settings, i.e., Basic (single-step transformation), Event (multi-step transformation), and View (multi-step transformation with variant views). Next, we build another real dataset called TRANCO based on COIN, to compensate for the limited transformation diversity of TRANCE. Inspired by human reasoning, we propose a three-stage reasoning framework called TranNet, comprising observing, analyzing, and concluding, to test how recent advanced techniques perform on TVR. Experimental results show that state-of-the-art visual reasoning models perform well on Basic, but are still far from human-level intelligence on Event, View, and TRANCO. We believe the proposed new paradigm will boost the development of machine visual reasoning. More advanced methods and new problems need to be investigated in this direction. The resources for TVR are available at https://hongxin2019.github.io/TVR/.
Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles
Authors: Aik Rui Tan, Shingo Urata, Samuel Goldman, Johannes C.B. Dietschreit, Rafael Gómez-Bombarelli
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Abstract
Neural networks (NNs) often assign high confidence to their predictions, even for points far out of distribution, making uncertainty quantification (UQ) a challenge. When they are employed to model interatomic potentials in materials systems, this problem leads to unphysical structures that disrupt simulations, or to biased statistics and dynamics that do not reflect the true physics. Differentiable UQ techniques can find new informative data and drive active learning loops for robust potentials. However, a variety of UQ techniques, including newly developed ones, exist for atomistic simulations, and there are no clear guidelines for which are most effective or suitable for a given case. In this work, we examine multiple UQ schemes for improving the robustness of NN interatomic potentials (NNIPs) through active learning. In particular, we compare incumbent ensemble-based methods against strategies that use single, deterministic NNs: mean-variance estimation (MVE), deep evidential regression, and Gaussian mixture models (GMM). We explore three datasets ranging from in-domain interpolative learning to more extrapolative out-of-domain generalization challenges: rMD17, ammonia inversion, and bulk silica glass. Performance is measured across multiple metrics relating model error to uncertainty. Our experiments show that no method consistently outperformed the others across the various metrics. Ensembling remained better at generalization and for NNIP robustness; MVE only proved effective for in-domain interpolation, while GMM was better out-of-domain; and evidential regression, despite its promise, was not the preferable alternative in any of the cases. More broadly, cost-effective single deterministic models cannot yet consistently match or outperform ensembling for uncertainty quantification in NNIPs.
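As a minimal illustration of ensemble-based UQ and of one metric "relating model error to uncertainty", the sketch below takes each structure's uncertainty to be the variance of the ensemble members' energy predictions and scores it with the Spearman rank correlation between uncertainty and error. All numbers are hypothetical, the Spearman helper ignores ties, and the paper's full set of metrics may differ. An MVE model would instead output the variance directly from a single network.

```python
import statistics


def ensemble_uq(predictions):
    """predictions[m][i]: energy predicted by ensemble member m for structure i.
    Returns per-structure (mean, variance) across members."""
    per_point = list(zip(*predictions))
    return [(statistics.fmean(p), statistics.pvariance(p)) for p in per_point]


def spearman(a, b):
    """Spearman rank correlation (no tie handling; assumes distinct values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical predictions of 3 ensemble members for 4 structures.
preds = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.2, 3.4, 4.6], [1.0, 1.8, 2.6, 3.4]]
true = [1.0, 2.05, 3.2, 4.5]

stats = ensemble_uq(preds)
uncertainty = [v for _, v in stats]
errors = [abs(m - t) for (m, _), t in zip(stats, true)]
print(spearman(uncertainty, errors))  # 1.0: uncertainty ranks errors perfectly
```

A value near 1 means the uncertainty estimate ranks structures by error well, which is what an active learning loop needs when choosing new structures to label.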
Fault Tolerant Processing Unit Using Gamma Distribution Sliding Window For Autonomous Landing Guidance System
Authors: Hossam O. Ahmed
Subjects: Systems and Control (eess.SY); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
To keep up with today's dense metropolitan areas and their accompanying traffic problems, a growing number of towns are looking for more advanced and swift urban taxi drones. Safety may be the most important factor in the widespread adoption of such technology. Most recent aviation mishaps have occurred during the landing phase, making landing a particularly important safety consideration for Vertical and/or Short Take-Off and Landing (V/STOL) drones. In this study, we focus on improving the fault tolerance of the processor architectures used by the predecessors of Autonomous Landing Guidance Assistance Systems (ALGAS), which in turn improves their decision-making capabilities. This is achieved by proposing a fault-tolerant processing architecture based on the Gamma Distribution Sliding Window Unit (GDSWU). The proposed GDSWU has been designed entirely in VHDL, and the target FPGA was the Intel Cyclone V 5CGXFC9D6F27C7 chip. According to the synthesis results from the Intel Quartus Prime software, the GDSWU can operate at a maximum frequency of 369.96 MHz. The proposed GDSWU core requires only 20.36 mW of dynamic core and I/O power.
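The abstract does not describe the GDSWU's internals. Purely as a software analogue of the idea named in its title, the sketch below keeps a fixed-length sliding window of positive sensor readings and re-estimates a gamma distribution's shape k and scale theta by the method of moments (k = mean^2 / var, theta = var / mean). The class name and estimator choice are assumptions; a VHDL design would typically use fixed-point arithmetic rather than floats.

```python
from collections import deque


class GammaSlidingWindow:
    """Hypothetical software analogue: keeps the last `size` samples and estimates
    Gamma(shape k, scale theta) by the method of moments."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0
        self.total_sq = 0.0

    def push(self, x):
        if len(self.window) == self.window.maxlen:
            old = self.window[0]  # oldest sample, evicted by the append below
            self.total -= old
            self.total_sq -= old * old
        self.window.append(x)
        self.total += x
        self.total_sq += x * x

    def estimate(self):
        """Return (k, theta), or None if the window cannot support a gamma fit."""
        n = len(self.window)
        if n == 0:
            return None
        mean = self.total / n
        var = self.total_sq / n - mean * mean  # population variance
        if mean <= 0 or var <= 0:
            return None  # gamma needs positive mean and variance
        return mean * mean / var, var / mean


w = GammaSlidingWindow(size=4)
for reading in [1.0, 2.0, 3.0, 4.0, 5.0]:
    w.push(reading)
print(w.estimate())  # window is [2, 3, 4, 5]: k = 9.8, theta ~ 0.357
```

Running sums make each update constant-time, which is the property that makes a sliding-window statistic attractive for hardware.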
Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems
Authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters
Abstract
Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow since they rely on Monte Carlo sampling or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios.
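The key idea of sample-free inference is to propagate the moments of each Gaussian component deterministically instead of drawing Monte Carlo samples. For linear dynamics this moment matching is exact, which makes it a convenient illustration. The sketch below uses a 1D model x' = a*x + noise (an assumption; the paper's GNN dynamics are nonlinear and need approximate matching rules) and shows that a two-component mixture keeps two modes, while its overall mean and variance still follow from the component moments.

```python
def propagate(components, a, q):
    """One step of x' = a*x + noise(var q), applied to each Gaussian component
    given as (weight, mean, var); for linear dynamics this is exact."""
    return [(w, a * m, a * a * v + q) for w, m, v in components]


def mixture_moments(components):
    """Overall mean and variance of a Gaussian mixture."""
    mean = sum(w * m for w, m, _ in components)
    second = sum(w * (v + m * m) for w, m, v in components)
    return mean, second - mean * mean


# Two equally likely modes, e.g. an agent turning left or right.
gmm = [(0.5, -1.0, 0.1), (0.5, 1.0, 0.1)]
gmm = propagate(gmm, a=2.0, q=0.2)
print(gmm)                   # components at -2 and +2, each with var ~0.6
print(mixture_moments(gmm))  # mean 0, var ~4.6
```

Collapsing this mixture into a single Gaussian with mean 0 and variance 4.6 would put most probability mass near 0, where neither mode lies; keeping the components avoids that unimodal simplification.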
Bio-Inspired Simple Neural Network for Low-Light Image Restoration: A Minimalist Approach
Authors: Junjie Ye, Jilin Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
In this study, we explore the potential of using a straightforward neural network inspired by the retina model to efficiently restore low-light images. The retina model imitates the neurophysiological principles and dynamics of various optical neurons. Our proposed neural network model reduces the computational overhead compared to traditional signal-processing models while achieving results similar to complex deep learning models from a subjective perceptual perspective. By directly simulating retinal neuron functionalities with neural networks, we not only avoid manual parameter optimization but also lay the groundwork for constructing artificial versions of specific neurobiological organizations.
Class adaptive threshold and negative class guided noisy annotation robust Facial Expression Recognition
Abstract
A key problem hindering facial expression recognition (FER) is the presence of inaccurate annotations, referred to as noisy annotations, in the datasets. These noisy annotations are inherent in the datasets because labeling depends on the annotator's subjectivity, the clarity of the image, etc. Recent works use sample selection methods to solve this noisy annotation problem in FER. In our work, we use a dynamic adaptive threshold to separate confident samples from non-confident ones so that learning is not hampered by non-confident samples. Instead of discarding the non-confident samples, we impose consistency in the negative classes of those non-confident samples to guide the model to learn the positive class better. Since FER datasets usually have 7 or 8 classes, even a randomly chosen class is a correct negative class with a probability of at least 85% (6/7, about 85.7%, for 7 classes; 7/8 = 87.5% for 8). By learning "which class a sample doesn't belong to", the model can better learn "which class it belongs to". We demonstrate the proposed framework's effectiveness using quantitative as well as qualitative results. Our method outperforms the baseline by a margin of 4% to 28% on RAFDB and 3.3% to 31.4% on FERPlus for various levels of synthetic label noise in these datasets.
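The sketch below checks the negative-class arithmetic and illustrates a class-adaptive confident/non-confident split. The abstract does not state the paper's threshold rule, so the rule used here (each class's threshold is the mean confidence of samples predicted as that class) is a hypothetical stand-in, as are the confidence values.

```python
from fractions import Fraction


def random_negative_hit_rate(num_classes):
    """Probability that a uniformly random class differs from the true label."""
    return Fraction(num_classes - 1, num_classes)


def class_adaptive_thresholds(confidences, predicted):
    """Hypothetical rule: each class's threshold is the mean confidence of the
    samples currently predicted as that class."""
    sums, counts = {}, {}
    for c, p in zip(confidences, predicted):
        sums[p] = sums.get(p, 0.0) + c
        counts[p] = counts.get(p, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def split_confident(confidences, predicted):
    """True for samples at or above their predicted class's threshold."""
    th = class_adaptive_thresholds(confidences, predicted)
    return [c >= th[p] for c, p in zip(confidences, predicted)]


print(float(random_negative_hit_rate(7)), float(random_negative_hit_rate(8)))
# Four samples: two predicted as class 0, two as class 1.
print(split_confident([0.9, 0.5, 0.8, 0.4], [0, 0, 1, 1]))  # [True, False, True, False]
```

Non-confident samples (the False entries) would then be trained with a negative-class consistency loss rather than discarded.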
Evolving Dictionary Representation for Few-shot Class-incremental Learning
Authors: Xuejun Han, Yuhong Guo
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
New objects are continuously emerging in the dynamically changing world, and a real-world artificial intelligence system should be capable of continual and effective adaptation to newly emerging classes without forgetting old ones. In view of this, in this paper we tackle a challenging and practical continual learning scenario named few-shot class-incremental learning (FSCIL), in which labeled data are given for classes in a base session but very limited labeled instances are available for new incremental classes. To address this problem, we propose a novel and succinct approach that introduces deep dictionary learning, a hybrid learning architecture combining dictionary learning and visual representation learning to provide a better space for characterizing different classes. We simultaneously optimize the dictionary and the feature extraction backbone in the base session, while only fine-tuning the dictionary in the incremental sessions to adapt to novel classes, which alleviates forgetting of base classes compared with fine-tuning the entire model. To further facilitate future adaptation, we also incorporate multiple pseudo classes into the base session training so that a certain space projected by the dictionary can be reserved for future new concepts. Extensive experimental results on CIFAR100, miniImageNet and CUB200 validate the effectiveness of our approach compared to other state-of-the-art (SOTA) methods.
PODTherm-GP: A Physics-based Data-Driven Approach for Effective Architecture-Level Thermal Simulation of Multi-Core CPUs
Authors: Lin Jiang, Anthony Dowling, Ming-C. Cheng, Yu Liu
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
Abstract
A thermal simulation methodology derived from the proper orthogonal decomposition (POD) and the Galerkin projection (GP), hereafter referred to as PODTherm-GP, is evaluated in terms of its efficiency and accuracy in a multi-core CPU. The GP projects the heat transfer equation onto a mathematical space whose basis functions are generated from thermal data by the POD learning algorithm. The thermal solution data are collected from FEniCS using the finite element method (FEM), accounting for appropriate parametric variations. The GP incorporates physical principles of heat transfer into the methodology to reach high accuracy and efficiency. The dynamic power map for the CPU in the FEM thermal simulation is generated from gem5 and McPAT, with the SPLASH-2 benchmarks as the simulation workload. It is shown that PODTherm-GP offers an accurate thermal prediction of the CPU with a resolution as fine as the FEM. It is also demonstrated that PODTherm-GP is capable of predicting the dynamic thermal profile of the chip with good accuracy beyond the training conditions. Additionally, the approach offers a reduction in degrees of freedom by more than 5 orders of magnitude and a speedup of 4 orders of magnitude, compared to the FEM.
Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer
Authors: Rami Hamdi, Ahmed Ben Said, Emna Baccour, Aiman Erbad, Amr Mohamed, Mounir Hamdi, Mohsen Guizani
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Remote monitoring systems analyze environmental dynamics in different smart industrial applications, such as occupational health and safety, and environmental monitoring. Specifically, in industrial Internet of Things (IoT) systems, the huge number of devices and the expected performance put pressure on computational, network, and device energy resources. Distributed training of Machine and Deep Learning (ML/DL) models for intelligent industrial IoT applications is very challenging for resource-limited devices over heterogeneous wireless networks (HetNets). Hierarchical Federated Learning (HFL) performs training at multiple layers, offloading tasks to nearby Multi-Access Edge Computing (MEC) units. In this paper, we propose a novel energy-efficient HFL framework enabled by Wireless Energy Transfer (WET) and designed for heterogeneous networks with massive Multiple-Input Multiple-Output (MIMO) wireless backhaul. Our energy-efficiency approach is formulated as a Mixed-Integer Non-Linear Programming (MINLP) problem, where we optimize the HFL device association and manage the wirelessly transmitted energy. However, due to the problem's high complexity, we design a heuristic resource management algorithm, namely H2RMA, that respects energy, channel quality, and accuracy constraints while having low computational complexity. We also reduce the energy consumption of the network using an efficient device scheduling scheme. Finally, we investigate device mobility and its impact on HFL performance. Our extensive experiments confirm the high performance of the proposed resource management approach in HFL over HetNets, in terms of training loss and grid energy costs.
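The paper's contribution is resource management, but the layered training it manages can be pictured with a small sketch: devices are first aggregated at their MEC/edge server, and the edge models are then aggregated at the cloud. FedAvg-style averaging weighted by local sample counts is an assumption here; the abstract does not specify the aggregation rule, and the parameter vectors are hypothetical.

```python
def weighted_average(models, weights):
    """Weighted average of parameter vectors (FedAvg-style aggregation)."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[j] for m, w in zip(models, weights)) / total for j in range(dim)]


def hierarchical_round(edges):
    """edges: list of edge servers, each a list of (device_params, num_samples).
    Devices are aggregated at their edge server first, then edge models are
    aggregated at the cloud, weighted by the samples each edge covers."""
    edge_models, edge_sizes = [], []
    for devices in edges:
        params = [p for p, _ in devices]
        sizes = [n for _, n in devices]
        edge_models.append(weighted_average(params, sizes))
        edge_sizes.append(sum(sizes))
    return weighted_average(edge_models, edge_sizes)


# Two edge servers: the first serves two devices, the second serves one.
edges = [
    [([1.0, 0.0], 1), ([3.0, 2.0], 3)],
    [([0.0, 4.0], 4)],
]
print(hierarchical_round(edges))  # [1.25, 2.75]
```

With sample-count weights at both levels, one hierarchical round equals flat FedAvg over all devices; the benefit of the hierarchy lies in cheaper, nearer device-to-edge communication, which is where device association and energy management come in.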
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images
Authors: Subin Lin, Vasantha Ramani, Miguel Martin, Pandarasamy Arjunan, Adrian Chong, Filip Biljecki, Marcel Ignatius, Kameshwar Poolla, Clayton Miller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The paper describes a dataset collected by infrared thermography, a non-contact, non-intrusive technique for collecting data and analyzing various aspects of the built environment. While most studies focus on the city and building scales, the rooftop observatory provides high temporal and spatial resolution observations of dynamic interactions at the district scale. The rooftop infrared thermography observatory, a multi-modal platform capable of assessing a wide range of dynamic processes in urban systems, was deployed in Singapore. It was placed on top of two buildings that overlook the outdoor context of the National University of Singapore campus. The platform collects remote sensing data from tropical areas over time, allowing users to determine the temperature trends of individual features such as buildings, roads, and vegetation. The dataset includes 1,365,921 thermal images collected at average intervals of approximately 10 seconds from two locations over ten months.
Computing paths of large rank in planar frameworks deterministically
Authors: Fedor V. Fomin, Petr A. Golovach, Tuukka Korhonen, Giannos Stamoulis
Abstract
A framework consists of an undirected graph $G$ and a matroid $M$ whose elements correspond to the vertices of $G$. Recently, Fomin et al. [SODA 2023] and Eiben et al. [arXiv 2023] developed parameterized algorithms for computing paths of rank $k$ in frameworks. More precisely, for vertices $s$ and $t$ of $G$ and an integer $k$, they gave FPT algorithms parameterized by $k$ that decide whether there is an $(s,t)$-path in $G$ whose vertex set contains a subset of elements of $M$ of rank $k$. These algorithms are based on the Schwartz-Zippel lemma for polynomial identity testing and are therefore randomized, so the existence of a deterministic FPT algorithm for this problem remains open. We present the first deterministic FPT algorithm that solves the problem in frameworks whose underlying graph $G$ is planar. While the running time of our algorithm is worse than the running times of the recent randomized algorithms, our algorithm works on more general classes of matroids. In particular, this is the first FPT algorithm for the case when the matroid $M$ is represented over the rationals. Our main technical contribution is a nontrivial adaptation of the classic irrelevant vertex technique to frameworks, which reduces the given instance to one of bounded treewidth. This allows us to employ the toolbox of representative sets to design a dynamic programming procedure that solves the problem efficiently on instances of bounded treewidth.
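To make the problem statement concrete, the following brute-force sketch decides whether some $(s,t)$-path has a vertex set of rank at least $k$ in a linear matroid represented over the rationals, with each vertex mapped to a column vector. A set of rank at least $k$ always contains a subset of rank exactly $k$, so this matches the decision question above. The sketch enumerates all simple paths and is exponential; it illustrates only the definition, not the paper's FPT algorithm, and the graph and vectors are made-up examples.

```python
from fractions import Fraction
from typing import Dict, List, Sequence


def rank(vectors: Sequence[Sequence[Fraction]]) -> int:
    """Rank of a list of rational vectors via exact Gaussian elimination."""
    rows = [list(vec) for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def has_rank_k_path(adj: Dict[str, List[str]], vec: Dict[str, List[Fraction]],
                    s: str, t: str, k: int) -> bool:
    """DFS over simple s-t paths; True if some path's vertex set has rank >= k."""
    def dfs(u: str, path: List[str]) -> bool:
        if u == t:
            return rank([vec[w] for w in path]) >= k
        for w in adj[u]:
            if w not in path:
                path.append(w)
                if dfs(w, path):
                    return True
                path.pop()
        return False

    return dfs(s, [s])


def vec_of(*xs: int) -> List[Fraction]:
    """Helper for building rational vectors in the toy example."""
    return [Fraction(x) for x in xs]


# Toy framework: two s-t paths; vertex a is parallel to s in the matroid.
adj = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
vec = {"s": vec_of(1, 0, 0), "a": vec_of(2, 0, 0),
       "b": vec_of(0, 1, 0), "t": vec_of(0, 0, 1)}
print(has_rank_k_path(adj, vec, "s", "t", 3))  # s-b-t has rank 3
```

Exact `Fraction` arithmetic is used because rank over the rationals is sensitive to floating-point round-off; the path s-a-t only reaches rank 2, since a and s are parallel.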
Gym-preCICE: Reinforcement Learning Environments for Active Flow Control
Abstract
Active flow control (AFC) involves manipulating fluid flow over time to achieve a desired performance or efficiency. As a sequential optimisation task, AFC can benefit from Reinforcement Learning (RL) for dynamic optimisation. In this work, we introduce Gym-preCICE, a Python adapter fully compliant with the Gymnasium (formerly OpenAI Gym) API, to facilitate designing and developing RL environments for single- and multi-physics AFC applications. In an actor-environment setting, Gym-preCICE takes advantage of preCICE, an open-source coupling library for partitioned multi-physics simulations, to handle information exchange between a controller (actor) and an AFC simulation environment. The framework enables a seamless, non-invasive integration of realistic physics-based simulation toolboxes with RL algorithms. Gym-preCICE provides a framework for designing RL environments to model AFC tasks, as well as a playground for applying RL algorithms in various AFC-related engineering applications.
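The actor-environment loop follows the Gymnasium interface, in which `reset(seed=..., options=...)` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The sketch below mimics that contract with a toy first-order "drag" model standing in for a coupled simulation. It deliberately does not import `gymnasium` or call preCICE; the class name, dynamics, and reward are hypothetical placeholders.

```python
import random
from typing import Any, Dict, Optional, Tuple


class ToyFlowControlEnv:
    """Hypothetical AFC environment exposing a Gymnasium-style reset/step API.

    The state is a scalar "drag" that relaxes toward a baseline and is reduced
    by actuation. A real Gym-preCICE environment would instead exchange data
    with a coupled flow solver at every step.
    """

    def __init__(self, baseline: float = 1.0, max_steps: int = 50):
        self.baseline = baseline
        self.max_steps = max_steps
        self.rng = random.Random()
        self.drag = baseline
        self.steps = 0

    def reset(self, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[float, Dict[str, Any]]:
        if seed is not None:
            self.rng.seed(seed)  # reproducible initial perturbation
        self.drag = self.baseline + self.rng.uniform(-0.05, 0.05)
        self.steps = 0
        return self.drag, {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        action = max(-1.0, min(1.0, action))  # clip to the action bounds [-1, 1]
        # Toy dynamics: relax toward the baseline; actuation lowers the drag.
        self.drag += 0.2 * (self.baseline - self.drag) - 0.1 * action
        # Reward: low drag is good, with a small penalty on actuation effort.
        reward = -self.drag - 0.01 * action ** 2
        self.steps += 1
        terminated = False                        # the toy task has no terminal state
        truncated = self.steps >= self.max_steps  # episode ends on a time limit
        return self.drag, reward, terminated, truncated, {}


env = ToyFlowControlEnv()
obs, info = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    # Constant full actuation stands in for an RL policy.
    obs, reward, terminated, truncated, info = env.step(1.0)
    total_reward += reward
    done = terminated or truncated
print(round(obs, 4), round(total_reward, 3))
```

Keeping `terminated` and `truncated` separate matters for RL algorithms that bootstrap value estimates: a time-limit cut-off is not a true terminal state.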
Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique
Authors: Josiah Smith, Shiva Thiagarajan, Richard Willis, Yiorgos Makris, Murat Torlak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
In this paper, we investigate novel data collection and training techniques for improving the classification accuracy of non-moving (static) hand gestures using a convolutional neural network (CNN) and frequency-modulated continuous-wave (FMCW) millimeter-wave (mmWave) radars. Recently, non-contact hand pose and static gesture recognition have received considerable attention in applications ranging from human-computer interaction (HCI) and augmented/virtual reality (AR/VR) to therapeutic range-of-motion applications in medicine. While most current solutions rely on optical or depth cameras, these methods require ideal lighting and temperature conditions. mmWave radar devices have recently emerged as a promising alternative, offering low-cost system-on-chip sensors whose output signals contain precise spatial information even in non-ideal imaging conditions. Additionally, deep convolutional neural networks have been employed extensively in image recognition, learning feature extraction and classification simultaneously. However, little work has addressed static gesture recognition using mmWave radars and CNNs, owing to the difficulty of extracting meaningful features from the radar return signal, and the results are inferior to those of dynamic gesture classification. This article presents an efficient data collection approach and a novel technique for deep CNN training that introduces ``sterile'' images, which help distinguish features among the static gestures and thereby improve classification accuracy. Applying the proposed data collection and training methods increases the classification rate of static hand gestures from $85\%$ to $93\%$ for range profiles and from $90\%$ to $95\%$ for range-angle profiles.
What makes a good pause? Investigating the turn-holding effects of fillers
Authors: Bing'er Jiang, Erik Ekstedt, Gabriel Skantze
Abstract
Filled pauses (or fillers), such as "uh" and "um", are frequent in spontaneous speech and can serve as a turn-holding cue for the listener, indicating that the current speaker is not done yet. In this paper, we use the recently proposed Voice Activity Projection (VAP) model, a deep learning model trained to predict the dynamics of conversation, to analyse the effects of filled pauses on the expected turn-hold probability. The results show that, while filled pauses do indeed have a turn-holding effect, it is perhaps not as strong as could be expected, probably due to the redundancy of other cues. We also find that the prosodic properties and position of the filler have a significant effect on the turn-hold probability. However, contrary to what has been suggested in previous work, there is no difference between "uh" and "um" in this regard.
Synergies Between Federated Learning and O-RAN: Towards an Elastic Virtualized Architecture for Multiple Distributed Machine Learning Services
Authors: Payam Abdisarabshali, Nicholas Accurso, Filippo Malandra, Weifeng Su, Seyyedali Hosseinalipour
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Federated learning (FL) is the most popular distributed machine learning technique. However, implementing FL over modern wireless networks faces key challenges caused by (i) the dynamics of network conditions, (ii) the coexistence of multiple FL services/tasks in the system, and (iii) the concurrent execution of FL services with other network services, which prior works have not considered jointly. Motivated by these challenges, we introduce a generic FL paradigm over next-generation (NextG) networks, called dynamic multi-service FL (DMS-FL). We identify three unexplored design considerations in DMS-FL: (i) FL service operator accumulation, (ii) wireless resource fragmentation, and (iii) signal strength fluctuations. We take the first steps toward addressing these design considerations by proposing a novel distributed ML architecture called elastic virtualized FL (EV-FL). EV-FL unleashes the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology to execute FL services. It further constitutes a multi-time-scale FL management system that introduces three dimensions into existing FL architectures: (i) virtualization, (ii) scalability, and (iii) elasticity. Through investigating EV-FL, we reveal a series of open research directions for future work. Finally, we simulate EV-FL to demonstrate its potential to save wireless resources and increase fairness among FL services.
Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning
Authors: Zhen Wei, Pascal Fua, Michaël Bauerheim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Fluid Dynamics (physics.flu-dyn)
Abstract
We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization. Both models use deep geometric learning to embed human prior knowledge into learned geometric patterns, eliminating the need for further handcrafting. The Latent Space Model (LSM) learns a low-dimensional latent representation of an object from a dataset of various geometries, while the Direct Mapping Model (DMM) builds the parameterization on the fly using only one geometry of interest. We also devise a novel regularization loss that efficiently integrates volumetric mesh deformation into the parameterization model. The models directly manipulate high-dimensional mesh data by moving vertices. LSM and DMM are fully differentiable, enabling gradient-based, end-to-end pipeline design and plug-and-play deployment of surrogate models or adjoint solvers. We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.
System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning
Abstract
Evolutionary science provides evidence that diversity confers resilience. Yet traditional multi-agent reinforcement learning techniques commonly enforce homogeneity to increase training sample efficiency. When a system of learning agents is not constrained to homogeneous policies, individual agents may develop diverse behaviors, resulting in emergent complementarity that benefits the system. Despite this, there is a surprising lack of tools that measure behavioral diversity in systems of learning agents. Such techniques would pave the way toward understanding the impact of diversity on collective resilience and performance. In this paper, we introduce System Neural Diversity (SND): a measure of behavioral heterogeneity for multi-agent systems where agents have stochastic policies. We discuss and prove its theoretical properties, and compare it with alternative, state-of-the-art behavioral diversity metrics used in cross-disciplinary domains. Through simulations of a variety of multi-agent tasks, we show how our metric constitutes an important diagnostic tool for analyzing latent properties of behavioral heterogeneity. By comparing SND with task reward in static tasks, where the problem does not change during training, we show that it is key to understanding the effectiveness of heterogeneous vs homogeneous agents. In dynamic tasks, where the problem is affected by repeated disturbances during training, we show that heterogeneous agents are first able to learn specialized roles that allow them to cope with the disturbance, and then retain these roles when the disturbance is removed. SND allows a direct measurement of this latent resilience, whereas other proxies such as task performance (reward) do not.
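One way to read the SND idea in code is to average, over all agent pairs and a batch of sampled observations, a distance between the agents' action distributions. The sketch below assumes diagonal-Gaussian policies and uses the closed-form 2-Wasserstein distance between such Gaussians; the policy functions and observation batch are hypothetical, and the paper's exact definition and normalization may differ.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

# A policy maps an observation to the (mean, std) vectors of a diagonal Gaussian.
Policy = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


def w2_diag_gaussian(mu1: Sequence[float], sd1: Sequence[float],
                     mu2: Sequence[float], sd2: Sequence[float]) -> float:
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians."""
    sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    sq += sum((a - b) ** 2 for a, b in zip(sd1, sd2))
    return math.sqrt(sq)


def system_neural_diversity(policies: List[Policy],
                            observations: List[Sequence[float]]) -> float:
    """Mean pairwise behavioral distance, averaged over sampled observations."""
    pair_distances = []
    for pi_i, pi_j in itertools.combinations(policies, 2):
        d = sum(w2_diag_gaussian(*pi_i(o), *pi_j(o)) for o in observations)
        pair_distances.append(d / len(observations))
    return sum(pair_distances) / len(pair_distances)


# Hypothetical 1-D policies: identical agents vs. agents with shifted means.
homogeneous = [lambda o: ([o[0]], [0.1])] * 3
heterogeneous = [lambda o, k=k: ([o[0] + k], [0.1]) for k in range(3)]
obs_batch = [[0.0], [0.5], [1.0]]
print(system_neural_diversity(homogeneous, obs_batch))
print(system_neural_diversity(heterogeneous, obs_batch))
```

Identical policies give 0, while shifting each agent's mean by 0, 1, and 2 gives pairwise distances 1, 2, and 1, hence a diversity of 4/3. Because the measure depends only on action distributions, it can be computed regardless of task reward.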
An identification method for oscillators with response-dependent inertia
Authors: Yuval Harduf (1), Eyal Setter (1), Izhak Bucher (1) ((1) Technion - Israel Institute of Technology, Faculty of Mechanical Engineering)
Abstract
This paper is concerned with identifying the instantaneous modal parameters of oscillatory systems with response-dependent inertia (mass, inductance, or equivalent) from their measured dynamics. The proposed identification method is a variation of the "FORCEVIB" method. It uses the analytic signal representation and the properties of the Hilbert transform to obtain an analytic relationship between a system's natural frequency and damping coefficient and its response and excitation signals. The proposed method is validated by comparing the identification results with the asymptotic solution of a simple system with response-dependent inertia, and is then demonstrated, numerically and experimentally, on other, more complicated nonlinear systems.
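The analytic-signal ingredient can be illustrated with a small sketch: build $z(t) = x(t) + i\,\mathcal{H}[x](t)$ via the DFT by zeroing negative-frequency bins, then read the instantaneous amplitude from $|z|$ and the instantaneous frequency from the phase increments. This shows only the generic Hilbert-transform step, not the FORCEVIB-based identification itself; the test tone is made up, and a plain $O(N^2)$ DFT keeps the sketch dependency-free.

```python
import cmath
import math
from typing import List


def analytic_signal(x: List[float]) -> List[complex]:
    """Analytic signal via the DFT: keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
         for k in range(n)]
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    Z = [Xk * hk for Xk, hk in zip(X, h)]
    # Inverse DFT back to the time domain.
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def instantaneous_freq(z: List[complex], fs: float) -> List[float]:
    """Instantaneous frequency (Hz) from successive phase increments of z."""
    phase = [cmath.phase(val) for val in z]
    freqs = []
    for a, b in zip(phase, phase[1:]):
        d = (b - a + math.pi) % (2 * math.pi) - math.pi  # wrap increment to [-pi, pi)
        freqs.append(d * fs / (2 * math.pi))
    return freqs


# Sanity check on a 5 Hz tone sampled at 100 Hz (exactly 10 periods in 200 samples).
fs = 100.0
x = [math.cos(2 * math.pi * 5.0 * t / fs) for t in range(200)]
z = analytic_signal(x)
envelope = [abs(val) for val in z]
freqs = instantaneous_freq(z, fs)
print(min(envelope), max(envelope), min(freqs), max(freqs))
```

For a lightly damped free decay, the slope of $\ln|z|$ approximates the decay rate and the phase derivative approximates the damped natural frequency; forced-response methods of the FORCEVIB type additionally use the analytic signal of the excitation. Non-periodic records introduce edge effects in this DFT-based construction, so values near the ends of a record are less reliable.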
A Curriculum View of Robust Loss Functions
Authors: Zebin Ou, Yue Zhang
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Abstract
Robust loss functions are designed to combat the adverse impacts of label noise; their robustness is typically supported by theoretical bounds that are agnostic to the training dynamics. However, these bounds may fail to characterize empirical performance, since it remains unclear why robust loss functions can underfit. We show that most loss functions can be rewritten in a form with the same class-score margin and different sample-weighting functions. The resulting curriculum view provides a straightforward analysis of the training dynamics, which helps attribute underfitting to diminished average sample weights and noise robustness to larger weights for clean samples. We show that simple fixes to the curricula can make underfitting robust loss functions competitive with the state of the art, and that training schedules can substantially affect noise robustness even with robust loss functions. Code is available at \url{github}.
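A rough way to see the sample-weighting view: for a softmax classifier, the magnitude of a loss's gradient with respect to the true-class logit $s_y$ acts as an implicit per-sample weight. The sketch below compares cross-entropy, whose weight is $1 - p_y$, with mean absolute error (MAE), a common robust loss whose weight is $2p_y(1 - p_y)$, and checks both against finite differences. This gradient-magnitude view is an illustrative simplification, not the paper's exact class-score margin formulation.

```python
import math
from typing import Callable, List

Loss = Callable[[List[float], int], float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_loss(logits: List[float], y: int) -> float:
    return -math.log(softmax(logits)[y])


def mae_loss(logits: List[float], y: int) -> float:
    # L1 distance between the one-hot label and the softmax output, i.e. 2 * (1 - p_y).
    p = softmax(logits)
    return sum(abs((1.0 if j == y else 0.0) - pj) for j, pj in enumerate(p))


def ce_weight(logits: List[float], y: int) -> float:
    return 1.0 - softmax(logits)[y]  # |dCE/ds_y|


def mae_weight(logits: List[float], y: int) -> float:
    p_y = softmax(logits)[y]
    return 2.0 * p_y * (1.0 - p_y)  # |dMAE/ds_y|


def numeric_weight(loss: Loss, logits: List[float], y: int, eps: float = 1e-6) -> float:
    """Central finite difference of the loss w.r.t. the true-class logit."""
    up = list(logits)
    up[y] += eps
    down = list(logits)
    down[y] -= eps
    return abs(loss(up, y) - loss(down, y)) / (2 * eps)


# True class y = 0; from well fit (margin 4) to badly fit (margin -4).
for margin in [4.0, 0.0, -4.0]:
    logits = [margin, 0.0, 0.0]
    print(margin, round(ce_weight(logits, 0), 3), round(mae_weight(logits, 0), 3))
```

When the true-class logit sits well below the others, MAE assigns almost no weight to the sample while cross-entropy weights it most. This illustrates how a robust loss can ignore hard but clean samples and underfit.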
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Authors: Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami
Abstract
Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while keeping the computational requirements per token low. However, recent studies have established that MoE models are inherently parameter-inefficient, as the performance improvement diminishes with an increasing number of experts. We hypothesize that this parameter inefficiency results from all experts having equal capacity, which may not adequately meet the varying complexity requirements of different tokens or tasks; e.g., in a multilingual setting, languages might require different capacities depending on their resource levels. In light of this, we propose Stratified Mixture of Experts (SMoE) models, which feature a stratified structure and can assign dynamic capacity to different tokens. We demonstrate the effectiveness of SMoE on two multilingual machine translation benchmarks, where it outperforms multiple state-of-the-art MoE models. On a diverse 15-language dataset, SMoE improves translation quality over vanilla MoE by +0.93 BLEU points on average. Additionally, SMoE is parameter-efficient, matching vanilla MoE performance with around 50\% fewer parameters.
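For readers unfamiliar with sparse activation, here is a minimal sketch of vanilla top-$k$ MoE routing for a single token. A router scores every expert, only the $k$ highest-scoring experts are evaluated, and their outputs are mixed with gate weights renormalized over the selected experts. The linear router and the scaling "experts" are toy placeholders; SMoE's stratified structure and dynamic capacity are not reproduced here.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[Sequence[float]], List[float]]


def top_k_moe(x: Sequence[float], router: List[List[float]],
              experts: List[Expert], k: int = 2) -> List[float]:
    """Route one token to its top-k experts and mix their outputs."""
    # Router logits: one dot product per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    # Softmax restricted to the selected experts (renormalized gates).
    m = max(logits[e] for e in top)
    gates = {e: math.exp(logits[e] - m) for e in top}
    z = sum(gates.values())
    out = [0.0] * len(x)
    for e in top:  # only k experts are evaluated: sparse activation
        y = experts[e](x)
        for i, yi in enumerate(y):
            out[i] += gates[e] / z * yi
    return out


# Toy setup: 4 experts that scale their input by 1, 2, 3, 4.
experts = [lambda x, c=c: [c * v for v in x] for c in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(top_k_moe([1.0, 0.5], router, experts, k=2))
```

The per-token cost grows with $k$ rather than with the total number of experts, which is why adding experts increases parameters without a matching increase in computation.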
A Multi-step Dynamics Modeling Framework For Autonomous Driving In Multiple Environments
Authors: Jason Gibson, Bogdan Vlahov, David Fan, Patrick Spieler, Daniel Pastor, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Modeling dynamics is often the first step toward making a vehicle autonomous. While on-road autonomous vehicles have been extensively studied, off-road vehicles pose many challenging modeling problems. An off-road vehicle encounters highly complex, difficult-to-model terrain/vehicle interactions and also has complex dynamics of its own. These complexities can create challenges for effective high-speed control and planning. In this paper, we introduce a framework for multistep dynamics prediction that explicitly handles the accumulation of modeling error and remains scalable for sampling-based controllers. Our method uses a specially initialized Long Short-Term Memory (LSTM) network over a limited time horizon as the learned component of a hybrid model to predict the dynamics of a four-seat all-terrain vehicle (Polaris S4 1000 RZR) in two distinct environments. Because the LSTM predicts only over a fixed time horizon, we avoid the need for long-term stability, which is often a challenge when training recurrent neural networks. Our framework is flexible, as it requires only odometry information for labels. Through extensive experimentation, we show that our method can predict millions of possible trajectories in real time, with a time horizon of five seconds, in challenging off-road driving scenarios.
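The structure of a fixed-horizon hybrid prediction can be sketched as follows: each step applies a physics-based model and adds a learned correction, and a sampling-based controller rolls out many candidate control sequences over the same short horizon. In this sketch a kinematic bicycle model is an assumed physics component, the `residual` callback is a hypothetical stateless stand-in for the paper's LSTM (which would also carry hidden state), and the wheelbase, horizon, and sampling ranges are illustrative values only.

```python
import math
import random
from typing import Callable, List, Tuple

State = Tuple[float, float, float, float]  # x, y, yaw, speed
Control = Tuple[float, float]              # steering angle, acceleration


def kinematic_step(s: State, u: Control, dt: float, wheelbase: float = 3.0) -> State:
    """Kinematic bicycle model: the physics-based part of the hybrid model."""
    x, y, yaw, v = s
    steer, acc = u
    return (x + v * math.cos(yaw) * dt,
            y + v * math.sin(yaw) * dt,
            yaw + v / wheelbase * math.tan(steer) * dt,
            v + acc * dt)


def rollout(s0: State, controls: List[Control], dt: float,
            residual: Callable[[State, Control], State]) -> List[State]:
    """Predict a fixed-horizon trajectory: physics step plus learned correction."""
    traj = [s0]
    for u in controls:
        nominal = kinematic_step(traj[-1], u, dt)
        corr = residual(traj[-1], u)
        traj.append(tuple(a + b for a, b in zip(nominal, corr)))
    return traj


def zero_residual(s: State, u: Control) -> State:
    """Hypothetical placeholder for the learned correction: no correction."""
    return (0.0, 0.0, 0.0, 0.0)


rng = random.Random(0)
horizon, dt = 50, 0.1  # 5 s horizon at 10 Hz, chosen for illustration
samples = [[(rng.uniform(-0.3, 0.3), rng.uniform(-1.0, 1.0)) for _ in range(horizon)]
           for _ in range(100)]
trajectories = [rollout((0.0, 0.0, 0.0, 10.0), u_seq, dt, zero_residual)
                for u_seq in samples]
print(len(trajectories), len(trajectories[0]))
```

Because every rollout restarts from the current state and stops after a fixed number of steps, errors can only accumulate over that horizon, which is the property the abstract relies on to avoid long-term stability requirements.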
Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter
Authors: Zhe Xu, Tao Yan, Simon X. Yang, S. Andrew Gadsden
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
This paper investigates the distributed leader-follower formation control problem for multiple differentially driven mobile robots. A distributed estimator is first introduced that requires only the state information of each follower itself and its neighbors. Then, we propose a hybrid formation control method that combines bioinspired-neural-dynamics-based backstepping with sliding mode control, and we prove its stability. The proposed control strategy resolves the impractical speed jump issue that exists in the conventional backstepping design. Additionally, considering system and measurement noise, the proposed control strategy not only removes the chattering issue of conventional sliding mode control but also provides smooth control inputs with extra robustness. After that, an adaptive sliding innovation filter is integrated with the proposed control to provide accurate state estimates that are robust to modeling uncertainties. Finally, we perform multiple simulations to demonstrate the efficiency and effectiveness of the proposed formation control strategy.
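The speed-jump issue can be illustrated with the shunting model commonly used in bioinspired backstepping, $\dot v = -A v + (B - v) f(e) - (D + v) g(e)$ with $f(e) = \max(e, 0)$ and $g(e) = \max(-e, 0)$. When the tracking error $e$ jumps, the state $v$ rises smoothly and stays within $[-D, B]$. The sketch integrates the model with forward Euler; the gains, step size, and error profile are illustrative assumptions, not values from the paper.

```python
from typing import List


def shunting_response(errors: List[float], A: float = 2.0, B: float = 1.0,
                      D: float = 1.0, dt: float = 0.01) -> List[float]:
    """Forward-Euler integration of the shunting model driven by an error signal.

    The continuous-time state is bounded in [-D, B]; the Euler update preserves
    this only for small enough dt.
    """
    v = 0.0
    out = []
    for e in errors:
        excit = max(e, 0.0)   # f(e): excitatory input
        inhib = max(-e, 0.0)  # g(e): inhibitory input
        v += dt * (-A * v + (B - v) * excit - (D + v) * inhib)
        out.append(v)
    return out


# A step in tracking error from 0 to 5 at t = 0.5 s (100 Hz samples).
errors = [0.0] * 50 + [5.0] * 150
v = shunting_response(errors)
raw_jump = errors[50] - errors[49]  # what a command proportional to e would see
smoothed_jump = max(abs(b - a) for a, b in zip(v, v[1:]))
print(raw_jump, round(smoothed_jump, 3), round(max(v), 3))
```

With these values the largest per-sample change in $v$ is about 0.05, against a raw error jump of 5, and $v$ settles near $5/7$, below the upper bound $B = 1$. In a bioinspired backstepping controller, this bounded, smoothly varying $v$ would replace the raw error term in the velocity command; how the paper combines it with sliding mode control is not shown here.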
DynamicStereo: Consistent Dynamic Depth from Stereo Videos
Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a novel transformer-based architecture to estimate disparity for stereo videos. The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions. Our architecture is designed to process stereo videos efficiently through divided attention layers. We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments, which provides complementary training and evaluation data for dynamic stereo that is closer to real applications than existing datasets. Training with this dataset further improves the quality of predictions of our proposed DynamicStereo as well as of prior methods. Finally, Dynamic Replica serves as a benchmark for consistent stereo methods.
Keyword: efficient
Physics-Informed and Data-Driven Discovery of Governing Equations for Complex Phenomena in Heterogeneous Media
Scalable Data Point Valuation in Decentralized Learning
FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework
Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots
Stars Are All You Need: A Distantly Supervised Pyramid Network for Document-Level End-to-End Sentiment Analysis
Cross-view Action Recognition via Contrastive View-invariant Representation
Connectivity Queries under Vertex Failures: Not Optimal, but Practical
Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems
Fairly Allocating Goods and (Terrible) Chores
Characterizing Compositionality of LQR from the Categorical Perspective
Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems
Bio-Inspired Simple Neural Network for Low-Light Image Restoration: A Minimalist Approach
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Prediction of Performance and Power Consumption of GPGPU Applications
Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey
Hybrid Active-Passive IRS Assisted Energy-Efficient Wireless Communication
Illicit item detection in X-ray images for security applications
Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer
Putting collective intelligence to the enforcement of the Digital Services Act
"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
Computing paths of large rank in planar frameworks deterministically
Approximating Long Cycle Above Dirac's Guarantee
natural'' guarantees, raising the algorithmic question of whether a better solution (above the guarantee) can be obtained efficiently. The above-guarantee paradigm has led to several exciting discoveries in the areas of parameterized algorithms and kernelization. We argue that this paradigm could bring fresh perspectives to well-studied problems in approximation algorithms. Our example is the longest cycle problem. One of the oldest results in extremal combinatorics is Dirac's celebrated theorem from 1952, which provides the following guarantee on the length of the longest cycle: for every 2-connected $n$-vertex graph $G$ with minimum degree $\delta(G) \leq n/2$, the length $L$ of a longest cycle is at least $2\delta(G)$. Thus, the ``essential'' part of finding the longest cycle is approximating the ``offset'' $k = L - 2\delta(G)$. The main result of this paper is an above-guarantee approximation theorem for $k$. Informally, the theorem says that approximating the offset $k$ is not harder than approximating the total length $L$ of a cycle. In other words, for any (reasonably well-behaved) function $f$, a polynomial-time algorithm constructing a cycle of length $f(L)$ in an undirected graph with a cycle of length $L$ yields a polynomial-time algorithm constructing a cycle of length $2\delta(G)+\Omega(f(k))$.
Deep Learning-Based Multiband Signal Fusion for 3-D SAR Super-Resolution
Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model
Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique
Approximate Evaluation of Quantitative Second Order Queries
A survey of modularized backstepping control design approaches to nonlinear ODE systems
A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution
Rethinking the Encoding of Satellite Image Time Series
Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging
Heterogeneous GNN-RL Based Task Offloading for UAV-aided Smart Agriculture
Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning
On the Channel Correlation in Reconfigurable Intelligent Surface-Aided System
An identification method for oscillators with response-dependent inertia
Learning-Augmented Online TSP on Rings, Trees, Flowers and (almost) Everywhere Else
Evanescent Plane Wave Approximation of Helmholtz Solutions in Spherical Domains
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Experiences with Remote Examination Formats in Light of GPT-4
Stream Efficient Learning
LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning
Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference
Multi-dimensional Signal Recovery using Low-rank Deconvolution
EFx Budget-Feasible Allocations with High Nash Welfare
DynamicStereo: Consistent Dynamic Depth from Stereo Videos
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
AG3D: Learning to Generate 3D Avatars from 2D Image Collections
Keyword: faster
Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots
A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems
Approximate Evaluation of Quantitative Second Order Queries
Removing Human Bottlenecks in Bird Classification Using Camera Trap Images and Deep Learning
Keyword: mobile
Probabilistic Formal Modelling to Uncover and Interpret Interaction Styles
Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots
Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots
A Vision Transformer Approach for Efficient Near-Field Irregular SAR Super-Resolution
Efficient CNN-based Super Resolution Algorithms for mmWave Mobile Radar Imaging
Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter
Keyword: pruning
Computer-Vision Based Real Time Waypoint Generation for Autonomous Vineyard Navigation with Quadruped Robots
Bicubic++: Slim, Slimmer, Slimmest -- Designing an Industry-Grade Super-Resolution Network
Rethinking Graph Lottery Tickets: Graph Sparsity Matters
Keyword: voxel
There is no result
Keyword: lidar
Direct LiDAR-Inertial Odometry and Mapping: Perceptive and Connective SLAM
On procedural urban digital twin generation and visualization of large scale data
Keyword: diffusion
DiffuSum: Generation Enhanced Extractive Summarization with Diffusion
Multimodal Procedural Planning via Dual Text-Image Prompting
Unpaired Downscaling of Fluid Flows with Diffusion Bridges
Multimodal Data Augmentation for Image Captioning using Diffusion Models
The Impacts of Dimensionality, Diffusion, and Directedness on Intrinsic Cross-Model Simulation in Tile-Based Self-Assembly
DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
Deep Graph Representation Learning and Optimization for Influence Maximization
Keyword: dynamic
Physics-Informed and Data-Driven Discovery of Governing Equations for Complex Phenomena in Heterogeneous Media
Visual Reasoning: from State to Transformation
Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles
Fault Tolerant Processing Unit Using Gamma Distribution Sliding Window For Autonomous Landing Guidance System
Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems
Bio-Inspired Simple Neural Network for Low-Light Image Restoration: A Minimalist Approach
The Impacts of Dimensionality, Diffusion, and Directedness on Intrinsic Cross-Model Simulation in Tile-Based Self-Assembly
Class adaptive threshold and negative class guided noisy annotation robust Facial Expression Recognition
Evolving Dictionary Representation for Few-shot Class-incremental Learning
PODTherm-GP: A Physics-based Data-Driven Approach for Effective Architecture-Level Thermal Simulation of Multi-Core CPUs
Optimal Resource Management for Hierarchical Federated Learning over HetNets with Wireless Energy Transfer
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images
Computing paths of large rank in planar frameworks deterministically
Gym-preCICE: Reinforcement Learning Environments for Active Flow Control
Improved Static Hand Gesture Classification on Deep Convolutional Neural Networks using Novel Sterile Training Technique
What makes a good pause? Investigating the turn-holding effects of fillers
Synergies Between Federated Learning and O-RAN: Towards an Elastic Virtualized Architecture for Multiple Distributed Machine Learning Services
Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning
System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning
An identification method for oscillators with response-dependent inertia
A Curriculum View of Robust Loss Functions
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
A Multi-step Dynamics Modeling Framework For Autonomous Driving In Multiple Environments
Distributed Leader Follower Formation Control of Mobile Robots based on Bioinspired Neural Dynamics and Adaptive Sliding Innovation Filter
DynamicStereo: Consistent Dynamic Depth from Stereo Videos