New submissions for Wed, 12 Apr 23

Keyword: efficient

DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies

Authors: Eloghosa Ikponmwoba, Ope Owoyele
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.04751
Pdf link: https://arxiv.org/pdf/2304.04751
Abstract We present an approach for designing swarm-based optimizers for the global optimization of expensive black-box functions. In the proposed approach, the problem of finding efficient optimizers is framed as a reinforcement learning problem, where the goal is to find optimization policies that require a few function evaluations to converge to the global optimum. The state of each agent within the swarm is defined as its current position and function value within a design space and the agents learn to take favorable actions that maximize reward, which is based on the final value of the objective function. The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies. Furthermore, the effect of changing the number of agents, as well as the generalization capabilities of the trained agents are investigated. The results show superior performance compared to the other optimizers, desired scaling when the number of agents is varied, and acceptable performance even when applied to unseen functions. On a broader scale, the results show promise for the rapid development of domain-specific optimizers.
A new perspective on building efficient and expressive 3D equivariant graph neural networks
Authors: Weitao Du, Yuanqi Du, Limei Wang, Dieqiao Feng, Guifeng Wang, Shuiwang Ji, Carla Gomes, Zhi-Ming Ma
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.04757
Pdf link: https://arxiv.org/pdf/2304.04757
Abstract Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects. Despite rapid progress in encoding 3D symmetries into Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness of these networks through a local-to-global analysis is still lacking. In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches. Our work leads to two crucial modules for designing expressive and efficient geometric GNNs, namely local substructure encoding (LSE) and frame transition encoding (FTE). To demonstrate the applicability of our theory, we propose LEFTNet, which effectively implements these modules and achieves state-of-the-art performance on both scalar-valued and vector-valued molecular property prediction tasks. We further point out the design space for future developments of equivariant graph neural networks. Our code is available at https://github.com/yuanqidu/LeftNet.
An autoencoder compression approach for accelerating large-scale inverse problems
Authors: Jonathan Wittmer, Jacob Badger, Hari Sundar, Tan Bui-Thanh
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.04781
Pdf link: https://arxiv.org/pdf/2304.04781
Abstract PDE-constrained inverse problems are some of the most challenging and computationally demanding problems in computational science today. Fine meshes that are required to accurately compute the PDE solution introduce an enormous number of parameters and require large scale computing resources such as more processors and more memory to solve such systems in a reasonable time. For inverse problems constrained by time dependent PDEs, the adjoint method that is often employed to efficiently compute gradients and higher order derivatives requires solving a time-reversed, so-called adjoint PDE that depends on the forward PDE solution at each timestep. This necessitates the storage of a high dimensional forward solution vector at every timestep. Such a procedure quickly exhausts the available memory resources. Several approaches that trade additional computation for reduced memory footprint have been proposed to mitigate the memory bottleneck, including checkpointing and compression strategies. In this work, we propose a close-to-ideal scalable compression approach using autoencoders to eliminate the need for checkpointing and substantial memory storage, thereby reducing both the time-to-solution and memory requirements. We compare our approach with checkpointing and an off-the-shelf compression approach on an earth-scale ill-posed seismic inverse problem. The results verify the expected close-to-ideal speedup for both the gradient and Hessian-vector product using the proposed autoencoder compression approach. To highlight the usefulness of the proposed approach, we combine the autoencoder compression with the data-informed active subspace (DIAS) prior to show how the DIAS method can be affordably extended to large scale problems without the need of checkpointing and large memory.
Revisiting Test Time Adaptation under Online Evaluation
Authors: Motasem Alfarra, Hani Itani, Alejandro Pardo, Shyma Alhuwaider, Merey Ramazanova, Juan C. Pérez, Zhipeng Cai, Matthias Müller, Bernard Ghanem
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04795
Pdf link: https://arxiv.org/pdf/2304.04795
Abstract This paper proposes a novel online evaluation protocol for Test Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. TTA methods leverage unlabeled data at test time to adapt to distribution shifts. Though many effective methods have been proposed, their impressive performance usually comes at the cost of significantly increased computation budgets. Current evaluation protocols overlook the effect of this extra computation cost, affecting their real-world applicability. To address this issue, we propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method's adaptation speed. We apply our proposed protocol to benchmark several TTA methods on multiple datasets and scenarios. Extensive experiments show that, when accounting for inference speed, simple and fast approaches can outperform more sophisticated but slower methods. For example, SHOT from 2020 outperforms the state-of-the-art method SAR from 2023 under our online setting. Our online evaluation protocol emphasizes the need for developing TTA methods that are efficient and applicable in realistic settings.
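The core idea of the protocol above, that a slower method sees fewer samples from a constant-speed stream, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's protocol: the function name `samples_adapted_on`, the integer `cost_per_step`, and the rule that a busy method simply skips adaptation on samples arriving while it is still working are all hypothetical choices made for illustration.

```python
def samples_adapted_on(num_samples: int, cost_per_step: int) -> int:
    """Count how many stream samples a method gets to adapt on.

    Assumptions (hypothetical, for illustration only):
    - the stream emits exactly one sample per time unit;
    - one adaptation step takes `cost_per_step` time units;
    - samples that arrive while the method is busy are not used for adaptation.
    """
    if cost_per_step < 1:
        raise ValueError("cost_per_step must be >= 1")
    adapted = 0
    busy_until = 0  # first time unit at which the method is free again
    for t in range(num_samples):  # sample t arrives at time t
        if t >= busy_until:
            adapted += 1
            busy_until = t + cost_per_step
    return adapted


# A method that keeps up with the stream versus one that is 4x slower.
fast = samples_adapted_on(1000, cost_per_step=1)
slow = samples_adapted_on(1000, cost_per_step=4)
print(fast, slow)
```

Under these assumptions the slower method adapts on only a quarter of the stream, which is the kind of penalty the abstract describes.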
Scallop: A Language for Neurosymbolic Programming
Authors: Ziyang Li, Jiani Huang, Mayur Naik
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.04812
Pdf link: https://arxiv.org/pdf/2304.04812
Abstract We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques
Authors: Lavanya Elluri, Varun Mandalapu, Piyush Vyas, Nirmalya Roy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04819
Pdf link: https://arxiv.org/pdf/2304.04819
Abstract Cybercrime is a growing threat to organizations and individuals worldwide, with criminals using increasingly sophisticated techniques to breach security systems and steal sensitive data. In recent years, machine learning, deep learning, and transfer learning techniques have emerged as promising tools for predicting cybercrime and preventing it before it occurs. This paper aims to provide a comprehensive survey of the latest advancements in cybercrime prediction using the above-mentioned techniques, highlighting the latest research related to each approach. For this purpose, we reviewed more than 150 research articles and discussed around 50 of the most recent and relevant ones. We start the review by discussing some common methods used by cybercriminals and then focus on the latest machine learning and deep learning techniques, such as recurrent and convolutional neural networks, which were effective in detecting anomalous behavior and identifying potential threats. We also discuss transfer learning, which allows models trained on one dataset to be adapted for use on another dataset, and then focus on active and reinforcement learning as part of early-stage algorithmic research in cybercrime prediction. Finally, we discuss critical innovations, research gaps, and future research opportunities in cybercrime prediction. Overall, this paper presents a holistic view of cutting-edge developments in cybercrime prediction, shedding light on the strengths and limitations of each method and equipping researchers and practitioners with essential insights, publicly available datasets, and resources necessary to develop efficient cybercrime prediction systems.
Binary Latent Diffusion
Authors: Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04820
Pdf link: https://arxiv.org/pdf/2304.04820
Abstract In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. We model the bi-directional mappings between an image and the corresponding latent binary representation by training an auto-encoder with a Bernoulli encoding distribution. On the one hand, the binary latent space provides a compact discrete image representation of which the distribution can be modeled more efficiently than pixels or continuous latent representations. On the other hand, we now represent each image patch as a binary vector instead of an index of a learned codebook as in discrete image representations with vector quantization. In this way, we obtain binary latent representations that allow for better image quality and high-resolution image representations without any multi-stage hierarchy in the latent space. In this binary latent space, images can now be generated effectively using a binary latent diffusion model tailored specifically for modeling the prior over the binary image representations. We present both conditional and unconditional image generation experiments with multiple datasets, and show that the proposed method performs comparably to state-of-the-art methods while dramatically improving the sampling efficiency to as few as 16 steps without using any test-time acceleration. The proposed framework can also be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
Exact Set-valued Estimation using Constrained Convex Generators for uncertain Linear Systems
Authors: Daniel Silvestre
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04826
Pdf link: https://arxiv.org/pdf/2304.04826
Abstract Set-valued state estimation in the presence of model uncertainties has been addressed in the literature essentially following three main approaches: i) interval arithmetic of the uncertain dynamics with the estimates; ii) factorizing the uncertainty into matrices with unity rank; and iii) performing the convex hull for the vertices of the uncertainty space. Approaches i) and ii) introduce a lot of conservatism because both disregard the relationship of the parameters with the entries of the dynamics matrix. On the other hand, approach iii) either has a large growth in the number of variables required to represent the set or is approximated, losing its main advantage in comparison with i) and ii). In this paper, with the application of autonomous vehicles in GPS-denied areas that resort to beacon signals for localization, we develop an exact (meaning no added conservatism) and optimal (smallest growth in the number of variables) closed-form definition for the convex hull of Convex Constrained Generators (CCGs). This results in a more efficient method to represent the minimum volume convex set corresponding to the state estimation. Given that reduction methods are still lacking in the literature for CCGs, we employ an approximation using ray-shooting that is comparable in terms of accuracy with methods for Constrained Zonotopes such as the ones implemented in CORA. Simulations illustrate the greater accuracy of CCGs with the proposed convex hull operation in comparison to Constrained Zonotopes.
A visão da BBChain sobre o contexto tecnológico subjacente à adoção do Real Digital
Authors: Marcio G B de Avellar, Alexandre A S Junior, André H G Lopes, André L S Carneiro, João A Pereira, Davi C B D da Cunha
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.04833
Pdf link: https://arxiv.org/pdf/2304.04833
Abstract We explore confidential computing in the context of CBDCs using Microsoft's CCF framework as an example. By developing an experiment and comparing different approaches and performance and security metrics, we seek to evaluate the effectiveness of confidential computing to improve the privacy, security, and performance of CBDCs. Preliminary results suggest that confidential computing could be a promising solution to the technological challenges faced by CBDCs. Furthermore, by implementing confidential computing in DLTs such as Hyperledger Besu and utilizing frameworks such as CCF, we increase transaction confidentiality and privacy while maintaining the scalability and interoperability required for a global digital financial system. In conclusion, confidential computing can significantly bolster CBDC development, fostering a secure, private, and efficient financial future.
Human Motion Detection Based on Dual-Graph and Weighted Nuclear Norm Regularizations
Authors: Jing Qin, Biyun Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04879
Pdf link: https://arxiv.org/pdf/2304.04879
Abstract Motion detection has been widely used in many applications, such as surveillance and robotics. Due to the presence of the static background, a motion video can be decomposed into a low-rank background and a sparse foreground. Many regularization techniques that preserve low-rankness of matrices can therefore be imposed on the background. Meanwhile, geometry-based regularizations, such as graph regularizations, can be imposed on the foreground. Recently, weighted regularization techniques, including the weighted nuclear norm regularization, have been proposed in the image processing community to promote adaptive sparsity while achieving efficient performance. In this paper, we propose a robust dual graph regularized moving object detection model based on a novel weighted nuclear norm regularization and spatiotemporal graph Laplacians. Numerical experiments on realistic human motion data sets have demonstrated the effectiveness and robustness of this approach in separating moving objects from the background, and its enormous potential in robotic applications.
DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach
Authors: Bilal Ghanem, Alona Fyshe
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.04881
Pdf link: https://arxiv.org/pdf/2304.04881
Abstract Multiple choice questions (MCQs) are an efficient and common way to assess reading comprehension (RC). Every MCQ needs a set of distractor answers that are incorrect, but plausible enough to test student knowledge. Distractor generation (DG) models have been proposed, and their performance is typically evaluated using machine translation (MT) metrics. However, MT metrics often misjudge the suitability of generated distractors. We propose DISTO: the first learned evaluation metric for generated distractors. We validate DISTO by showing its scores correlate highly with human ratings of distractor quality. At the same time, DISTO ranks the performance of state-of-the-art DG models very differently from MT-based metrics, showing that MT metrics should not be used for distractor evaluation.
EVKG: An Interlinked and Interoperable Electric Vehicle Knowledge Graph for Smart Transportation System
Authors: Yanlin Qi, Gengchen Mai, Rui Zhu, Michael Zhang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.04893
Pdf link: https://arxiv.org/pdf/2304.04893
Abstract Over the past decade, the electric vehicle industry has experienced unprecedented growth and diversification, resulting in a complex ecosystem. To effectively manage this multifaceted field, we present an EV-centric knowledge graph (EVKG) as a comprehensive, cross-domain, extensible, and open geospatial knowledge management system. The EVKG encapsulates essential EV-related knowledge, including EV adoption, electric vehicle supply equipment, and electricity transmission network, to support decision-making related to EV technology development, infrastructure planning, and policy-making by providing timely and accurate information and analysis. To enrich and contextualize the EVKG, we integrate the developed EV-relevant ontology modules from existing well-known knowledge graphs and ontologies. This integration enables interoperability with other knowledge graphs in the Linked Data Open Cloud, enhancing the EVKG's value as a knowledge hub for EV decision-making. Using six competency questions, we demonstrate how the EVKG can be used to answer various types of EV-related questions, providing critical insights into the EV ecosystem. Our EVKG provides an efficient and effective approach for managing the complex and diverse EV industry. By consolidating critical EV-related knowledge into a single, easily accessible resource, the EVKG supports decision-makers in making informed choices about EV technology development, infrastructure planning, and policy-making. As a flexible and extensible platform, the EVKG is capable of accommodating a wide range of data sources, enabling it to evolve alongside the rapidly changing EV landscape.
Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT
Authors: Mingzhe Hu, Shaoyan Pan, Yuheng Li, Xiaofeng Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04920
Pdf link: https://arxiv.org/pdf/2304.04920
Abstract In this paper, we aimed to provide a review and tutorial for researchers in the field of medical imaging using language models to improve their tasks at hand. We began by providing an overview of the history and concepts of language models, with a special focus on large language models. We then reviewed the current literature on how language models are being used to improve medical imaging, emphasizing different applications such as image captioning, report generation, report classification, finding extraction, visual question answering, interpretable diagnosis, and more for various modalities and organs. ChatGPT was specifically highlighted for researchers to explore more potential applications. We covered the potential benefits of accurate and efficient language models for medical imaging analysis, including improving clinical workflow efficiency, reducing diagnostic errors, and assisting healthcare professionals in providing timely and accurate diagnoses. Overall, our goal was to bridge the gap between language models and medical imaging and inspire new ideas and innovations in this exciting area of research. We hope that this review paper will serve as a useful resource for researchers in this field and encourage further exploration of the possibilities of language models in medical imaging.
Model sparsification can simplify machine unlearning
Authors: Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.04934
Pdf link: https://arxiv.org/pdf/2304.04934
Abstract Recent data regulations necessitate machine unlearning (MU): the removal of the effect of specific examples from the model. While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed 'prune first, then unlearn' and 'sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
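To make the "sparsification via weight pruning" step above concrete, here is a minimal sketch of the pruning half of a prune-then-unlearn pipeline. The abstract does not say which pruning criterion is used; magnitude pruning is swapped in here as an assumption, and the function name `magnitude_prune` and the flat list of weights are hypothetical. The unlearning step itself (for example, fine-tuning on the remaining data) is not shown.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: magnitude pruning stands in for the (unspecified) pruning
    method; a real model would be pruned per layer or globally over tensors.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


dense = [0.5, -0.1, 2.0, 0.05, -1.5, 0.3]
sparse = magnitude_prune(dense, sparsity=0.5)
print(sparse)
```

In a 'prune first, then unlearn' setting, an approximate unlearner would then operate on the sparse weights rather than the dense ones.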
Stress-hybrid virtual element method on quadrilateral meshes for compressible and nearly-incompressible linear elasticity
Authors: Alvin Chen, N. Sukumar
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.04941
Pdf link: https://arxiv.org/pdf/2304.04941
Abstract In this paper, we propose a robust low-order stabilization-free virtual element method on quadrilateral meshes for linear elasticity that is based on the stress-hybrid principle. We refer to this approach as the Stress-Hybrid Virtual Element Method (SH-VEM). In this method, the Hellinger-Reissner variational principle is adopted, wherein both the equilibrium equations and the strain-displacement relations are variationally enforced. We consider small-strain deformations of linear elastic solids in the compressible and near-incompressible regimes over quadrilateral (convex and nonconvex) meshes. Within an element, the displacement field is approximated as a linear combination of canonical shape functions that are virtual. The stress field, similar to the stress-hybrid finite element method of Pian and Sumihara, is represented using a linear combination of symmetric tensor polynomials. A 5-parameter expansion of the stress field is used in each element, with stress transformation equations applied on distorted quadrilaterals. In the variational statement of the strain-displacement relations, the divergence theorem is invoked to express the stress coefficients in terms of the nodal displacements. This results in a formulation with solely the nodal displacements as unknowns. Numerical results are presented for several benchmark problems from linear elasticity. We show that SH-VEM is free of volumetric and shear locking, and it converges optimally in the $L^2$ norm and energy seminorm of the displacement field, and in the $L^2$ norm of the hydrostatic stress.
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Authors: Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.04947
Pdf link: https://arxiv.org/pdf/2304.04947
Abstract We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approach with moderate to no accuracy loss and the same parameter efficiency.
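The conditional computation idea mentioned above can be sketched as follows: every token goes through a cheap branch, while only a few tokens selected by a router also go through the expensive branch. This is an illustrative toy under stated assumptions, not CoDA's architecture: the function `conditional_layer`, the top-k routing rule, the scalar "tokens", and the `heavy`/`light` callables are all hypothetical.

```python
from typing import Callable, List


def conditional_layer(
    tokens: List[float],
    scores: List[float],
    k: int,
    heavy: Callable[[float], float],
    light: Callable[[float], float],
) -> List[float]:
    """Apply `light` to every token and add `heavy` only for the top-k tokens.

    Assumption: routing keeps the k tokens with the highest router score;
    all other tokens skip the expensive branch entirely.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])
    return [
        light(x) + (heavy(x) if i in selected else 0.0)
        for i, x in enumerate(tokens)
    ]


heavy_calls = 0


def heavy_branch(x: float) -> float:
    """Stand-in for an expensive pretrained block; counts its invocations."""
    global heavy_calls
    heavy_calls += 1
    return 10.0 * x


out = conditional_layer(
    tokens=[1.0, 2.0, 3.0, 4.0],
    scores=[0.1, 0.9, 0.2, 0.8],
    k=2,
    heavy=heavy_branch,
    light=lambda x: x,
)
print(out, heavy_calls)
```

With k equal to half the sequence length, the expensive branch runs on half the tokens, which is the sort of speed and accuracy trade-off the abstract refers to.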
Data-Efficient Image Quality Assessment with Attention-Panel Decoder
Authors: Guanyi Qin, Runze Hu, Yutao Liu, Xiawu Zheng, Haotian Liu, Xiu Li, Yan Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2304.04952
Pdf link: https://arxiv.org/pdf/2304.04952
Abstract Blind Image Quality Assessment (BIQA) is a fundamental task in computer vision that remains unresolved due to complex distortion conditions and diverse image content. To confront this challenge, we propose a novel BIQA pipeline based on the Transformer architecture, which achieves an efficient quality-aware feature representation with much less data. More specifically, we consider the traditional fine-tuning in BIQA as an interpretation of the pre-trained model. In this way, we further introduce a Transformer decoder to refine the perceptual information of the CLS token from different perspectives. This enables our model to establish the quality-aware feature manifold efficiently while attaining a strong generalization capability. Meanwhile, inspired by the subjective evaluation behaviors of humans, we introduce a novel attention panel mechanism, which improves the model performance and reduces the prediction uncertainty simultaneously. The proposed BIQA method maintains a lightweight design with only one layer of the decoder, yet extensive experiments on eight standard BIQA datasets (both synthetic and authentic) demonstrate its superior performance to the state-of-the-art BIQA methods, i.e., achieving the SRCC values of 0.875 (vs. 0.859 in LIVEC) and 0.980 (vs. 0.969 in LIVE).
AROW: A V2X-based Automated Right-of-Way Algorithm for Distributed Cooperative Intersection Management
Authors: Ghayoor Shah, Yaser P. Fallah, Danyang Tian, Ehsan Moradi-Pari
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.04958
Pdf link: https://arxiv.org/pdf/2304.04958
Abstract Safe and efficient intersection management is critical for an improved driving experience. As per several studies, an increasing number of crashes and fatalities occur every year at intersections. Most crashes are a consequence of a lack of situational awareness and ambiguity over intersection crossing priority. In this regard, research in Cooperative Intersection Management (CIM) is considered highly significant since it can utilize Vehicle-to-Everything (V2X) communication among Connected and Autonomous Vehicles (CAVs). CAVs can transceive basic and/or advanced safety information, thereby improving situational awareness at intersections. Although numerous studies have been performed on CIM, most of them are reliant on the presence of a Road-Side Unit (RSU) that can act as a centralized intersection manager and assign intersection crossing priorities. In the absence of an RSU, there are some distributed CIM methods that rely only on communication among CAVs for situational awareness; however, none of them specifically focuses on Stop-Controlled Intersections (SCI) with the aim of mitigating ambiguity among CAVs. Thus, we propose an Automated Right-of-Way (AROW) algorithm based on distributed CIM that is capable of reducing ambiguity and handling any level of noncompliance by CAVs. The algorithm is validated with extensive experiments for its functionality and robustness, and it outperforms the current solutions.
PlantDet: A benchmark for Plant Detection in the Three-Rivers-Source Region
Authors: Huanhuan Li, Xuechao Zou, Yu-an Zhang, Jiangcai Zhaba, Guomei Li, Lamao Yongga
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04963
Pdf link: https://arxiv.org/pdf/2304.04963
Abstract The Three-River-Source region is a highly significant natural reserve in China that harbors a plethora of untamed botanical resources. To meet the practical requirements of botanical research and intelligent plant management, we construct a large-scale dataset for Plant detection in the Three-River-Source region (PTRS). This dataset comprises 6965 high-resolution images of 2160 × 3840 pixels, captured by diverse sensors and platforms, and featuring objects of varying shapes and sizes. Subsequently, a team of botanical image interpretation experts annotated these images with 21 commonly occurring object categories. The fully annotated PTRS images contain 122,300 instances of plant leaves, each labeled by a horizontal rectangle. The PTRS presents us with challenges such as dense occlusion, varying leaf resolutions, and high feature similarity among plants, prompting us to develop a novel object detection network named PlantDet. This network employs a window-based efficient self-attention module (ST block) to generate robust feature representation at multiple scales, improving the detection efficiency for small and densely-occluded objects. Our experimental results validate the efficacy of our proposed plant detection benchmark, with a precision of 88.1%, a mean average precision (mAP) of 77.6%, and a higher recall compared to the baseline. Additionally, our method effectively overcomes the issue of missing small objects. We intend to share our data and code with interested parties to advance further research in this field.
Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production
Authors: Francisco Eron, Muhammad Noman, Raphael Ricon de Oliveira, Deigo de Souza Marques, Rafael Serapilha Durelli, Andre Pimenta Freire, Antonio Chalfun Junior
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04966
Pdf link: https://arxiv.org/pdf/2304.04966
Abstract Coffee, which is prepared from the ground roasted seeds of harvested coffee cherries, is one of the most consumed beverages and most traded commodities globally. Manually monitoring the coffee field regularly to report on plant and soil health, and to estimate yield and harvesting time, is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm YOLOv7 was trained with 324 annotated images followed by its evaluation with 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models which led to machine-generated color classes of coffee fruit and could thus characterize the informed objects in the image. Finally, we attempted to develop an AI-based handy mobile application which would not only efficiently predict harvest time and estimate coffee yield and quality, but also inform about plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method with a mean average precision of 0.77 for multi-class mode surpassed the supervised method with a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code was named CoffeApp; it offers multiple features for analyzing fruit from images taken by phone camera in the field and can thus track fruit ripening in real time.
GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning
Authors: Cheng Xin, Soham Mukherjee, Shreyas N. Samaga, Tamal K. Dey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Algebraic Topology (math.AT)
Arxiv link: https://arxiv.org/abs/2304.04970
Pdf link: https://arxiv.org/pdf/2304.04970
Abstract $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape \textsc{Gril} for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules.
StageInteractor: Query-based Object Detector with Cross-stage Interaction
Authors: Yao Teng, Haisong Liu, Sheng Guo, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04978
Pdf link: https://arxiv.org/pdf/2304.04978
Abstract Previous object detectors make predictions based on dense grid points or numerous preset anchors. Most of these detectors are trained with one-to-many label assignment strategies. On the contrary, recent query-based object detectors depend on a sparse set of learnable queries and a series of decoder layers. The one-to-one label assignment is independently applied on each layer for the deep supervision during training. Despite the great success of query-based object detection, however, this one-to-one label assignment strategy demands the detectors to have strong fine-grained discrimination and modeling capacity. To solve the above problems, in this paper, we propose a new query-based object detector with cross-stage interaction, coined as StageInteractor. During the forward propagation, we come up with an efficient way to improve this modeling ability by reusing dynamic operators with lightweight adapters. As for the label assignment, a cross-stage label assigner is applied subsequent to the one-to-one label assignment. With this assigner, the training target class labels are gathered across stages and then reallocated to proper predictions at each decoder layer. On MS COCO benchmark, our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50 as backbone, 100 queries and 12 training epochs. With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Authors: Guangyong Wei, Zhikui Duan, Shiren Li, Guangguang Yang, Xinmei Yu, Junhua Li
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2304.04991
Pdf link: https://arxiv.org/pdf/2304.04991
Abstract In recent years, a great deal of attention has been paid to the Transformer network for speech recognition tasks due to its excellent model performance. However, the Transformer network always involves heavy computation and a large number of parameters, causing serious deployment problems in devices with limited computation resources or storage memory. In this paper, a new lightweight model called Sim-T is proposed to expand the generality of the Transformer model. With the help of the newly developed multiplexing technique, Sim-T can efficiently compress the model with negligible sacrifice in performance. To be more precise, the proposed technique includes two parts, namely module weight multiplexing and attention score multiplexing. Moreover, a novel decoder structure is proposed to facilitate the attention score multiplexing. Extensive experiments have been conducted to validate the effectiveness of Sim-T. On the Aishell-1 dataset, when the proposed Sim-T has 48% fewer parameters than the baseline Transformer, a 0.4% CER improvement can be obtained. Alternatively, a 69% parameter reduction can be achieved if Sim-T gives the same performance as the baseline Transformer. With regard to the HKUST and WSJ eval92 datasets, CER and WER are improved by 0.3% and 0.2%, respectively, when Sim-T has 40% fewer parameters than the baseline Transformer.
Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories
Authors: Fabrizio Ottati, Giovanna Turvani, Marco Vacca, Guido Masera
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2304.04995
Pdf link: https://arxiv.org/pdf/2304.04995
Abstract The speed of modern digital systems is severely limited by memory latency (the "Memory Wall" problem). Data exchange between Logic and Memory is also responsible for a large part of the system energy consumption. Logic-in-Memory (LiM) represents an attractive solution to this problem. By performing part of the computations directly inside the memory, the system speed can be improved while reducing its energy consumption. LiM solutions that offer the major boost in performance are based on the modification of the memory cell. However, what is the cost of such modifications? How do these impact the memory array performance? In this work, these questions are addressed by analysing a LiM memory array implementing an algorithm for the maximum/minimum value computation. The memory array is designed at physical level using the FreePDK 45 nm CMOS process, with three memory cell variants, and its performance is compared to SRAM and CAM memories. Results highlight that the performance of read and write operations is worsened, but in-memory operations prove very efficient: a 55.26% reduction in the energy-delay product is measured for the AND operation with respect to the SRAM read one; therefore, the LiM approach represents a very promising solution for low-density and high-performance memories.
Bayes correlated equilibria and no-regret dynamics
Authors: Kaito Fujii
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05005
Pdf link: https://arxiv.org/pdf/2304.05005
Abstract This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information. We aim at three desirable properties of equilibria. First, equilibria can be naturally realized by introducing a mediator into games. Second, an equilibrium can be computed efficiently in a distributed fashion. Third, any equilibrium in that class approximately maximizes social welfare, as measured by the price of anarchy, for a broad class of games. These three properties allow players to compute an equilibrium and realize it via a mediator, thereby settling into a stable state with approximately optimal social welfare. Our main result is the existence of an equilibrium concept that satisfies these three properties. Toward this goal, we characterize various (non-equivalent) extensions of correlated equilibria, collectively known as Bayes correlated equilibria. In particular, we focus on communication equilibria (also known as coordination mechanisms), which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players. We show that if each player minimizes a variant of regret called untruthful swap regret in repeated play of Bayesian games, the empirical distribution of these dynamics converges to a communication equilibrium. We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant. As a result, by simulating the dynamics with our algorithm, we can efficiently compute an approximate communication equilibrium. Furthermore, we extend existing lower bounds on the price of anarchy based on the smoothness arguments from Bayes Nash equilibria to equilibria obtained by the proposed dynamics.
Privacy Amplification via Shuffling: Unified, Simplified, and Tightened
Authors: Shaowei Wang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2304.05007
Pdf link: https://arxiv.org/pdf/2304.05007
Abstract In decentralized settings, the shuffle model of differential privacy has emerged as a promising alternative to the classical local model. Analyzing privacy amplification via shuffling is a critical component in both single-message and multi-message shuffle protocols. However, current methods used in these two areas are distinct and specific, making them less convenient for protocol designers and practitioners. In this work, we introduce variation-ratio reduction as a unified framework for privacy amplification analyses in the shuffle model. This framework utilizes total variation bounds of local messages and probability ratio bounds of other users' blanket messages, converting them to indistinguishable levels. Our results indicate that the framework yields tighter bounds for both single-message and multi-message encoders (e.g., with local DP, local metric DP, or general multi-message randomizers). Specifically, for a broad range of local randomizers having extremal probability design, our amplification bounds are precisely tight. We also demonstrate that variation-ratio reduction is well-suited for parallel composition in the shuffle model and results in stricter privacy accounting for common sampling-based local randomizers. Our experimental findings show that, compared to existing amplification bounds, our numerical amplification bounds can save up to $30\%$ of the budget for single-message protocols, $75\%$ of the budget for multi-message protocols, and $75\%$-$95\%$ of the budget for parallel composition. Additionally, our implementation for numerical amplification bounds has only $\tilde{O}(n)$ complexity and is highly efficient in practice, taking just $2$ minutes for $n=10^8$ users. The code for our implementation can be found at \url{https://github.com/wangsw/PrivacyAmplification}.
Habits and goals in synergy: a variational Bayesian framework for behavior
Authors: Dongqi Han, Kenji Doya, Dongsheng Li, Jun Tani
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05008
Pdf link: https://arxiv.org/pdf/2304.05008
Abstract How to behave efficiently and flexibly is a central problem for understanding biological agents and creating intelligent embodied AI. It is well known that behavior can be classified into two types: reward-maximizing habitual behavior, which is fast but inflexible; and goal-directed behavior, which is flexible but slow. Conventionally, habitual and goal-directed behaviors are considered to be handled by two distinct systems in the brain. Here, we propose to bridge the gap between the two behaviors, drawing on the principles of variational Bayesian theory. We incorporate both behaviors in one framework by introducing a Bayesian latent variable called "intention". The habitual behavior is generated by using the prior distribution of intention, which is goal-less; and the goal-directed behavior is generated by the posterior distribution of intention, which is conditioned on the goal. Building on this idea, we present a novel Bayesian framework for modeling behaviors. Our proposed framework enables skill sharing between the two kinds of behaviors, and by leveraging the idea of predictive coding, it enables an agent to seamlessly generalize from habitual to goal-directed behavior without requiring additional training. The proposed framework suggests a fresh perspective for cognitive science and embodied AI, highlighting the potential for greater integration between habitual and goal-directed behaviors.
Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection
Authors: Luoxuan Weng, Minfeng Zhu, Kam Kwai Wong, Shi Liu, Jiashun Sun, Hang Zhu, Dongming Han, Wei Chen
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.05011
Pdf link: https://arxiv.org/pdf/2304.05011
Abstract Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, 2) the poor generalization performance of existing methods caused by out-of-distribution issues, and 3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. Then, we propose a mixed-initiative workflow that combines human experts' prior knowledge with machine intelligence, along with a visual analytics prototype to facilitate efficient and trustworthy scientific text detection. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study with proficient researchers. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.
Human-machine cooperation for semantic feature listing
Authors: Kushin Mukherjee, Siddharth Suresh, Timothy T. Rogers
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05012
Pdf link: https://arxiv.org/pdf/2304.05012
Abstract Semantic feature norms, lists of features that concepts do and do not possess, have played a central role in characterizing human conceptual knowledge, but require extensive human labor. Large language models (LLMs) offer a novel avenue for the automatic generation of such feature lists, but are prone to significant error. Here, we present a new method for combining a learned model of human lexical-semantics from limited data with LLM-generated data to efficiently generate high-quality feature norms.
Scalable Real-Time Vehicle Deformation for Interactive Environments
Authors: Ben Kenwright
Subjects: Robotics (cs.RO); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.05045
Pdf link: https://arxiv.org/pdf/2304.05045
Abstract This paper proposes a real-time physically-based method for simulating vehicle deformation. Our system synthesizes vehicle deformation characteristics by considering a low-dimensional coupled vehicle body technique. We simulate the motion and crumbling behavior of vehicles smashing into rigid objects. We explain and demonstrate the combination of a reduced complexity non-linear finite element system that is scalable and computationally efficient. We use an explicit position-based integration scheme to improve simulation speeds, while remaining stable and preserving modeling accuracy. We show our approach using a variety of vehicle deformation test cases which were simulated in real-time.
Pointless Global Bundle Adjustment With Relative Motions Hessians
Authors: Ewelina Rupnik, Marc Pierrot-Deseilligny
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05118
Pdf link: https://arxiv.org/pdf/2304.05118
Abstract Bundle adjustment (BA) is the standard way to optimise camera poses and to produce sparse representations of a scene. However, as the number of camera poses and features grows, refinement through bundle adjustment becomes inefficient. Inspired by global motion averaging methods, we propose a new bundle adjustment objective which does not rely on image features' reprojection errors yet maintains precision on par with classical BA. Our method averages over relative motions while implicitly incorporating the contribution of the structure in the adjustment. To that end, we weight the objective function by local hessian matrices - a by-product of local bundle adjustments performed on relative motions (e.g., pairs or triplets) during the pose initialisation step. Such hessians are extremely rich as they encapsulate both the features' random errors and the geometric configuration between the cameras. These pieces of information propagated to the global frame help to guide the final optimisation in a more rigorous way. We argue that this approach is an upgraded version of the motion averaging approach and demonstrate its effectiveness on both photogrammetric datasets and computer vision benchmarks.
Accelerating Globally Optimal Consensus Maximization in Geometric Vision
Authors: Xinyue Zhang, Liangzu Peng, Wanting Xu, Laurent Kneip
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05156
Pdf link: https://arxiv.org/pdf/2304.05156
Abstract Branch-and-bound-based consensus maximization stands out due to its important ability to retrieve the globally optimal solution to outlier-affected geometric problems. However, while the discovery of such solutions carries high scientific value, its application in practical scenarios is often prohibited by its computational complexity, which grows exponentially as a function of the dimensionality of the problem at hand. In this work, we convey a novel, general technique that allows us to branch over an $(n-1)$-dimensional space for an $n$-dimensional problem. The remaining degree of freedom can be solved globally optimally within each bound calculation by applying the efficient interval stabbing technique. While each individual bound derivation is harder to compute owing to the additional need for solving a sorting problem, the reduced number of intervals and tighter bounds in practice lead to a significant reduction in the overall number of required iterations. Besides an abstract introduction of the approach, we present applications to three fundamental geometric computer vision problems: camera resectioning, relative camera pose estimation, and point set registration. Through our exhaustive tests, we demonstrate significant speed-up factors at times exceeding two orders of magnitude, thereby increasing the viability of globally optimal consensus maximizers in online application scenarios.
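As an illustration of the interval stabbing step mentioned in the abstract, the sketch below finds a point covered by the largest number of closed 1-D intervals by sorting their endpoints and sweeping once. It is a generic textbook version, not the authors' implementation: the function name `max_stabbing` and the reading of each interval as "the range of the remaining parameter for which one correspondence is an inlier" are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def max_stabbing(intervals: List[Tuple[float, float]]) -> Tuple[Optional[float], int]:
    """Return (point, count): a point lying in the most closed intervals.

    Hypothetical helper for illustration. Runs in O(m log m) for m intervals,
    dominated by the sort of the 2m endpoints.
    """
    events = []
    for lo, hi in intervals:
        if lo > hi:
            continue  # ignore empty intervals
        # kind 0 = interval opens, kind 1 = interval closes.
        # At equal coordinates, opens sort first so touching intervals overlap.
        events.append((lo, 0))
        events.append((hi, 1))
    events.sort()

    best_point: Optional[float] = None
    best_count = 0
    active = 0  # number of intervals covering the sweep position
    for x, kind in events:
        if kind == 0:
            active += 1
            if active > best_count:
                best_count, best_point = active, x
        else:
            active -= 1
    return best_point, best_count


if __name__ == "__main__":
    # Three of the four intervals contain the point 2.0.
    print(max_stabbing([(0.0, 2.0), (1.0, 3.0), (2.0, 5.0), (6.0, 7.0)]))
```

The sort is the "sorting problem" cost the abstract refers to per bound evaluation; the sweep itself is linear.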
From research activities to institutional piloting: the challenges of modernizing interfaces and data interoperability
Authors: Sabine Tostain (IRD)
Subjects: Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2304.05180
Pdf link: https://arxiv.org/pdf/2304.05180
Abstract Research activities are generally observed and evaluated through the prism of their production and financial elements or team composition. In addition to standardized management indicators and bibliometrics, the French National Research Institute for Sustainable Development (IRD) has been building new indicators for the last ten years, based on the annual regulatory declarations of the Institute's researchers. Different quality management tools allow the evolution of the different interfaces. This source of data, more "open" and more "useful" through its integration into the Institute's information system, is adapted to the needs of the multi-year management of research at the IRD. The aim is twofold: (1) to make progress in the evaluation of research and in the mastery of information by all actors, (2) to enlighten as many actors as possible via more efficient digital circuits and tools. The purpose of this article is to explain how the IRD is changing the entire production chain and the indicators of researchers' activities to better map scientific activities.
TinyReptile: TinyML with Federated Meta-Learning
Authors: Haoyu Ren, Darko Anicic, Thomas A. Runkler
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2304.05201
Pdf link: https://arxiv.org/pdf/2304.05201
Abstract Tiny machine learning (TinyML) is a rapidly growing field aiming to democratize machine learning (ML) for resource-constrained microcontrollers (MCUs). Given the pervasiveness of these tiny devices, it is natural to ask whether TinyML applications can benefit from aggregating their knowledge. Federated learning (FL) enables decentralized agents to jointly learn a global model without sharing sensitive local data. However, a common global model may not work for all devices due to the complexity of the actual deployment environment and the heterogeneity of the data available on each device. In addition, the deployment of TinyML hardware has significant computational and communication constraints, which traditional ML fails to address. Considering these challenges, we propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning, to collaboratively learn a solid initialization for a neural network (NN) across tiny devices that can be quickly adapted to a new device with respect to its data. We demonstrate TinyReptile on Raspberry Pi 4 and Cortex-M4 MCU with only 256-KB RAM. The evaluations on various TinyML use cases confirm a resource reduction and training time saving by at least a factor of two compared with baseline algorithms with comparable performance.
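To make the idea of "learning an initialization that adapts quickly" concrete, here is a minimal sketch of a generic Reptile-style meta-update, the family of methods the name TinyReptile alludes to. It is not the paper's algorithm: the toy model `y = w*x + b`, the function names `sgd_on_device` and `reptile_step`, and all hyperparameter defaults are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[float, float]  # (x, y) pair observed on one device


def sgd_on_device(theta: Vector, data: Sequence[Sample], lr: float, steps: int) -> Vector:
    """Adapt a copy of the shared parameters with a few local SGD steps.

    Toy model (assumption): y = w*x + b with squared-error loss.
    """
    w, b = theta
    for i in range(steps):
        x, y = data[i % len(data)]
        err = (w * x + b) - y
        w -= lr * 2 * err * x  # d/dw of (w*x + b - y)^2
        b -= lr * 2 * err      # d/db of (w*x + b - y)^2
    return [w, b]


def reptile_step(theta: Vector, data: Sequence[Sample], inner_lr: float = 0.02,
                 inner_steps: int = 8, meta_lr: float = 0.5) -> Vector:
    """One Reptile-style update: move theta toward the locally adapted weights."""
    phi = sgd_on_device(theta, data, inner_lr, inner_steps)
    return [t + meta_lr * (p - t) for t, p in zip(theta, phi)]


if __name__ == "__main__":
    # Hypothetical devices, each holding data from a slightly different line.
    devices = [[(1.0, 2.1), (2.0, 4.0)], [(1.0, 1.8), (3.0, 6.2)]]
    theta = [0.0, 0.0]
    for round_idx in range(50):
        theta = reptile_step(theta, devices[round_idx % len(devices)])
    print(theta)
```

Visiting one device per round, as in the loop above, keeps only a single model copy in flight at a time; whether that matches the paper's communication scheme is not stated in the abstract.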
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Authors: Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.05216
Pdf link: https://arxiv.org/pdf/2304.05216
Abstract Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning. We then propose efficient alternatives to fine-tune the large pre-trained code model based on the above findings. Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans the entire model. (2) The process of fine-tuning preserves most of the code properties. Specifically, the basic code properties captured by lower and intermediate layers are still preserved during fine-tuning. Furthermore, we find that only the representations of the top two layers change substantially during fine-tuning for various downstream tasks. (3) Based on the above findings, we propose Telly to efficiently fine-tune pre-trained code models via layer freezing. The extensive experimental results on five diverse downstream tasks demonstrate that the number of training parameters and the corresponding time cost are greatly reduced, while performance is similar or better. Replication package including source code, datasets, and online Appendix is available at: \url{https://github.com/DeepSoftwareAnalytics/Telly}.
Inhomogeneous graph trend filtering via a l2,0 cardinality penalty
Authors: Xiaoqing Huang, Andersen Ang, Jie Zhang, Yijie Wang
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.05223
Pdf link: https://arxiv.org/pdf/2304.05223
Abstract We study estimation of piecewise smooth signals over a graph. We propose an $\ell_{2,0}$-norm penalized Graph Trend Filtering (GTF) model to estimate piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness across the nodes. We prove that the proposed GTF model is simultaneously a k-means clustering on the signal over the nodes and a minimum graph cut on the edges of the graph, where the clustering and the cut share the same assignment matrix. We propose two methods to solve the proposed GTF model: a spectral decomposition method and a method based on simulated annealing. In experiments on synthetic and real-world datasets, we show that the proposed GTF model performs better than existing approaches on the tasks of denoising, support recovery and semi-supervised classification. We also show that the proposed GTF model can be solved more efficiently than existing models for datasets with a large edge set.
OpenAL: Evaluation and Interpretation of Active Learning Strategies
Authors: W. Jonas, A. Abraham, L. Dreyfus-Schmidt
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.05246
Pdf link: https://arxiv.org/pdf/2304.05246
Abstract Despite the vast body of literature on Active Learning (AL), there is no comprehensive and open benchmark allowing for efficient and simple comparison of proposed samplers. Additionally, the variability in experimental settings across the literature makes it difficult to choose a sampling strategy, which is critical due to the one-off nature of AL experiments. To address those limitations, we introduce OpenAL, a flexible and open-source framework to easily run and compare sampling AL strategies on a collection of realistic tasks. The proposed benchmark is augmented with interpretability metrics and statistical analysis methods to understand when and why some samplers outperform others. Last but not least, practitioners can easily extend the benchmark by submitting their own AL samplers.
Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning
Authors: Gwen Legate, Lucas Caccia, Eugene Belilovsky
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05260
Pdf link: https://arxiv.org/pdf/2304.05260
Abstract In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation. A key challenge in this setting is data heterogeneity across clients, resulting in differing local objectives, which can lead clients to overly minimize their own local objective and diverge from the global solution. We demonstrate that individual client models experience catastrophic forgetting with respect to data from other clients and propose an efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax logits prior to computing the loss. This approach shields classes outside a client's label set from abrupt representation change, and we empirically demonstrate that it can alleviate client forgetting and provide consistent improvements to standard federated learning algorithms. Our method is particularly beneficial under the most challenging federated learning settings, where data heterogeneity is high and client participation in each round is low.
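One way to picture a re-weighted softmax cross-entropy is to scale each class's exponentiated logit by a per-client weight before normalizing. The sketch below does exactly that; the abstract does not specify how the weights are chosen, so the function name `reweighted_softmax_ce` and the example weights (small values for classes the client never sees) are assumptions, not the paper's definition.

```python
import math
from typing import Sequence


def reweighted_softmax_ce(logits: Sequence[float], label: int,
                          class_weights: Sequence[float]) -> float:
    """Cross-entropy with per-class weights inside the softmax.

    loss = -log( w_y * exp(z_y) / sum_c w_c * exp(z_c) )

    With all weights equal to 1 this is the standard softmax cross-entropy.
    A weight near 0 for class c removes most of the gradient pushing z_c down.
    """
    if class_weights[label] <= 0:
        raise ValueError("the target class needs a positive weight")
    m = max(logits)  # subtract the max logit for numerical stability
    terms = [w * math.exp(z - m) for z, w in zip(logits, class_weights)]
    return -math.log(terms[label] / sum(terms))


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    # Hypothetical client that only holds classes 0 and 1.
    client_weights = [0.5, 0.5, 1e-3]
    print(reweighted_softmax_ce(logits, 0, [1.0, 1.0, 1.0]))  # standard CE
    print(reweighted_softmax_ce(logits, 0, client_weights))   # re-weighted CE
```

Down-weighting absent classes means a local update has little incentive to suppress their logits, which is the intuition behind shielding classes outside a client's label set.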
Controllable Textual Inversion for Personalized Text-to-Image Generation
Authors: Jianan Yang, Haobo Wang, Ruixuan Xiao, Sai Wu, Gang Chen, Junbo Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05265
Pdf link: https://arxiv.org/pdf/2304.05265
Abstract The recent large-scale generative modeling has attained unprecedented performance especially in producing high-fidelity images driven by text prompts. Text inversion (TI), alongside the text-to-image model backbones, is proposed as an effective technique in personalizing the generation when the prompts contain user-defined, unseen or long-tail concept tokens. Despite that, we find and show that the deployment of TI remains full of "dark-magics" -- to name a few, the harsh requirement of additional datasets, arduous human efforts in the loop and lack of robustness. In this work, we propose a much-enhanced version of TI, dubbed Controllable Textual Inversion (COTI), in resolving all the aforementioned problems and in turn delivering a robust, data-efficient and easy-to-use framework. The core to COTI is a theoretically-guided loss objective instantiated with a comprehensive and novel weighted scoring mechanism, encapsulated by an active-learning paradigm. The extensive results show that COTI significantly outperforms the prior TI-related approaches with a 26.05 decrease in the FID score and a 23.00% boost in the R-precision.
Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning
Authors: Wenjin Wang, Yunqing Hu, Qianglong Chen, Yin Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05288
Pdf link: https://arxiv.org/pdf/2304.05288
Abstract Parameter regularization or allocation methods are effective in overcoming catastrophic forgetting in lifelong learning. However, they solve all tasks in a sequence uniformly and ignore the differences in the learning difficulty of different tasks. So parameter regularization methods face significant forgetting when learning a new task very different from learned tasks, and parameter allocation methods face unnecessary parameter overhead when learning simple tasks. In this paper, we propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task from parameter allocation and regularization based on its learning difficulty. A task is easy for a model that has learned tasks related to it and vice versa. We propose a divergence estimation method based on the Nearest-Prototype distance to measure the task relatedness using only features of the new task. Moreover, we propose a time-efficient relatedness-aware sampling-based architecture search strategy to reduce the parameter overhead for allocation. Experimental results on multiple benchmarks demonstrate that, compared with SOTAs, our method is scalable and significantly reduces the model's redundancy while improving the model's performance. Further qualitative analysis indicates that PAR obtains reasonable task-relatedness.
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Authors: Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.05302
Pdf link: https://arxiv.org/pdf/2304.05302
Abstract Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and these models. InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). PPO, however, is sensitive to hyperparameters and requires a minimum of four models in its standard implementation, which makes it hard to train. In contrast, we propose a novel learning paradigm called RRHF, which scores responses generated by different sampling policies and learns to align them with human preferences through ranking loss. RRHF can efficiently align language model output probabilities with human preferences as robust as fine-tuning and it only needs 1 to 2 models during tuning. In addition, RRHF can be considered an extension of SFT and reward models while being simpler than PPO in terms of coding, model counts, and hyperparameters. The entire alignment process can be accomplished within a single RRHF training session. We evaluate RRHF using LLaMA and Alpaca on Helpful and Harmless data, demonstrating performance comparable to PPO.
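The ranking idea in this abstract can be sketched with a pairwise hinge loss. The abstract only says "ranking loss", so the exact form below is an assumption: whenever a response with a lower preference score is assigned a higher model score than a better-rated response, the difference is penalized. The names `model_scores` and `preference_scores` are hypothetical.

```python
def pairwise_ranking_loss(model_scores, preference_scores):
    """Sum of max(0, s_i - s_j) over all pairs where response i is rated below j.

    model_scores: the model's score for each candidate response
                  (for instance, a length-normalized log-probability).
    preference_scores: human or reward-model ratings of the same responses.
    """
    loss = 0.0
    n = len(model_scores)
    for i in range(n):
        for j in range(n):
            if preference_scores[i] < preference_scores[j]:
                loss += max(0.0, model_scores[i] - model_scores[j])
    return loss

# The model prefers response 0 over the better-rated response 1, so it is penalized.
misordered = pairwise_ranking_loss([-1.0, -2.0, -0.5], [0.2, 0.9, 0.5])
# Model scores already follow the preference order, so the loss vanishes.
ordered = pairwise_ranking_loss([-2.0, -0.5, -1.0], [0.2, 0.9, 0.5])
```

A loss of this shape needs only the model being tuned and a source of preference scores, which is consistent with the abstract's claim of requiring 1 to 2 models.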
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05316
Pdf link: https://arxiv.org/pdf/2304.05316
Abstract The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on SemanticKITTI dataset and for LiDAR semantic segmentation on nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
SciKGTeX -- A LaTeX Package to Semantically Annotate Contributions in Scientific Publications
Authors: Christof Bless, Ildar Baimuratov, Oliver Karras
Subjects: Digital Libraries (cs.DL); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2304.05327
Pdf link: https://arxiv.org/pdf/2304.05327
Abstract Scientific knowledge graphs have been proposed as a solution to structure the content of research publications in a machine-actionable way and enable more efficient, computer-assisted workflows for many research activities. Crowd-sourcing approaches are used frequently to build and maintain such scientific knowledge graphs. To contribute to scientific knowledge graphs, researchers need simple and easy-to-use solutions to generate new knowledge graph elements and establish the practice of semantic representations in scientific communication. In this paper, we present a workflow for authors of scientific documents to specify their contributions with a LaTeX package, called SciKGTeX, and upload them to a scientific knowledge graph. The SciKGTeX package allows authors of scientific publications to mark the main contributions of their work directly in LaTeX source files. The package embeds marked contributions as metadata into the generated PDF document, from where they can be extracted automatically and imported into a scientific knowledge graph, such as the ORKG. This workflow is simpler and faster than current approaches, which make use of external web interfaces for data entry. Our user evaluation shows that SciKGTeX is easy to use, with a score of 79 out of 100 on the System Usability Scale, as participants of the study needed only 7 minutes on average to annotate the main contributions on a sample abstract of a published paper. Further testing shows that the embedded contributions can be successfully uploaded to ORKG within ten seconds. SciKGTeX simplifies the process of manual semantic annotation of research contributions in scientific articles. Our workflow demonstrates how a scientific knowledge graph can automatically ingest research contributions from document metadata.
TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain
Authors: Alexey I. Boyko, Anastasiia Kornilova, Rahim Tariverdizadeh, Mirfarid Musavian, Larisa Markeeva, Ivan Oseledets, Gonzalo Ferrer
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05342
Pdf link: https://arxiv.org/pdf/2304.05342
Abstract This paper addresses the following research question: "can one compress a detailed 3D representation and use it directly for point cloud registration?". Map compression of the scene can be achieved by the tensor train (TT) decomposition of the signed distance function (SDF) representation. It regulates the amount of data reduced by the so-called TT-ranks. Using this representation we have proposed an algorithm, the TT-SDF2PC, that is capable of directly registering a PC to the compressed SDF by making use of efficient calculations of its derivatives in the TT domain, saving computations and memory. We compare TT-SDF2PC with SOTA local and global registration methods in a synthetic dataset and a real dataset and show on-par performance while requiring significantly fewer resources.
Leo: Lagrange Elementary Optimization
Authors: Aso M. Aladdin, Tarik A. Rashid
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2304.05346
Pdf link: https://arxiv.org/pdf/2304.05346
Abstract Global optimization problems are frequently solved using the practical and efficient method of evolutionary sophistication. But as the original problem becomes more complex, so does its efficacy and expandability. Thus, the purpose of this research is to introduce the Lagrange Elementary Optimization (Leo) as an evolutionary method, which is self-adaptive inspired by the remarkable accuracy of vaccinations using the albumin quotient of human blood. They develop intelligent agents using their fitness function value after gene crossing. These genes direct the search agents during both exploration and exploitation. The main objective of the Leo algorithm is presented in this paper along with the inspiration and motivation for the concept. To demonstrate its precision, the proposed algorithm is validated against a variety of test functions, including 19 traditional benchmark functions and the CECC06 2019 test functions. The results of Leo for 19 classic benchmark test functions are evaluated against DA, PSO, and GA separately, and then two other recent algorithms such as FDO and LPB are also included in the evaluation. In addition, Leo is tested on ten CECC06 2019 functions against the DA, WOA, SSA, FDO, LPB, and FOX algorithms. The cumulative outcomes demonstrate Leo's capacity to increase the starting population and move toward the global optimum. Different standard measurements are used to verify and prove the stability of Leo in both the exploration and exploitation phases. Moreover, statistical analysis supports the findings of the proposed research. Finally, novel applications in the real world are introduced to demonstrate the practicality of Leo.
Astroformer: More Data Might Not be All You Need for Classification
Authors: Rishit Dagli
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05350
Pdf link: https://arxiv.org/pdf/2304.05350
Abstract Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images, achieving 94.86% top-1 accuracy and beating the current state-of-the-art for this task by 4.62%. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. Our code and models will be released at a later date before the conference.
Asymmetric Polynomial Loss For Multi-Label Classification
Authors: Yusheng Huang, Jiexing Qi, Xinbing Wang, Zhouhan Lin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05361
Pdf link: https://arxiv.org/pdf/2304.05361
Abstract Various tasks are reformulated as multi-label classification problems, in which the binary cross-entropy (BCE) loss is frequently utilized for optimizing well-designed models. However, the vanilla BCE loss cannot be tailored for diverse tasks, resulting in a suboptimal performance for different models. Besides, the imbalance between redundant negative samples and rare positive samples could degrade the model performance. In this paper, we propose an effective Asymmetric Polynomial Loss (APL) to mitigate the above issues. Specifically, we first perform Taylor expansion on BCE loss. Then we ameliorate the coefficients of polynomial functions. We further employ the asymmetric focusing mechanism to decouple the gradient contribution from the negative and positive samples. Moreover, we validate that the polynomial coefficients can recalibrate the asymmetric focusing hyperparameters. Experiments on relation extraction, text classification, and image classification show that our APL loss can consistently improve performance without extra training burden.
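The Taylor-expansion step mentioned in this abstract rests on a standard identity: for a predicted probability p in (0, 1], -log(p) equals the sum over m >= 1 of (1 - p)^m / m. The sketch below evaluates a truncated version of that polynomial with adjustable coefficients. The function name and the choice of 50 terms are illustrative assumptions; how the paper actually tunes the coefficients is not stated in the abstract.

```python
import math

def polynomial_log_loss(p, coefficients):
    """Evaluate sum_m coefficients[m-1] * (1 - p)**m for the positive-class term.

    With coefficients[m-1] = 1/m this is the truncated Taylor series of -log(p).
    Changing the leading coefficients yields a re-shaped polynomial loss.
    """
    return sum(c * (1.0 - p) ** m for m, c in enumerate(coefficients, start=1))

taylor_coefficients = [1.0 / m for m in range(1, 51)]
approx = polynomial_log_loss(0.8, taylor_coefficients)   # close to -log(0.8)
exact = -math.log(0.8)

# Doubling the first coefficient adds (1 - p) to the loss and leaves the rest untouched.
boosted = polynomial_log_loss(0.8, [2.0] + taylor_coefficients[1:])
```

The same expansion applies to the negative-class term -log(1 - p) with p in place of 1 - p.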
Design and Analysis of Index codes for 3-Group NOMA in Vehicular Adhoc Networks
Authors: Sai Pavan Deekshitula, B. Sundar Rajan
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2304.05379
Pdf link: https://arxiv.org/pdf/2304.05379
Abstract Index coding (IC) is a source coding technique employed to improve spectral utilisation, where the source node aims to satisfy users' demands by making minimum transmissions. Non-orthogonal multiple access (NOMA) is integral to the radio access technique used in 5G networks. The index-coded NOMA (IC-NOMA) transmission scheme in Vehicular Adhoc Networks (VANETs) involves applying NOMA principles on index-coded data to avoid network congestion and to improve spectral efficiency compared to conventional IC systems. In this work, a spectrally efficient transmission scheme called 3-Group IC-NOMA is proposed, and an innovative index code design that fits with NOMA decoding principles to obtain improved spectral efficiency is developed. Through exhaustive analytical studies, we demonstrate that the proposed transmission scheme always supports higher rates than the conventional IC systems and requires less power to achieve an information rate at least as good as conventional IC systems.
Keyword: faster

Similarity search in the blink of an eye with compressed indices
Authors: Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted Willke
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2304.04759
Pdf link: https://arxiv.org/pdf/2304.04759
Abstract Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem of relevance for a wide range of applications. In this work, we present new techniques for creating faster and smaller indices to run these searches. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that simultaneously reduces memory footprint and improves search performance, with minimal impact on search accuracy. LVQ is designed to work optimally in conjunction with graph-based indices, reducing their effective bandwidth while enabling random-access-friendly fast similarity computations. Our experimental results show that LVQ, combined with key optimizations for graph-based indices in modern datacenter systems, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime by 5.8x with 1.4x less memory.
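To make the compression idea concrete, here is a minimal sketch of per-vector uniform scalar quantization, where each vector is encoded against its own minimum and maximum. This is an assumption about what "locally-adaptive" could mean; the abstract does not describe LVQ's actual encoding, and the function names and the 8-bit default are hypothetical.

```python
def quantize_vector(vec, bits=8):
    """Encode one vector as integer codes using bounds taken from that vector alone."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step

def dequantize_vector(codes, lo, step):
    """Reconstruct approximate values from the codes and the per-vector constants."""
    return [lo + c * step for c in codes]

vec = [0.1, -0.7, 0.33, 0.9]
codes, lo, step = quantize_vector(vec, bits=8)
restored = dequantize_vector(codes, lo, step)
max_error = max(abs(a - b) for a, b in zip(vec, restored))
```

Each component then costs `bits` bits plus two constants per vector, and the reconstruction error is bounded by half a quantization step.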
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Authors: Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04797
Pdf link: https://arxiv.org/pdf/2304.04797
Abstract Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, those works remain largely impractical in public cloud environments since workloads are not known in advance and may only run for a brief period, thus prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully-online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.
An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs
Authors: Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam
Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2304.04876
Pdf link: https://arxiv.org/pdf/2304.04876
Abstract The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the solution of sparse linear systems arising from the discretization of a wide range of partial differential equations. In this paper, we present FROSch (Fast and Robust Schwarz), a domain decomposition solver package which implements GDSW-type preconditioners for both CPU and GPU clusters. To improve the solver performance on GPUs, we use a novel decomposition to run multiple MPI processes on each GPU, reducing both the solver's computational and storage costs and potentially improving the convergence rate. This allowed us to obtain competitive or faster performance using GPUs compared to using CPUs alone. We demonstrate the performance of FROSch on the Summit supercomputer with NVIDIA V100 GPUs, where we used NVIDIA Multi-Process Service (MPS) to implement our decomposition strategy. The solver has a wide variety of algorithmic and implementation choices, which poses both opportunities and challenges for its GPU implementation. We conduct a thorough experimental study with different solver options including the exact or inexact solution of the local overlapping subdomain problems on a GPU. We also discuss the effect of using the iterative variant of the incomplete LU factorization and sparse-triangular solve as the approximate local solver, and using lower precision for computing the whole FROSch preconditioner. Overall, the solve time was reduced by a factor of about 2x using GPUs, while the GPU acceleration of the numerical setup time depends on the solver options and the local matrix sizes.
Multi-Sample Consensus Driven Unsupervised Normal Estimation for 3D Point Clouds
Authors: Jie Zhang, Minghui Nie, Junjie Cao, Jian Liu, Ligang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04884
Pdf link: https://arxiv.org/pdf/2304.04884
Abstract Deep normal estimators have made great strides on synthetic benchmarks. Unfortunately, their performance dramatically drops on the real scan data since they are supervised only on synthetic datasets. The point-wise annotation of ground truth normals is vulnerable to inefficiency and inaccuracies, which totally makes it impossible to build perfect real datasets for supervised deep learning. To overcome the challenge, we propose a multi-sample consensus paradigm for unsupervised normal estimation. The paradigm consists of multi-candidate sampling, candidate rejection, and mode determination. The latter two are driven by neighbor point consensus and candidate consensus respectively. Two primary implementations of the paradigm, MSUNE and MSUNE-Net, are proposed. MSUNE minimizes a candidate consensus loss in mode determination. As a robust optimization method, it outperforms the cutting-edge supervised deep learning methods on real data at the cost of longer runtime for sampling enough candidate normals for each query point. MSUNE-Net, the first unsupervised deep normal estimator as far as we know, significantly promotes the multi-sample consensus further. It transfers the three online stages of MSUNE to offline training. Thereby its inference time is 100 times faster. Besides that, more accurate inference is achieved, since the candidates of query points from similar patches can form a sufficiently large candidate set implicitly in MSUNE-Net. Comprehensive experiments demonstrate that the two proposed unsupervised methods are noticeably superior to some supervised deep normal estimators on the most common synthetic dataset. More importantly, they show better generalization ability and outperform all the SOTA conventional and deep methods on three real datasets: NYUV2, KITTI, and a dataset from PCV [1].
Neural Network Predicts Ion Concentration Profiles under Nanoconfinement
Authors: Zhonglin Cao, Yuyang Wang, Cooper Lorsung, Amir Barati Farimani
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Arxiv link: https://arxiv.org/abs/2304.04896
Pdf link: https://arxiv.org/pdf/2304.04896
Abstract Modeling the ion concentration profile in nanochannels plays an important role in understanding the electrical double layer and electroosmotic flow. Due to the non-negligible surface interaction and the effect of discrete solvent molecules, molecular dynamics (MD) simulation is often used as an essential tool to study the behavior of ions under nanoconfinement. Despite the accuracy of MD simulation in modeling nanoconfinement systems, it is computationally expensive. In this work, we propose a neural network to predict ion concentration profiles in nanochannels with different configurations, including channel widths, ion molarity, and ion types. By modeling the ion concentration profile as a probability distribution, our neural network can serve as a much faster surrogate model for MD simulation with high accuracy. We further demonstrate the superior prediction accuracy of the neural network over XGBoost. Lastly, we demonstrate that the neural network is flexible in predicting ion concentration profiles with different bin sizes. Overall, our deep learning model is a fast, flexible, and accurate surrogate model to predict ion concentration profiles in nanoconfinement.
Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production
Authors: Francisco Eron, Muhammad Noman, Raphael Ricon de Oliveira, Deigo de Souza Marques, Rafael Serapilha Durelli, Andre Pimenta Freire, Antonio Chalfun Junior
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04966
Pdf link: https://arxiv.org/pdf/2304.04966
Abstract Coffee, which is prepared from the ground, roasted seeds of harvested coffee cherries, is one of the most consumed beverages and most traded commodities globally. Manually monitoring the coffee field regularly to report on plant and soil health and to estimate yield and harvesting time is labor-intensive, time-consuming, and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm, YOLOv7, was trained with 324 annotated images and then evaluated on 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models, which led to machine-generated color classes of coffee fruit and could thus characterize the informed objects in the image. Finally, we attempted to develop a handy AI-based mobile application that would not only efficiently predict harvest time and estimate coffee yield and quality, but also inform about plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method, with a mean average precision of 0.77 for multi-class mode, surpassed the supervised method, with a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code was named CoffeApp; it possesses multiple features for analyzing fruit from images taken by phone camera in the field and can thus track fruit ripening in real time.
Fast IMU-based Dual Estimation of Human Motion and Kinematic Parameters via Progressive In-Network Computing
Authors: Xiaobing Dai, Huanzhuo Wu, Siyi Wang, Junjie Jiao, Giang T. Nguyen, Frank H. P. Fitzek, Sandra Hirche
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05131
Pdf link: https://arxiv.org/pdf/2304.05131
Abstract Many applications involve humans in the loop, where continuous and accurate human motion monitoring provides valuable information for safe and intuitive human-machine interaction. Portable devices such as inertial measurement units (IMUs) are suitable for monitoring human motions, but in practice only limited computational power is often available locally. The human motion in task space coordinates requires not only the human joint motion but also the nonlinear coordinate transformation depending on parameters such as human limb length. In most applications, measuring these kinematic parameters for each individual requires undesirably high effort. Therefore, it is desirable to estimate both the human motion and the kinematic parameters from IMUs. In this work, we propose a novel computational framework for dual estimation in real time exploiting in-network computational resources. We adopt the concept of field Kalman filtering, where the dual estimation problem is decomposed into a fast state estimation process and a computationally expensive parameter estimation process. In order to further accelerate the convergence, the parameter estimation is progressively computed on multiple networked computational nodes. The superiority of our proposed method is demonstrated by a simulation of a human arm, where the estimation accuracy is shown to converge faster than with conventional approaches.
PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices
Authors: Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05152
Pdf link: https://arxiv.org/pdf/2304.05152
Abstract The success of transformers in computer vision has led to several attempts to adapt them for mobile devices, but their performance remains unsatisfactory in some real-world applications. To address this issue, we propose PP-MobileSeg, a semantic segmentation model that achieves state-of-the-art performance on mobile devices. PP-MobileSeg comprises three novel parts: the StrideFormer backbone, the Aggregated Attention Module (AAM), and the Valid Interpolate Module (VIM). The four-stage StrideFormer backbone is built with MV3 blocks and strided SEA attention, and it is able to extract rich semantic and detailed features with minimal parameter overhead. The AAM first filters the detailed features through semantic feature ensemble voting and then combines them with semantic features to enhance the semantic information. Furthermore, we proposed VIM to upsample the downsampled feature to the resolution of the input image. It significantly reduces model latency by only interpolating classes present in the final prediction, which is the most significant contributor to overall model latency. Extensive experiments show that PP-MobileSeg achieves a superior tradeoff between accuracy, model size, and latency compared to other methods. On the ADE20K dataset, PP-MobileSeg achieves 1.57% higher accuracy in mIoU than SeaFormer-Base with 32.9% fewer parameters and 42.3% faster acceleration on Qualcomm Snapdragon 855. Source codes are available at https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.8.
flap: A Deterministic Parser with Fused Lexing
Authors: Neel Krishnaswami, Ningning Xie, Jeremy Yallop
Subjects: Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2304.05276
Pdf link: https://arxiv.org/pdf/2304.05276
Abstract Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost. We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training
Authors: William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Swati Gupta, Tushar Krishna
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05301
Pdf link: https://arxiv.org/pdf/2304.05301
Abstract Collective communications are an indispensable part of distributed training. Running a topology-aware collective algorithm is crucial for optimizing communication performance by minimizing congestion. Today such algorithms only exist for a small set of simple topologies, limiting the topologies employed in training clusters and the handling of irregular topologies due to network failures. In this paper, we propose TACOS, an automated topology-aware collective synthesizer for arbitrary input network topologies. TACOS synthesized a 3.73x faster All-Reduce algorithm over baselines, and synthesized collective algorithms for a 512-NPU system in just 6.1 minutes.
SciKGTeX -- A LaTeX Package to Semantically Annotate Contributions in Scientific Publications
Authors: Christof Bless, Ildar Baimuratov, Oliver Karras
Subjects: Digital Libraries (cs.DL); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2304.05327
Pdf link: https://arxiv.org/pdf/2304.05327
Abstract Scientific knowledge graphs have been proposed as a solution to structure the content of research publications in a machine-actionable way and enable more efficient, computer-assisted workflows for many research activities. Crowd-sourcing approaches are used frequently to build and maintain such scientific knowledge graphs. To contribute to scientific knowledge graphs, researchers need simple and easy-to-use solutions to generate new knowledge graph elements and establish the practice of semantic representations in scientific communication. In this paper, we present a workflow for authors of scientific documents to specify their contributions with a LaTeX package, called SciKGTeX, and upload them to a scientific knowledge graph. The SciKGTeX package allows authors of scientific publications to mark the main contributions of their work directly in LaTeX source files. The package embeds marked contributions as metadata into the generated PDF document, from where they can be extracted automatically and imported into a scientific knowledge graph, such as the ORKG. This workflow is simpler and faster than current approaches, which make use of external web interfaces for data entry. Our user evaluation shows that SciKGTeX is easy to use, with a score of 79 out of 100 on the System Usability Scale, as participants of the study needed only 7 minutes on average to annotate the main contributions on a sample abstract of a published paper. Further testing shows that the embedded contributions can be successfully uploaded to ORKG within ten seconds. SciKGTeX simplifies the process of manual semantic annotation of research contributions in scientific articles. Our workflow demonstrates how a scientific knowledge graph can automatically ingest research contributions from document metadata.
Keyword: mobile

Robust Body Exposure (RoBE): A Graph-based Dynamics Modeling Approach to Manipulating Blankets over People
Authors: Kavya Puthuveetil, Sasha Wald, Atharva Pusalkar, Pratyusha Karnati, Zackory Erickson
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.04822
Pdf link: https://arxiv.org/pdf/2304.04822
Abstract Robotic caregivers could potentially improve the quality of life of many who require physical assistance. However, in order to assist individuals who are lying in bed, robots must be capable of dealing with a significant obstacle: the blanket or sheet that will almost always cover the person's body. We propose a method for targeted bedding manipulation over people lying supine in bed where we first learn a model of the cloth's dynamics. Then, we optimize over this model to uncover a given target limb using information about human body shape and pose that only needs to be provided at run-time. We show how this approach enables greater robustness to variation relative to geometric and reinforcement learning baselines via a number of generalization evaluations in simulation and in the real world. We further evaluate our approach in a human study with 12 participants where we demonstrate that a mobile manipulator can adapt to real variation in human body shape, size, pose, and blanket configuration to uncover target body parts without exposing the rest of the body. Source code and supplementary materials are available online.
MHfit: Mobile Health Data for Predicting Athletics Fitness Using Machine Learning
Authors: Jonayet Miah, Muntasir mamun, Md Minhazur Rahman, Md Ishtyaq Mahmyd, Asm Mohaimenul Islam, Sabbir Ahmed
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.04839
Pdf link: https://arxiv.org/pdf/2304.04839
Abstract Mobile phones and other electronic devices have aided in collecting data without the need for data entry. This paper focuses specifically on mobile health data. Mobile health uses mobile devices to gather clinical health data and track patient vitals in real time. Our study aims to help small and large sports teams decide whether an athlete is a good fit for a particular game by comparing several machine learning algorithms that predict human behavior and health from data collected by mobile devices and sensors placed on patients. In this study, we obtained the dataset from a similar study on mhealth. The dataset contains vital-sign recordings of ten volunteers from different backgrounds, who performed several physical activities with a sensor placed on their bodies. Our study used 5 machine learning algorithms (XGBoost, Naive Bayes, Decision Tree, Random Forest, and Logistic Regression) to analyze and predict human health behavior. XGBoost outperformed the other machine learning algorithms, achieving 95.2% accuracy, 99.5% sensitivity, 99.5% specificity, and a 99.66% F1 score. Our research indicates a promising future for mhealth in predicting human behavior; further research and exploration are needed before it can be made available for commercial use, specifically in the sports industry.
Bounding Box Annotation with Visible Status
Authors: Takuya Kiyokawa, Naoki Shirakura, Hiroki Katayama, Keita Tomochika, Jun Takamatsu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2304.04901
Pdf link: https://arxiv.org/pdf/2304.04901
Abstract Training deep-learning-based vision systems requires the manual annotation of a significant amount of data to optimize several parameters of the deep convolutional neural networks. Such manual annotation is highly time-consuming and labor-intensive. To reduce this burden, a previous study presented a fully automated annotation approach that does not require any manual intervention. The proposed method associates a visual marker with an object and captures it in the same image. However, because the previous method relied on moving the object within the capturing range using a fixed-point camera, the collected image dataset was limited in terms of capturing viewpoints. To overcome this limitation, this study presents a mobile application-based free-viewpoint image-capturing method. With the proposed application, users can collect multi-view image datasets automatically that are annotated with bounding boxes by moving the camera. However, capturing images through human involvement is laborious and monotonous. Therefore, we propose gamified application features to track the progress of the collection status. Our experiments demonstrated that using the gamified mobile application for bounding box annotation, with visible collection progress status, can motivate users to collect multi-view object image datasets with less mental workload and time pressure in an enjoyable manner, leading to increased engagement.
Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production
Authors: Francisco Eron, Muhammad Noman, Raphael Ricon de Oliveira, Deigo de Souza Marques, Rafael Serapilha Durelli, Andre Pimenta Freire, Antonio Chalfun Junior
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04966
Pdf link: https://arxiv.org/pdf/2304.04966
Abstract Coffee, which is prepared from the ground roasted seeds of harvested coffee cherries, is one of the most consumed beverages and traded commodities globally. Manually monitoring the coffee field regularly to report on plant and soil health and to estimate yield and harvesting time is labor-intensive, time-consuming and error-prone. Some recent studies have developed sensors for estimating coffee yield at the time of harvest; however, a more inclusive and applicable technology to remotely monitor multiple parameters of the field and estimate coffee yield and quality even at the pre-harvest stage was missing. Following a precision agriculture approach, we employed the machine learning algorithm YOLO for image processing of coffee plants. In this study, the latest version of the state-of-the-art algorithm, YOLOv7, was trained with 324 annotated images and then evaluated with 82 unannotated images as test data. Next, as an innovative approach for annotating the training data, we trained K-means models, which led to machine-generated color classes of coffee fruit and could thus characterize the informed objects in the image. Finally, we attempted to develop a handy AI-based mobile application that would not only efficiently predict harvest time and estimate coffee yield and quality, but also inform about plant health. As a result, the developed model efficiently analyzed the test data with a mean average precision of 0.89. Strikingly, our innovative semi-supervised method, with a mean average precision of 0.77 for multi-class mode, surpassed the supervised method, with a mean average precision of only 0.60, leading to faster and more accurate annotation. The mobile application we designed based on the developed code was named CoffeApp; it offers multiple features for analyzing fruit from images taken by a phone camera in the field and can thus track fruit ripening in real time.
Measuring Teachers' Visual Expertise Using the Gaze Relational Index Based on Real-world Eye-tracking Data and Varying Velocity Thresholds
Authors: Christian Kosel (1), Angelina Mooseder (2), Tina Seidl (1), Juergen Pfeffer (2) ((1) Friedl Schoeller Endowed Chair for Educational Psychology, School of Social Science and Technology, Technical University Munich, Germany, (2) Computational Social Science and Big Data, School of Social Science and Technology, Technical University Munich, Germany)
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2304.05143
Pdf link: https://arxiv.org/pdf/2304.05143
Abstract This article adds to the understanding of teachers' visual expertise by measuring visual information processing in real-world classrooms (mobile eye-tracking) with the newly introduced Gaze Relational Index (GRI) metric, which is defined as the ratio of mean fixation duration to mean fixation number. In addition, the aim was to provide a methodological contribution to future research by showing to what extent the selected configurations (i.e. varying velocity thresholds and fixation merging) of the eye movement event detection algorithm for detecting fixations and saccades influence the results of eye-tracking studies. Our study leads to two important take-home messages: First, by following a novice-expert paradigm (2 novice teachers & 2 experienced teachers), we found that the GRI can serve as a sensitive measure of visual expertise. As hypothesized, experienced teachers' GRI was lower, suggesting that their more fine-graded organization of domain-specific knowledge allows them to fixate more rapidly and frequently in the classroom. Second, we found that the selected velocity threshold parameter alters and, in the worst case, biases the results of an eye-tracking study. Therefore, in the interest of further generalizability of the results within visual expertise research, we emphasize that it is highly important to report configurations that are relevant for the identification of eye movements.
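The GRI definition quoted in the abstract above (mean fixation duration divided by mean fixation number) can be illustrated with a minimal sketch. The abstract does not say over which unit the fixation number is averaged, so the per-segment grouping, the function name `gaze_relational_index`, and the sample durations below are assumptions made purely for illustration and are not taken from the paper.

```python
from statistics import mean


def gaze_relational_index(segments):
    """Ratio of mean fixation duration to mean fixation number.

    `segments` is a list of recording segments; each segment is a list of
    fixation durations in milliseconds. Averaging the fixation count per
    segment is an assumption of this sketch, not the paper's procedure.
    """
    durations = [d for segment in segments for d in segment]
    if not durations:
        raise ValueError("at least one fixation is required")
    mean_duration = mean(durations)                         # mean fixation duration
    mean_number = mean(len(segment) for segment in segments)  # mean fixation count
    return mean_duration / mean_number


# Hypothetical data: many short fixations versus few long fixations.
many_short = [[100] * 10, [100] * 10]
few_long = [[400] * 2, [400] * 2]
print(gaze_relational_index(many_short) < gaze_relational_index(few_long))
```

Under this reading, a viewer who fixates more often and more briefly gets a lower index, which matches the direction the abstract reports for experienced teachers.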
PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices
Authors: Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05152
Pdf link: https://arxiv.org/pdf/2304.05152
Abstract The success of transformers in computer vision has led to several attempts to adapt them for mobile devices, but their performance remains unsatisfactory in some real-world applications. To address this issue, we propose PP-MobileSeg, a semantic segmentation model that achieves state-of-the-art performance on mobile devices. PP-MobileSeg comprises three novel parts: the StrideFormer backbone, the Aggregated Attention Module (AAM), and the Valid Interpolate Module (VIM). The four-stage StrideFormer backbone is built with MV3 blocks and strided SEA attention, and it is able to extract rich semantic and detailed features with minimal parameter overhead. The AAM first filters the detailed features through semantic feature ensemble voting and then combines them with semantic features to enhance the semantic information. Furthermore, we proposed VIM to upsample the downsampled feature to the resolution of the input image. It significantly reduces model latency by only interpolating classes present in the final prediction, which is the most significant contributor to overall model latency. Extensive experiments show that PP-MobileSeg achieves a superior tradeoff between accuracy, model size, and latency compared to other methods. On the ADE20K dataset, PP-MobileSeg achieves 1.57% higher accuracy in mIoU than SeaFormer-Base with 32.9% fewer parameters and 42.3% faster acceleration on Qualcomm Snapdragon 855. Source codes are available at https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.8.
A user co-designed digital INtervention for Child LangUage DisordEr: The INCLUDE Project Protocol
Authors: Rafiah Patel
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2304.05224
Pdf link: https://arxiv.org/pdf/2304.05224
Abstract Around ten percent of all children could have a disorder where language does not develop as expected. This often affects vocabulary skills, i.e., finding the words to express wants, needs and ideas, which can influence behaviours linked to wellbeing and daily functioning, such as concentration, independence, social interactions and managing emotions. Without specialist support, needs can increase in severity and continue to adulthood. The types of support (known as interventions) with the strongest evidence for improving vocabulary, along with some signs of improved behaviour and wellbeing, are ones that use word-webs. These are diagrams consisting of lines that connect sound and meaning information about a word to strengthen the child's word knowledge and use. The diagrams resemble what is commonly known as mind-maps and are widely used by Speech and Language Therapists in partnership with schools to help children with language difficulties. In addition, interventions delivered through mobile devices have led in some cases to increased vocabulary gains with positive influence on wellbeing and academic attainment. With advances in technology and the availability of user-friendly mobile devices to capture, combine and replay multimedia content, new opportunities for designing bespoke vocabulary instruction have emerged that are without timing and location constraints. This brings the potential to engage and motivate users and foster independence through functional strategies that support each child's unique language needs. To achieve this, children with language disorder, their parents/carers, support professionals and software development team members must work jointly to create an intervention that is fit for purpose. This is the first research planned to explore the collaborative development and acceptability of a digitally enhanced vocabulary intervention for child language disorder.
Keyword: pruning

FINEX: A Fast Index for Exact & Flexible Density-Based Clustering (Extended Version with Proofs)*
Authors: Konstantin Emil Thiel, Daniel Kocher, Nikolaus Augsten, Thomas Hütter, Willi Mann, Daniel Ulrich Schmitt
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2304.04817
Pdf link: https://arxiv.org/pdf/2304.04817
Abstract Density-based clustering aims to find groups of similar objects (i.e., clusters) in a given dataset. Applications include, e.g., process mining and anomaly detection. It comes with two user parameters ($\epsilon$, MinPts) that determine the clustering result, but are typically unknown in advance. Thus, users need to interactively test various settings until satisfying clusterings are found. However, existing solutions suffer from the following limitations: (a) Ineffective pruning of expensive neighborhood computations. (b) Approximate clustering, where objects are falsely labeled noise. (c) Restricted parameter tuning that is limited to $\epsilon$ whereas MinPts is constant, which reduces the explorable clusterings. (d) Inflexibility in terms of applicable data types and distance functions. We propose FINEX, a linear-space index that overcomes these limitations. Our index provides exact clusterings and can be queried with either of the two parameters. FINEX avoids neighborhood computations where possible and reduces the complexities of the remaining computations by leveraging fundamental properties of density-based clusters. Hence, our solution is efficient and flexible regarding data types and distance functions. Moreover, FINEX respects the original and straightforward notion of density-based clustering. In our experiments on 12 large real-world datasets from various domains, FINEX frequently outperforms state-of-the-art techniques for exact clustering by orders of magnitude.
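To make the two user parameters in the abstract above concrete, here is a minimal sketch of the standard density-based (DBSCAN-style) notion of a core point: an object with at least MinPts objects, itself included, within distance epsilon. This is a brute-force illustration of why neighborhood computations are expensive; it is not the FINEX index. The function names and the sample points are hypothetical.

```python
from math import dist


def eps_neighborhood(points, i, eps, distance=dist):
    """Indices of all points within `eps` of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if distance(points[i], q) <= eps]


def core_points(points, eps, min_pts, distance=dist):
    """Indices of points whose eps-neighborhood holds at least `min_pts` points.

    Brute force: every call scans all points, so the total cost is quadratic.
    `distance` can be any user-supplied distance function.
    """
    return [
        i
        for i in range(len(points))
        if len(eps_neighborhood(points, i, eps, distance)) >= min_pts
    ]


# Hypothetical data: three nearby points and one far-away outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(core_points(points, eps=1.0, min_pts=3))
print(core_points(points, eps=1.0, min_pts=2))
```

Changing either parameter changes which points count as core, which is why interactive exploration re-runs these neighborhood scans unless an index avoids them.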
Design, Integration, and Field Evaluation of a Robotic Blossom Thinning System for Tree Fruit Crops
Authors: Uddhav Bhattarai, Qin Zhang, Manoj Karkee
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.04919
Pdf link: https://arxiv.org/pdf/2304.04919
Abstract The US apple industry relies heavily on a semi-skilled manual labor force for essential field operations such as training, pruning, blossom and green fruit thinning, and harvesting. Blossom thinning is one of the crucial crop load management practices to achieve desired crop load, fruit quality, and return bloom. While several techniques, such as chemical and mechanical thinning, are available for large-scale blossom thinning, such approaches often yield unpredictable thinning results and may damage the canopy, spurs, and leaf tissue. Hence, growers still depend on labor-intensive and expensive manual blossom thinning for desired thinning outcomes. This research presents a robotic solution for blossom thinning in apple orchards using a computer vision system with artificial intelligence, a six-degrees-of-freedom robotic manipulator, and an electrically actuated miniature end-effector. The integrated robotic system was evaluated in a commercial apple orchard and showed promising results for targeted and selective blossom thinning. Two thinning approaches, center and boundary thinning, were investigated to evaluate the system's ability to remove varying proportions of flowers from apple flower clusters. During boundary thinning, the end-effector was actuated around the cluster boundary, while center thinning involved end-effector actuation only at the cluster centroid for a fixed duration of 2 seconds. The boundary thinning approach thinned 67.2% of flowers from the targeted clusters with a cycle time of 9.0 seconds per cluster, whereas the center thinning approach thinned 59.4% of flowers with a cycle time of 7.2 seconds per cluster. When commercially adopted, the proposed system could help address problems faced by apple growers with current hand, chemical, and mechanical blossom thinning approaches.
Model sparsification can simplify machine unlearning
Authors: Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, Sijia Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.04934
Pdf link: https://arxiv.org/pdf/2304.04934
Abstract Recent data regulations necessitate machine unlearning (MU): The removal of the effect of specific examples from the model. While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed 'prune first, then unlearn' and 'sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
Keyword: voxel

Weakly Supervised Intracranial Hemorrhage Segmentation using Head-Wise Gradient-Infused Self-Attention Maps from a Swin Transformer in Categorical Learning
Authors: Amirhossein Rasoulian, Soorena Salari, Yiming Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04902
Pdf link: https://arxiv.org/pdf/2304.04902
Abstract Intracranial hemorrhage (ICH) is a life-threatening medical emergency caused by various factors. Timely and precise diagnosis of ICH is crucial for administering effective treatment and improving patient survival rates. While deep learning techniques have emerged as the leading approach for medical image analysis and processing, the most commonly employed supervised learning often requires large, high-quality annotated datasets that can be costly to obtain, particularly for pixel/voxel-wise image segmentation. To address this challenge and facilitate ICH treatment decisions, we propose a novel weakly supervised ICH segmentation method that leverages a hierarchical combination of head-wise gradient-infused self-attention maps obtained from a Swin transformer. The transformer is trained using an ICH classification task with categorical labels. To build and validate the proposed technique, we used two publicly available clinical CT datasets, namely RSNA 2019 Brain CT hemorrhage and PhysioNet. Additionally, we conducted an exploratory study comparing two learning strategies - binary classification and full ICH subtyping - to assess their impact on self-attention and our weakly supervised ICH segmentation framework. The proposed algorithm was compared against the popular U-Net with full supervision, as well as a similar weakly supervised approach using Grad-CAM for ICH segmentation. With a mean Dice score of 0.47, our technique achieved ICH segmentation performance similar to that of the U-Net and outperformed the Grad-CAM based approach, demonstrating the excellent potential of the proposed framework in challenging medical image segmentation tasks.
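For readers unfamiliar with the Dice score reported in the abstract above, the following is a minimal sketch of the standard Dice coefficient for two binary masks, 2|A ∩ B| / (|A| + |B|). The flattened-list mask format, the function name `dice_score`, and the convention of returning 1.0 when both masks are empty are assumptions of this sketch; the paper's evaluation code may differ.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences.

    Nonzero entries count as foreground. Returning 1.0 when both masks are
    empty is one common convention, chosen here as an assumption.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 4-voxel masks: one voxel overlaps out of two predicted and two true.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxels.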
EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous Visual Hulls
Authors: Ziyun Wang, Kenneth Chaney, Kostas Daniilidis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05296
Pdf link: https://arxiv.org/pdf/2304.05296
Abstract 3D reconstruction from multiple views is a successful computer vision field with multiple deployments in applications. State of the art is based on traditional RGB frames that enable optimization of photo-consistency across views. In this paper, we study the problem of 3D reconstruction from event-cameras, motivated by the advantages of event-based cameras in terms of low power and latency as well as by the biological evidence that eyes in nature capture the same data and still perceive 3D shape well. The foundation of our hypothesis that 3D reconstruction is feasible using events lies in the information contained in the occluding contours and in the continuous scene acquisition with events. We propose Apparent Contour Events (ACE), a novel event-based representation that defines the geometry of the apparent contour of an object. We represent ACE by a spatially and temporally continuous implicit function defined in the event x-y-t space. Furthermore, we design a novel continuous Voxel Carving algorithm enabled by the high temporal resolution of the Apparent Contour Events. To evaluate the performance of the method, we collect MOEC-3D, a 3D event dataset of a set of common real-world objects. We demonstrate the ability of EvAC3D to reconstruct high-fidelity mesh surfaces from real event sequences while allowing the refinement of the 3D reconstruction for each individual event.
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05316
Pdf link: https://arxiv.org/pdf/2304.05316
Abstract The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on SemanticKITTI dataset and for LiDAR semantic segmentation on nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
Keyword: lidar

Simultaneous localization and mapping by using Low-Cost Ultrasonic Sensor for Underwater crawler
Authors: Trish Velan Dcruz, Cicero Estibeiro, Anil Shankar, Mangal Das
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05155
Pdf link: https://arxiv.org/pdf/2304.05155
Abstract Autonomous robots can help people explore parts of the ocean that would be hard or impossible to get to otherwise. The increase in the availability of low-cost components has made it possible to innovate, design, and implement new and innovative ideas for underwater robotics. Cost-effective and open solutions that are available today can be used to replace expensive robot systems. The prototype of an autonomous robot system that functions in brackish waterways in settings such as fish hatcheries is presented in this research. The system has low-cost ultrasonic sensors that use a SLAM algorithm to map and move through the environment. When compared to previous studies that used Lidar sensors, this system's configuration was chosen to keep costs down. A comparison is shown between ultrasonic and lidar sensors, showing their respective pros and cons.
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05316
Pdf link: https://arxiv.org/pdf/2304.05316
Abstract The vision-based perception for autonomous driving has undergone a transformation from the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on SemanticKITTI dataset and for LiDAR semantic segmentation on nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
Keyword: diffusion

$\textit{e-Uber}$: A Crowdsourcing Platform for Electric Vehicle-based Ride- and Energy-sharing
Authors: Ashutosh Timilsina, Simone Silvestri
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04753
Pdf link: https://arxiv.org/pdf/2304.04753
Abstract The sharing-economy-based business model has recently seen success in the transportation and accommodation sectors with companies like Uber and Airbnb. There is growing interest in applying this model to energy systems, with modalities like peer-to-peer (P2P) Energy Trading, Electric Vehicles (EV)-based Vehicle-to-Grid (V2G), Vehicle-to-Home (V2H), Vehicle-to-Vehicle (V2V), and Battery Swapping Technology (BST). In this work, we exploit the increasing diffusion of EVs to realize a crowdsourcing platform called e-Uber that jointly enables ride-sharing and energy-sharing through V2G and BST. e-Uber exploits spatial crowdsourcing, reinforcement learning, and reverse auction theory. Specifically, the platform uses reinforcement learning to understand the drivers' preferences towards different ride-sharing and energy-sharing tasks. Based on these preferences, a personalized list is recommended to each driver through CMAB-based Algorithm for task Recommendation System (CARS). Drivers bid on their preferred tasks in their list in a reverse auction fashion. Then e-Uber solves the task assignment optimization problem that minimizes cost and guarantees V2G energy requirement. We prove that this problem is NP-hard and introduce a bipartite matching-inspired heuristic, Bipartite Matching-based Winner selection (BMW), that has polynomial time complexity. Results from experiments using real data from NYC taxi trips and energy consumption show that e-Uber performs close to the optimum and finds better solutions compared to a state-of-the-art approach.
DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion
Authors: ZiHan Cao, ShiQi Cao, Xiao Wu, JunMing Hou, Ran Ran, Liang-Jian Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2304.04774
Pdf link: https://arxiv.org/pdf/2304.04774
Abstract The denoising diffusion model, as a generative model, has received a lot of attention in the field of image generation recently, thanks to its powerful generation capability. However, diffusion models have not yet been sufficiently studied in the field of image fusion. In this article, we introduce the diffusion model to the image fusion field, treating the image fusion task as image-to-image translation and designing two different conditional injection modulation modules (i.e., style transfer modulation and wavelet modulation) to inject coarse-grained style information and fine-grained high-frequency and low-frequency information into the diffusion UNet, thereby generating fused images. In addition, we discuss residual learning and the selection of training objectives of the diffusion model in the image fusion task. Extensive experimental results based on quantitative and qualitative assessments compared with benchmarks demonstrate state-of-the-art results and good generalization performance in image fusion tasks. Finally, it is hoped that our method can inspire other works and provide insight into how to better apply the diffusion model to image fusion tasks. Code shall be released for better reproducibility.
Binary Latent Diffusion
Authors: Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04820
Pdf link: https://arxiv.org/pdf/2304.04820
Abstract In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. We model the bi-directional mappings between an image and the corresponding latent binary representation by training an auto-encoder with a Bernoulli encoding distribution. On the one hand, the binary latent space provides a compact discrete image representation of which the distribution can be modeled more efficiently than pixels or continuous latent representations. On the other hand, we now represent each image patch as a binary vector instead of an index of a learned codebook as in discrete image representations with vector quantization. In this way, we obtain binary latent representations that allow for better image quality and high-resolution image representations without any multi-stage hierarchy in the latent space. In this binary latent space, images can now be generated effectively using a binary latent diffusion model tailored specifically for modeling the prior over the binary image representations. We present both conditional and unconditional image generation experiments with multiple datasets, and show that the proposed method performs comparably to state-of-the-art methods while dramatically improving the sampling efficiency to as few as 16 steps without using any test-time acceleration. The proposed framework can also be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
iPINNs: Incremental learning for Physics-informed neural networks
Authors: Aleksandr Dekhovich, Marcel H.F. Sluiter, David M.J. Tax, Miguel A. Bessa
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.04854
Pdf link: https://arxiv.org/pdf/2304.04854
Abstract Physics-informed neural networks (PINNs) have recently become a powerful tool for solving partial differential equations (PDEs). However, finding a set of neural network parameters that lead to fulfilling a PDE can be challenging and non-unique due to the complexity of the loss landscape that needs to be traversed. Although a variety of multi-task learning and transfer learning approaches have been proposed to overcome these issues, there is no incremental training procedure for PINNs that can effectively mitigate such training challenges. We propose incremental PINNs (iPINNs) that can learn multiple tasks (equations) sequentially without additional parameters for new tasks and improve performance for every equation in the sequence. Our approach learns multiple PDEs starting from the simplest one by creating its own subnetwork for each PDE and allowing each subnetwork to overlap with previously learned subnetworks. We demonstrate that previous subnetworks are a good initialization for a new equation if PDEs share similarities. We also show that iPINNs achieve lower prediction error than regular PINNs for two different scenarios: (1) learning a family of equations (e.g., 1-D convection PDE); and (2) learning PDEs resulting from a combination of processes (e.g., 1-D reaction-diffusion PDE). The ability to learn all problems with a single network together with learning more complex PDEs with better generalization than regular PINNs will open new avenues in this field.
Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond
Authors: Mohammadreza Armandpour, Huangjie Zheng, Ali Sadeghian, Amir Sadeghian, Mingyuan Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.04968
Pdf link: https://arxiv.org/pdf/2304.04968
Abstract Although text-to-image diffusion models have made significant strides in generating images from text, they are sometimes more inclined to generate images like the data on which the model was trained rather than the provided text. This limitation has hindered their usage in both 2D and 3D applications. To address this problem, we explored the use of negative prompts but found that the current implementation fails to produce desired results, particularly when there is an overlap between the main and negative prompts. To overcome this issue, we propose Perp-Neg, a new algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative prompts algorithm. Perp-Neg does not require any training or fine-tuning of the model. Moreover, we experimentally demonstrate that Perp-Neg provides greater flexibility in generating images by enabling users to edit out unwanted concepts from the initially generated images in 2D cases. Furthermore, to extend the application of Perp-Neg to 3D, we conducted a thorough exploration of how Perp-Neg can be used in 2D to condition the diffusion model to generate desired views, rather than being biased toward the canonical views. Finally, we applied our 2D intuition to integrate Perp-Neg with the state-of-the-art text-to-3D (DreamFusion) method, effectively addressing its Janus (multi-head) problem.
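As a rough illustration of the geometric idea, the sketch below removes from a negative-prompt score the component that is parallel to the main-prompt score, so that only the perpendicular part is subtracted. The abstract does not state the update rule, so the function names, the weight `w_neg`, and the toy 2D vectors are all assumptions made for illustration and not the authors' implementation.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def perpendicular_component(neg, main):
    """Part of `neg` that is orthogonal to `main` (vector rejection)."""
    denom = dot(main, main)
    if denom == 0.0:
        # No direction to project onto: keep the negative score unchanged.
        return list(neg)
    scale = dot(neg, main) / denom
    return [n - scale * m for n, m in zip(neg, main)]


def combine_scores(main, neg, w_neg=1.0):
    """Hypothetical combination: main score minus the weighted
    perpendicular part of the negative score."""
    perp = perpendicular_component(neg, main)
    return [m - w_neg * p for m, p in zip(main, perp)]


# Toy example: the negative score overlaps with the main score.
main_score = [1.0, 0.0]
neg_score = [1.0, 1.0]
combined = combine_scores(main_score, neg_score)
```

In this toy case the overlapping part of the negative score is discarded, so the component of the result along the main direction stays untouched, which is one way to avoid a negative prompt cancelling the main prompt when the two overlap.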
Diffusion Recommender Model
Authors: Wenjie Wang, Yiyan Xu, Fuli Feng, Xinyu Lin, Xiangnan He, Tat-Seng Chua
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2304.04971
Pdf link: https://arxiv.org/pdf/2304.04971
Abstract Generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) are widely utilized to model the generative process of user interactions. However, these generative models suffer from intrinsic limitations such as the instability of GANs and the restricted representation ability of VAEs. Such limitations hinder the accurate modeling of the complex user interaction generation procedure, such as noisy interactions caused by various interference factors. In light of the impressive advantages of Diffusion Models (DMs) over traditional generative models in image synthesis, we propose a novel Diffusion Recommender Model (named DiffRec) to learn the generative process in a denoising manner. To retain personalized information in user interactions, DiffRec reduces the added noises and avoids corrupting users' interactions into pure noises like in image synthesis. In addition, we extend traditional DMs to tackle the unique challenges in practical recommender systems: high resource costs for large-scale item prediction and temporal shifts of user preference. To this end, we propose two extensions of DiffRec: L-DiffRec clusters items for dimension compression and conducts the diffusion processes in the latent space; and T-DiffRec reweights user interactions based on the interaction timestamps to encode temporal information. We conduct extensive experiments on three datasets under multiple settings (e.g. clean training, noisy training, and temporal training). The empirical results and in-depth analysis validate the superiority of DiffRec with two extensions over competitive baselines.
SPIRiT-Diffusion: Self-Consistency Driven Diffusion Model for Accelerated MRI
Authors: Zhuo-Xu Cui, Chentao Cao, Jing Cheng, Sen Jia, Hairong Zheng, Dong Liang, Yanjie Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05060
Pdf link: https://arxiv.org/pdf/2304.05060
Abstract Diffusion models are a leading method for image generation and have been successfully applied in magnetic resonance imaging (MRI) reconstruction. Current diffusion-based reconstruction methods rely on coil sensitivity maps (CSMs) to reconstruct multi-coil data. However, it is difficult to accurately estimate CSMs in practice, resulting in degradation of the reconstruction quality. To address this issue, we propose a self-consistency-driven diffusion model inspired by the iterative self-consistent parallel imaging (SPIRiT), namely SPIRiT-Diffusion. Specifically, the iterative solver of the self-consistent term in SPIRiT is utilized to design a novel stochastic differential equation (SDE) for the diffusion process. Then $\textit{k}$-space data can be interpolated directly during the reverse diffusion process, instead of using CSMs to separate and combine individual coil images. This method indicates that the optimization model can be used to design the SDE in diffusion models, driving the diffusion process to conform strongly with the physics involved in the optimization model, dubbed model-driven diffusion. The proposed SPIRiT-Diffusion method was evaluated on a 3D joint Intracranial and Carotid Vessel Wall imaging dataset. The results demonstrate that it outperforms the CSM-based reconstruction methods, and achieves high reconstruction quality at a high acceleration rate of 10.
Gradient flows of interacting Laguerre cells as discrete porous media flows
Authors: Andrea Natale (RAPSODI)
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.05069
Pdf link: https://arxiv.org/pdf/2304.05069
Abstract We study a class of discrete models in which a collection of particles evolves in time following the gradient flow of an energy depending on the cell areas of an associated Laguerre (i.e. a weighted Voronoi) tessellation. We consider the limit of such systems as the number of cells becomes large and, using a modulated energy argument, we prove convergence towards smooth solutions of nonlinear diffusion PDEs of porous medium type.
Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing
Authors: Wei Lu, Nic A. Lee, Markus J. Buehler
Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
Arxiv link: https://arxiv.org/abs/2304.05137
Pdf link: https://arxiv.org/pdf/2304.05137
Abstract Spider webs are incredible biological structures, comprising thin but strong silk filament and arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here we provide a detailed analysis of the heterogeneous graph structures of spider webs, and use deep learning as a way to model and then synthesize artificial, bio-inspired 3D web structures. The generative AI models are conditioned based on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs, to yield a dataset that is used to train three conditional generative models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation, 2) a discrete diffusion model with full neighbor representation, and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bio-inspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose an algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking and extending natural design principles towards integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.
Multi-scale Fusion Fault Diagnosis Method Based on Two-Dimensionaliztion Sequence in Complex Scenarios
Authors: Weiyang Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05198
Pdf link: https://arxiv.org/pdf/2304.05198
Abstract Rolling bearings are critical components in rotating machinery, and their faults can cause severe damage. Early detection of abnormalities is crucial to prevent catastrophic accidents. Traditional and intelligent methods have been used to analyze time series data, but in real-life scenarios, sensor data is often noisy and cannot be accurately characterized in the time domain, leading to mode collapse in trained models. Two-dimensionalization methods such as the Gramian angular field (GAF) method or interval sampling have been proposed, but they lack mathematical derivation and interpretability. This paper proposes an improved GAF combined with grayscale images for convolution scenarios. The main contributions include illustrating the feasibility of the approach in complex scenarios, widening the data set, and introducing an improved convolutional neural network method with a multi-scale feature fusion diffusion model and deep learning compression techniques for deployment in industrial scenarios.
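For readers unfamiliar with GAF, the sketch below shows the standard summation form of the Gramian angular field, which turns a 1D series into a 2D image: the series is rescaled to [-1, 1], each value is mapped to an angle via arccos, and entry (i, j) is the cosine of the sum of the two angles. This is the textbook construction only, not the improved variant the paper proposes; the function name and the sample series are assumptions.

```python
import math


def gramian_angular_field(series):
    """Standard Gramian angular summation field of a 1D series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2.0 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point overshoot before arccos.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) encodes the pairwise relation cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in angles] for a in angles]


gaf = gramian_angular_field([0.0, 1.0, 2.0])
```

The resulting matrix is symmetric and has one row and column per time step, so it can be fed to a convolutional network as a single-channel image.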
Diffusion Models for Constrained Domains
Authors: Nic Fishman, Leo Klarner, Valentin De Bortoli, Emile Mathieu, Michael Hutchinson
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.05364
Pdf link: https://arxiv.org/pdf/2304.05364
Abstract Denoising diffusion models are a recent class of generative models which achieve state-of-the-art results in many domains such as unconditional image generation and text-to-speech tasks. They consist of a noising process destroying the data and a backward stage defined as the time-reversal of the noising diffusion. Building on their success, diffusion models have recently been extended to the Riemannian manifold setting. Yet, these Riemannian diffusion models require geodesics to be defined for all times. While this setting encompasses many important applications, it does not include manifolds defined via a set of inequality constraints, which are ubiquitous in many scientific domains such as robotics and protein design. In this work, we introduce two methods to bridge this gap. First, we design a noising process based on the logarithmic barrier metric induced by the inequality constraints. Second, we introduce a noising process based on the reflected Brownian motion. As existing diffusion model techniques cannot be applied in this setting, we derive new tools to define such models in our framework. We empirically demonstrate the applicability of our methods to a number of synthetic and real-world tasks, including the constrained conformational modelling of protein backbones and robotic arms.
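To make the second noising process more concrete, here is a minimal sketch of a reflected Brownian motion on the unit interval [0, 1], the simplest domain defined by inequality constraints. Each Gaussian step is folded back into the interval whenever it crosses a boundary. The choice of domain, the Euler-style discretization, and all names are assumptions for illustration; the paper treats general constrained manifolds.

```python
import random


def reflect_into_unit_interval(x):
    """Fold a real number back into [0, 1] by mirror reflection
    at both boundaries (the reflection pattern has period 2)."""
    x = x % 2.0
    return 2.0 - x if x > 1.0 else x


def reflected_brownian_path(x0, n_steps, dt, rng):
    """Simulate a discretized reflected Brownian motion on [0, 1]."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Unconstrained Gaussian increment with variance dt, then reflect.
        x = reflect_into_unit_interval(x + rng.gauss(0.0, dt ** 0.5))
        path.append(x)
    return path


path = reflected_brownian_path(0.5, 1000, 0.01, random.Random(0))
```

Every sample of the path stays inside the constraint set by construction, which is the property that makes such a process usable as a noising process on a bounded domain.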
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
Authors: Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05390
Pdf link: https://arxiv.org/pdf/2304.05390
Abstract In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks heavily rely on subjective human evaluation, limiting their ability to holistically assess the model's capabilities. Furthermore, there is a significant gap between efforts in developing new T2I architectures and those in evaluation. To address this, we introduce HRS-Bench, a concrete evaluation benchmark for T2I models that is Holistic, Reliable, and Scalable. Unlike existing benchmarks that focus on limited aspects, HRS-Bench measures 13 skills that can be categorized into five major categories: accuracy, robustness, generalization, fairness, and bias. In addition, HRS-Bench covers 50 scenarios, including fashion, animals, transportation, food, and clothes. We evaluate nine recent large-scale T2I models using metrics that cover a wide range of skills. A human evaluation aligned with 95% of our evaluations on average was conducted to probe the effectiveness of HRS-Bench. Our experiments demonstrate that existing models often struggle to generate images with the desired count of objects, visual text, or grounded emotions. We hope that our benchmark helps ease future text-to-image generation research. The code and data are available at https://eslambakr.github.io/hrsbench.github.io
Keyword: dynamic

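Comparison of radio spectrum occupancy detection methods using federated learning with and without a central node (original title: Porównanie metod detekcji zajętości widma radiowego z wykorzystaniem uczenia federacyjnego z oraz bez węzła centralnego)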
Authors: Łukasz Kułacz
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.04754
Pdf link: https://arxiv.org/pdf/2304.04754
Abstract Dynamic spectrum access systems typically require information about the spectrum occupancy and thus the presence of other users in order to make a spectrum allocation decision for a new device. Simple methods of spectrum occupancy detection are often far from reliable, hence spectrum occupancy detection algorithms supported by machine learning or artificial intelligence are often and successfully used. To protect the privacy of user data and to reduce the amount of control data, an interesting approach is to use federated machine learning. This paper compares two approaches to system design using federated machine learning: with and without a central node.
Distributed Estimation with Decentralized Control for Quadruple-Tank Process
Authors: Moh Kamalul Wafi, Bambang L. Widjiantoro
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04763
Pdf link: https://arxiv.org/pdf/2304.04763
Abstract This paper presents a design for the quadruple-tank process, a distinctive multivariable (MIMO) system, under minimum and non-minimum scenarios with respect to the valve ratio. A distributed estimation algorithm with decentralized control is then implemented on this model. The inputs are set to different pump gains while the four tanks are interconnected, so the stability properties differ, which makes the use of decentralized control reasonable. The number of outputs is designed to equal the number of inputs, which is also the number of distributed Luenberger observers for the continuous linearized dynamical system. Each distributed observer forms local estimates from certain outputs only, which on its own would be insufficient, so links to neighbouring observers under some network topologies are required in the dynamical system. The concept works for estimating the states under both stability characteristics of the tank process. This success motivates further research on larger-scale complex systems.
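As background for the estimation scheme, the following sketch simulates a single Luenberger observer for a scalar linear system x' = a x with measurement y = c x, using the observer x̂' = a x̂ + L (y - c x̂). The estimation error then obeys e' = (a - L c) e and decays when a - L c < 0. This is a deliberately reduced, non-distributed example: the scalar model, the numerical values, the forward-Euler discretization, and the names are all assumptions and are not taken from the paper's four-tank model.

```python
def simulate_observer(a, c, gain, x0, xhat0, dt, n_steps):
    """Forward-Euler simulation of a scalar plant and its Luenberger observer.

    Plant:    x'    = a * x,            y = c * x
    Observer: xhat' = a * xhat + gain * (y - c * xhat)
    """
    x, xhat = x0, xhat0
    for _ in range(n_steps):
        y = c * x  # measurement available to the observer
        x_next = x + dt * a * x
        # Copy of the plant model corrected by the output error.
        xhat = xhat + dt * (a * xhat + gain * (y - c * xhat))
        x = x_next
    return x, xhat


# Hypothetical values: stable plant, observer gain chosen so a - gain * c < 0.
x_final, xhat_final = simulate_observer(
    a=-0.5, c=1.0, gain=2.0, x0=1.0, xhat0=0.0, dt=0.01, n_steps=1000
)
```

In a distributed setting each node would run an observer of this kind on its own output and add a correction term based on its neighbours' estimates, which is the role of the network links mentioned in the abstract.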
Non-Linear Estimation using the Weighted Average Consensus-Based Unscented Filtering for Various Vehicles Dynamics towards Autonomous Sensorless Design
Authors: Bambang L. Widjiantoro, Moh Kamalul Wafi, Katherin Indriawati
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04766
Pdf link: https://arxiv.org/pdf/2304.04766
Abstract Interest in autonomous vehicles has been growing, as they must cope with increasingly dynamic non-linear systems under constraints and disturbances. These vehicles connect not only to their own instruments but also to neighbouring components, creating diverse interconnected communications that should be handled locally to ease computation and speed up decisions. To deal with these interconnected networks, distributed estimation is used to reach the unmeasured states in pursuit of a sensorless design. It starts from the construction of a modified pseudo measurement which, due to approximation, leads to the weighted average consensus calculation within unscented filtering, along with bounded estimation errors. Moreover, the tested vehicles are also paired with certain robust control scenarios subject to noise and disturbance, with some stability analysis to support the use of the proposed estimation algorithm. Numerical examples are presented along with the performance of the control and estimation method. The results affirm the effectiveness of the method, with limited error deviation compared to other centralized and distributed filtering. Further research would address directed sensorless design and fault-tolerant learning control to counter failures.
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Authors: Drew Penney, Bin Li, Lizhong Chen, Jaroslaw J. Sydir, Anna Drewek-Ossowicka, Ramesh Illikkal, Charlie Tai, Ravi Iyer, Andrew Herdrich
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04797
Pdf link: https://arxiv.org/pdf/2304.04797
Abstract Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, those works remain largely impractical in public cloud environments since workloads are not known in advance and may only run for a brief period, thus prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully-online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.
Robust Body Exposure (RoBE): A Graph-based Dynamics Modeling Approach to Manipulating Blankets over People
Authors: Kavya Puthuveetil, Sasha Wald, Atharva Pusalkar, Pratyusha Karnati, Zackory Erickson
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.04822
Pdf link: https://arxiv.org/pdf/2304.04822
Abstract Robotic caregivers could potentially improve the quality of life of many who require physical assistance. However, in order to assist individuals who are lying in bed, robots must be capable of dealing with a significant obstacle: the blanket or sheet that will almost always cover the person's body. We propose a method for targeted bedding manipulation over people lying supine in bed where we first learn a model of the cloth's dynamics. Then, we optimize over this model to uncover a given target limb using information about human body shape and pose that only needs to be provided at run-time. We show how this approach enables greater robustness to variation relative to geometric and reinforcement learning baselines via a number of generalization evaluations in simulation and in the real world. We further evaluate our approach in a human study with 12 participants where we demonstrate that a mobile manipulator can adapt to real variation in human body shape, size, pose, and blanket configuration to uncover target body parts without exposing the rest of the body. Source code and supplementary materials are available online.
Exact Set-valued Estimation using Constrained Convex Generators for uncertain Linear Systems
Authors: Daniel Silvestre
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04826
Pdf link: https://arxiv.org/pdf/2304.04826
Abstract Set-valued state estimation in the presence of model uncertainties has been addressed in the literature following essentially three main approaches: i) interval arithmetic of the uncertain dynamics with the estimates; ii) factorizing the uncertainty into matrices with unity rank; and iii) performing the convex hull for the vertices of the uncertainty space. Approaches i) and ii) introduce a lot of conservatism because both disregard the relationship of the parameters with the entries of the dynamics matrix. On the other hand, approach iii) either has a large growth in the number of variables required to represent the set or is approximated, losing its main advantage in comparison with i) and ii). In this paper, motivated by the application of autonomous vehicles in GPS-denied areas that resort to beacon signals for localization, we develop an exact (meaning no added conservatism) and optimal (smallest growth in the number of variables) closed-form definition for the convex hull of Constrained Convex Generators (CCGs). This results in a more efficient method to represent the minimum volume convex set corresponding to the state estimation. Given that reduction methods are still lacking in the literature for CCGs, we employ an approximation using ray-shooting that is comparable in terms of accuracy with methods for Constrained Zonotopes such as the ones implemented in CORA. Simulations illustrate the greater accuracy of CCGs with the proposed convex hull operation in comparison to Constrained Zonotopes.
A few-shot graph Laplacian-based approach for improving the accuracy of low-fidelity data
Authors: Orazio Pinti, Assad A. Oberai
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.04862
Pdf link: https://arxiv.org/pdf/2304.04862
Abstract Low-fidelity data is typically inexpensive to generate but inaccurate. On the other hand, high-fidelity data is accurate but expensive to obtain. Multi-fidelity methods use a small set of high-fidelity data to enhance the accuracy of a large set of low-fidelity data. In the approach described in this paper, this is accomplished by constructing a graph Laplacian using the low-fidelity data and computing its low-lying spectrum. This spectrum is then used to cluster the data and identify points that are closest to the centroids of the clusters. High-fidelity data is then acquired for these key points. Thereafter, a transformation that maps every low-fidelity data point to its bi-fidelity counterpart is determined by minimizing the discrepancy between the bi- and high-fidelity data at the key points while preserving the underlying structure of the low-fidelity data distribution. The latter objective is achieved by relying, once again, on the spectral properties of the graph Laplacian. This method is applied to a problem in solid mechanics and another in aerodynamics. In both cases, this method uses a small fraction of high-fidelity data to significantly improve the accuracy of a large set of low-fidelity data.
Neural Network Predicts Ion Concentration Profiles under Nanoconfinement
Authors: Zhonglin Cao, Yuyang Wang, Cooper Lorsung, Amir Barati Farimani
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Arxiv link: https://arxiv.org/abs/2304.04896
Pdf link: https://arxiv.org/pdf/2304.04896
Abstract Modeling the ion concentration profile in nanochannels plays an important role in understanding the electrical double layer and electroosmotic flow. Due to the non-negligible surface interaction and the effect of discrete solvent molecules, molecular dynamics (MD) simulation is often used as an essential tool to study the behavior of ions under nanoconfinement. Despite the accuracy of MD simulation in modeling nanoconfinement systems, it is computationally expensive. In this work, we propose a neural network to predict ion concentration profiles in nanochannels with different configurations, including channel widths, ion molarity, and ion types. By modeling the ion concentration profile as a probability distribution, our neural network can serve as a much faster surrogate model for MD simulation with high accuracy. We further demonstrate the superior prediction accuracy of the neural network over XGBoost. Lastly, we demonstrate that the neural network is flexible in predicting ion concentration profiles with different bin sizes. Overall, our deep learning model is a fast, flexible, and accurate surrogate model to predict ion concentration profiles in nanoconfinement.
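One way to read "modeling the ion concentration profile as a probability distribution" is that the model outputs one score per spatial bin and a softmax turns these scores into non-negative weights that sum to one. The sketch below shows that normalization and a simple coarsening to a larger bin size. The abstract does not describe the network or its output layer, so the softmax choice, the helper names, and the sample scores are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: non-negative values that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_bins(probs, factor):
    """Coarsen a binned distribution by summing groups of `factor` bins."""
    if factor <= 0 or len(probs) % factor != 0:
        raise ValueError("factor must be positive and divide the bin count")
    return [sum(probs[i:i + factor]) for i in range(0, len(probs), factor)]


# Hypothetical per-bin scores across the channel width.
profile = softmax([0.2, 1.5, 0.3, -0.4])
coarse = merge_bins(profile, 2)
```

Because probability mass is conserved when bins are merged, the same predicted distribution can be compared against reference histograms computed at different bin sizes.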
AffectMachine-Classical: A novel system for generating affective classical music
Authors: Kat R. Agres, Adyasha Dash, Phoebe Chua
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2304.04915
Pdf link: https://arxiv.org/pdf/2304.04915
Abstract This work introduces a new music generation system, called AffectMachine-Classical, that is capable of generating affective classical music in real-time. AffectMachine was designed to be incorporated into biofeedback systems (such as brain-computer-interfaces) to help users become aware of, and ultimately mediate, their own dynamic affective states. That is, this system was developed for music-based MedTech to support real-time emotion self-regulation in users. We provide an overview of the rule-based, probabilistic system architecture, describing the main aspects of the system and how they are novel. We then present the results of a listener study that was conducted to validate the ability of the system to reliably convey target emotions to listeners. The findings indicate that AffectMachine-Classical is very effective in communicating various levels of Arousal ($R^2 = .96$) to listeners, and is also quite convincing in terms of Valence ($R^2 = .90$). Future work will embed AffectMachine-Classical into biofeedback systems, to leverage the efficacy of the affective music for emotional well-being in listeners.
A Data-Driven State Aggregation Approach for Dynamic Discrete Choice Models
Authors: Sinong Geng, Houssam Nassif, Carlos A. Manzanares
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2304.04916
Pdf link: https://arxiv.org/pdf/2304.04916
Abstract We study dynamic discrete choice models, where a commonly studied problem involves estimating parameters of agent reward functions (also known as "structural" parameters), using agent behavioral data. Maximum likelihood estimation for such models requires dynamic programming, which is limited by the curse of dimensionality. In this work, we present a novel algorithm that provides a data-driven method for selecting and aggregating states, which lowers the computational and sample complexity of estimation. Our method works in two stages. In the first stage, we use a flexible inverse reinforcement learning approach to estimate agent Q-functions. We use these estimated Q-functions, along with a clustering algorithm, to select a subset of states that are the most pivotal for driving changes in Q-functions. In the second stage, with these selected "aggregated" states, we conduct maximum likelihood estimation using a commonly used nested fixed-point algorithm. The proposed two-stage approach mitigates the curse of dimensionality by reducing the problem dimension. Theoretically, we derive finite-sample bounds on the associated estimation error, which also characterize the trade-off of computational complexity, estimation error, and sample complexity. We demonstrate the empirical performance of the algorithm in two classic dynamic discrete choice estimation applications.
Point-and-Shoot All-in-Focus Photo Synthesis from Smartphone Camera Pair
Authors: Xianrui Luo, Juewen Peng, Weiyue Zhao, Ke Xian, Hao Lu, Zhiguo Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04917
Pdf link: https://arxiv.org/pdf/2304.04917
Abstract All-in-Focus (AIF) photography is expected to be a commercial selling point for modern smartphones. Standard AIF synthesis requires manual, time-consuming operations such as focal stack compositing, which is unfriendly to ordinary people. To achieve point-and-shoot AIF photography with a smartphone, we expect that an AIF photo can be generated from one shot of the scene, instead of from multiple photos captured by the same camera. Benefiting from the multi-camera module in modern smartphones, we introduce a new task of AIF synthesis from main (wide) and ultra-wide cameras. The goal is to recover sharp details from defocused regions in the main-camera photo with the help of the ultra-wide-camera one. The camera setting poses new challenges such as parallax-induced occlusions and inconsistent color between cameras. To overcome the challenges, we introduce a predict-and-refine network to mitigate occlusions and propose dynamic frequency-domain alignment for color correction. To enable effective training and evaluation, we also build an AIF dataset with 2686 unique scenes. Each scene includes two photos captured by the main camera, one photo captured by the ultra-wide camera, and a synthesized AIF photo. Results show that our solution, termed EasyAIF, can produce high-quality AIF photos and outperforms strong baselines quantitatively and qualitatively. For the first time, we demonstrate point-and-shoot AIF photo synthesis successfully from main and ultra-wide cameras.
Staged Contact Optimization: Combining Contact-Implicit and Multi-Phase Hybrid Trajectory Optimization
Authors: Michael R. Turski, Joseph Norby, Aaron M. Johnson
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.04923
Pdf link: https://arxiv.org/pdf/2304.04923
Abstract Trajectory optimization problems for legged robots are commonly formulated with fixed contact schedules. These multi-phase Hybrid Trajectory Optimization (HTO) methods result in locally optimal trajectories, but the result depends heavily upon the predefined contact mode sequence. Contact-Implicit Optimization (CIO) offers a potential solution to this issue by allowing the contact mode to be determined throughout the trajectory by the optimization solver. However, CIO suffers from long solve times and convergence issues. This work combines the benefits of these two methods into one algorithm: Staged Contact Optimization (SCO). SCO tightens constraints on contact in stages, eventually fixing them to allow robust and fast convergence to a feasible solution. Results on a planar biped and spatial quadruped demonstrate speed and optimality improvements over CIO and HTO. These properties make SCO well suited for offline trajectory generation or as an effective tool for exploring the dynamic capabilities of a robot.
Universal dual-port grid-forming control: bridging the gap between grid-forming and grid-following control
Authors: Irina Subotić, Dominic Groß
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.04939
Pdf link: https://arxiv.org/pdf/2304.04939
Abstract We study a dual-port grid-forming (GFM) control for power systems containing ac and dc transmission, converter-interfaced generation and energy storage, and legacy generation. To operate such a system and provide standard services, state-of-the-art control architectures i) require assigning grid-following (GFL) and GFM controls to different converters, and ii) result in highly complex system dynamics. In contrast, dual-port GFM control i) subsumes standard functions of GFM and GFL controls in a simple controller, ii) can be applied to a wide range of emerging technologies independently of the network configuration, and iii) significantly reduces system complexity. In this work, we provide i) an end-to-end modeling framework that allows modeling complex topologies through composition of reduced-order device models, ii) an in-depth discussion of universal dual-port GFM control for emerging power systems, and iii) end-to-end stability conditions that cover a wide range of network topologies, emerging technologies, and legacy technologies. Finally, we validate our findings in a detailed case study.
A Family of Iteration Functions for General Linear Systems
Authors: Bahman Kalantari
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2304.04940
Pdf link: https://arxiv.org/pdf/2304.04940
Abstract We develop novel theory and algorithms for computing an approximate solution to $Ax=b$, or to $A^TAx=A^Tb$, where $A$ is an $m \times n$ real matrix of arbitrary rank. First, we describe the {\it Triangle Algorithm} (TA), where given an ellipsoid $E_{A,\rho}=\{Ax: \Vert x \Vert \leq \rho\}$, in each iteration it either computes a successively improving approximation $b_k=Ax_k \in E_{A,\rho}$, or proves $b \not \in E_{A, \rho}$. We then extend TA for computing an approximate solution or minimum-norm solution. Next, we develop a dynamic version of TA, the {\it Centering Triangle Algorithm} (CTA), generating residuals $r_k=b - Ax_k$ via iterations of the simple formula, $F_1(r)=r-(r^THr/r^TH^2r)Hr$, where $H=A$ when $A$ is symmetric PSD, otherwise $H=AA^T$ but need not be computed explicitly. More generally, CTA extends to a family of iteration functions, $F_t(r)$, $t=1, \dots, m$ satisfying: On the one hand, given $t \leq m$ and $r_0=b-Ax_0$, where $x_0=A^Tw_0$ with $w_0 \in \mathbb{R}^m$ arbitrary, for all $k \geq 1$, $r_k=F_t(r_{k-1})=b-Ax_k$ and $A^Tr_k$ converges to zero. Algorithmically, if $H$ is invertible with condition number $\kappa$, in $k=O( (\kappa/t) \ln \varepsilon^{-1})$ iterations $\Vert r_k \Vert \leq \varepsilon$. If $H$ is singular with $\kappa^+$ the ratio of its largest to smallest positive eigenvalues, in $k =O(\kappa^+/t\varepsilon)$ iterations either $\Vert r_k \Vert \leq \varepsilon$ or $\Vert A^T r_k\Vert= O(\sqrt{\varepsilon})$. If $N$ is the number of nonzero entries of $A$, each iteration takes $O(Nt+t^3)$ operations. On the other hand, given $r_0=b-Ax_0$, suppose its minimal polynomial with respect to $H$ has degree $s$. Then $Ax=b$ is solvable if and only if $F_s(r_0)=0$. Moreover, exclusively $A^TAx=A^Tb$ is solvable, if and only if $F_s(r_0) \not= 0$ but $A^T F_s(r_0)=0$. Additionally, $\{F_t(r_0)\}_{t=1}^s$ is computable in $O(Ns+s^3)$ operations.
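The first-order iteration $F_1(r)=r-(r^THr/r^TH^2r)Hr$ quoted in the abstract above can be illustrated with a minimal pure-Python sketch. It covers only the case where $A$ is symmetric PSD (so $H=A$) and starts from $x_0=0$; the function name `cta_f1_solve`, the helper names, the tolerance, and the stopping rule are assumptions made for illustration and are not taken from the paper.

```python
def matvec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cta_f1_solve(A, b, tol=1e-10, max_iter=10000):
    """Iterate r <- F_1(r) = r - (r^T H r / r^T H^2 r) H r with H = A.

    Assumes A is symmetric PSD (hypothetical restriction of this sketch).
    Keeps x in sync with r so that r = b - A x holds at every step.
    """
    x = [0.0] * len(b)
    r = list(b)  # r_0 = b - A x_0 with x_0 = 0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Hr = matvec(A, r)
        denom = dot(Hr, Hr)  # equals r^T H^2 r because H is symmetric
        if denom == 0.0:
            break  # H r = 0: the iteration cannot reduce the residual further
        alpha = dot(r, Hr) / denom
        x = [x_i + alpha * r_i for x_i, r_i in zip(x, r)]
        r = [r_i - alpha * hr_i for r_i, hr_i in zip(r, Hr)]
    return x, r
```

Because $r - \alpha Ar = b - A(x + \alpha r)$, updating `x` by `alpha * r` keeps the residual identity intact. For a non-symmetric $A$ the abstract uses $H=AA^T$ instead, which this sketch does not implement.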
StageInteractor: Query-based Object Detector with Cross-stage Interaction
Authors: Yao Teng, Haisong Liu, Sheng Guo, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.04978
Pdf link: https://arxiv.org/pdf/2304.04978
Abstract Previous object detectors make predictions based on dense grid points or numerous preset anchors. Most of these detectors are trained with one-to-many label assignment strategies. On the contrary, recent query-based object detectors depend on a sparse set of learnable queries and a series of decoder layers. The one-to-one label assignment is independently applied on each layer for the deep supervision during training. Despite the great success of query-based object detection, however, this one-to-one label assignment strategy demands the detectors to have strong fine-grained discrimination and modeling capacity. To solve the above problems, in this paper, we propose a new query-based object detector with cross-stage interaction, coined as StageInteractor. During the forward propagation, we come up with an efficient way to improve this modeling ability by reusing dynamic operators with lightweight adapters. As for the label assignment, a cross-stage label assigner is applied subsequent to the one-to-one label assignment. With this assigner, the training target class labels are gathered across stages and then reallocated to proper predictions at each decoder layer. On MS COCO benchmark, our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50 as backbone, 100 queries and 12 training epochs. With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.
Detecting Anomalous Microflows in IoT Volumetric Attacks via Dynamic Monitoring of MUD Activity
Authors: Ayyoob Hamza, Hassan Habibi Gharakheili, Theophilus A. Benson, Gustavo Batista, Vijay Sivaraman
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.04987
Pdf link: https://arxiv.org/pdf/2304.04987
Abstract IoT networks are increasingly becoming the target of sophisticated new cyber-attacks. Anomaly-based detection methods are promising in finding new attacks, but they face practical challenges: false-positive alarms, results that are hard to explain, and difficulty scaling cost-effectively. The recent IETF standard called Manufacturer Usage Description (MUD) seems promising to limit the attack surface on IoT devices by formally specifying their intended network behavior. In this paper, we use SDN to enforce and monitor the expected behaviors of each IoT device, and train one-class classifier models to detect volumetric attacks. Our specific contributions are fourfold. (1) We develop a multi-level inferencing model to dynamically detect anomalous patterns in network activity of MUD-compliant traffic flows via SDN telemetry, followed by packet inspection of anomalous flows. This provides enhanced fine-grained visibility into distributed and direct attacks, allowing us to precisely isolate volumetric attacks with microflow (5-tuple) resolution. (2) We collect traffic traces (benign and a variety of volumetric attacks) from network behavior of IoT devices in our lab, generate labeled datasets, and make them available to the public. (3) We prototype a full working system (modules are released as open-source), demonstrate its efficacy in detecting volumetric attacks on several consumer IoT devices with high accuracy while maintaining low false positives, and provide insights into the cost and performance of our system. (4) We demonstrate how our models scale in environments with a large number of connected IoTs (with datasets collected from a network of IP cameras in our university campus) by considering various training strategies (per device unit versus per device type), and balancing the accuracy of prediction against the cost of models in terms of size and training time.
Bayes correlated equilibria and no-regret dynamics
Authors: Kaito Fujii
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05005
Pdf link: https://arxiv.org/pdf/2304.05005
Abstract This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information. We aim at three desirable properties of equilibria. First, equilibria can be naturally realized by introducing a mediator into games. Second, an equilibrium can be computed efficiently in a distributed fashion. Third, any equilibrium in that class approximately maximizes social welfare, as measured by the price of anarchy, for a broad class of games. These three properties allow players to compute an equilibrium and realize it via a mediator, thereby settling into a stable state with approximately optimal social welfare. Our main result is the existence of an equilibrium concept that satisfies these three properties. Toward this goal, we characterize various (non-equivalent) extensions of correlated equilibria, collectively known as Bayes correlated equilibria. In particular, we focus on communication equilibria (also known as coordination mechanisms), which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players. We show that if each player minimizes a variant of regret called untruthful swap regret in repeated play of Bayesian games, the empirical distribution of these dynamics converges to a communication equilibrium. We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant. As a result, by simulating the dynamics with our algorithm, we can efficiently compute an approximate communication equilibrium. Furthermore, we extend existing lower bounds on the price of anarchy based on the smoothness arguments from Bayes Nash equilibria to equilibria obtained by the proposed dynamics.
Translating Assembly Accuracy Requirements to Cut-Off Frequencies for Component Mode Synthesis
Authors: Lars A.L. Janssen, Bart Besselink, Rob H.B. Fey, Nathan van de Wouw
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05021
Pdf link: https://arxiv.org/pdf/2304.05021
Abstract One of the most popular methods for reducing the complexity of assemblies of finite element models in the field of structural dynamics is component mode synthesis. A main challenge of component mode synthesis is balancing model complexity and model accuracy, because it is difficult to predict how component reduction influences assembly model accuracy. This work introduces an approach that allows for the translation of assembly model accuracy requirements in the frequency domain to the automatic selection of the cut-off frequencies for the model-order reduction (MOR) of components. The approach is based on a mathematical approach for MOR for coupled linear systems in the field of systems and control. We show how this approach is also applicable to structural dynamics models. We demonstrate the use of this approach in the scope of component mode synthesis (CMS) methods with the aim to reduce the complexity of component models while guaranteeing accuracy requirements of the assembly model. The proposed approach is illustrated on a mechanical, three-component structural dynamics system for which reduced-order models are computed that are reduced further compared to reduction using standard methods. This results in lower simulation cost, while maintaining the required accuracy.
Soft Dynamic Time Warping for Multi-Pitch Estimation and Beyond
Authors: Michael Krause, Christof Weiß, Meinard Müller
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2304.05032
Pdf link: https://arxiv.org/pdf/2304.05032
Abstract Many tasks in music information retrieval (MIR) involve weakly aligned data, where exact temporal correspondences are unknown. The connectionist temporal classification (CTC) loss is a standard technique to learn feature representations based on weakly aligned training data. However, CTC is limited to discrete-valued target sequences and can be difficult to extend to multi-label problems. In this article, we show how soft dynamic time warping (SoftDTW), a differentiable variant of classical DTW, can be used as an alternative to CTC. Using multi-pitch estimation as an example scenario, we show that SoftDTW yields results on par with a state-of-the-art multi-label extension of CTC. In addition to being more elegant in terms of its algorithmic formulation, SoftDTW naturally extends to real-valued target sequences.
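To make the idea of a differentiable DTW concrete, the sketch below shows the standard soft-DTW recursion, in which the hard minimum of classical DTW is replaced by a smoothed minimum controlled by a temperature `gamma`. It is a generic illustration on scalar sequences with a squared-difference local cost, which is an assumption of this sketch; it is not the authors' training setup, and the function names are hypothetical.

```python
import math


def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two scalar sequences x and y.

    Uses squared difference as the local cost (an assumption of this sketch).
    As gamma approaches 0 the value approaches the classical DTW cost.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            prev = (R[i - 1][j - 1], R[i - 1][j], R[i][j - 1])
            R[i][j] = cost + softmin(prev, gamma)
    return R[n][m]
```

Because the smoothed minimum is never larger than the true minimum, the soft-DTW value is a lower bound on the classical DTW cost, and it is smooth in the inputs for any `gamma > 0`, which is what allows it to serve as a loss. The sequences may have different lengths and real-valued entries, matching the weak-alignment setting described above.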
Real-Time Character Rise Motions
Authors: Ben Kenwright
Subjects: Robotics (cs.RO); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2304.05056
Pdf link: https://arxiv.org/pdf/2304.05056
Abstract This paper presents an uncomplicated dynamic controller for generating physically-plausible three-dimensional full-body biped character rise motions on-the-fly at run-time. Our low-dimensional controller uses fundamental reference information (e.g., center-of-mass, hands, and feet locations) to produce balanced biped get-up poses by means of a real-time physically-based simulation. The key idea is to use a simple approximate model (i.e., similar to the inverted-pendulum stepping model) to create continuous reference trajectories that can be seamlessly tracked by an articulated biped character to create balanced rise-motions. Our approach does not use any key-framed data or any computationally expensive processing (e.g., offline-optimization or search algorithms). We demonstrate the effectiveness and ease of our technique through example (i.e., a biped character picking itself up from different laying positions).
If consciousness is dynamically relevant, artificial intelligence isn't conscious
Authors: Johannes Kleiner, Tim Ludwig
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2304.05077
Pdf link: https://arxiv.org/pdf/2304.05077
Abstract We demonstrate that if consciousness is relevant for the temporal evolution of a system's states -- that is, if it is dynamically relevant -- then AI systems cannot be conscious. That is because AI systems run on CPUs, GPUs, TPUs or other processors which have been designed and verified to adhere to computational dynamics that systematically preclude or suppress deviations. The design and verification preclude or suppress, in particular, potential consciousness-related dynamical effects, so that if consciousness is dynamically relevant, AI systems cannot be conscious.
TodyNet: Temporal Dynamic Graph Neural Network for Multivariate Time Series Classification
Authors: Huaiyuan Liu, Xianzhang Liu, Donghua Yang, Zhiyu Liang, Hongzhi Wang, Yong Cui, Jun Gu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.05078
Pdf link: https://arxiv.org/pdf/2304.05078
Abstract Multivariate time series classification (MTSC) is an important data mining task, which can be effectively solved by popular deep learning technology. Unfortunately, the existing deep learning-based methods neglect the hidden dependencies in different dimensions and also rarely consider the unique dynamic features of time series, so they lack sufficient feature extraction capability to obtain satisfactory classification accuracy. To address this problem, we propose a novel temporal dynamic graph neural network (TodyNet) that can extract hidden spatio-temporal dependencies without a predefined graph structure. It enables information flow among isolated but implicitly interdependent variables and captures the associations between different time slots by a dynamic graph mechanism, which further improves the classification performance of the model. Meanwhile, the hierarchical representations of graphs cannot be learned due to the limitation of GNNs. Thus, we also design a temporal graph pooling layer to obtain a global graph-level representation for graph learning with learnable temporal parameters. The dynamic graph, graph information propagation, and temporal convolution are jointly learned in an end-to-end framework. The experiments on 26 UEA benchmark datasets illustrate that the proposed TodyNet outperforms existing deep learning-based methods in the MTSC tasks.
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
Authors: Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05097
Pdf link: https://arxiv.org/pdf/2304.05097
Abstract Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus will inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression is not so desirable, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene into a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity from two aspects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/
Video Event Restoration Based on Keyframes for Video Anomaly Detection
Authors: Zhiwei Yang, Jing Liu, Zhaoyang Wu, Peng Wu, Xiaotao Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05112
Pdf link: https://arxiv.org/pdf/2304.05112
Abstract Video anomaly detection (VAD) is a significant computer vision problem. Existing deep neural network (DNN) based VAD methods mostly follow the route of frame reconstruction or frame prediction. However, the lack of mining and learning of higher-level visual features and temporal context relationships in videos limits the further performance of these two approaches. Inspired by video codec theory, we introduce a brand-new VAD paradigm to break through these limitations. First, we propose a new task of video event restoration based on keyframes: encouraging the DNN to infer multiple missing frames from video keyframes so as to restore a video event can more effectively motivate the DNN to mine and learn potential higher-level visual features and comprehensive temporal context relationships in the video. To this end, we propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration, where a cross-attention and a temporal upsampling residual skip connection are introduced to further assist in restoring complex static and dynamic motion object features in the video. In addition, we propose a simple and effective adjacent frame difference loss to constrain the motion consistency of the video sequence. Extensive experiments on benchmarks demonstrate that USTN-DSC outperforms most existing methods, validating the effectiveness of our method.
Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing
Authors: Wei Lu, Nic A. Lee, Markus J. Buehler
Subjects: Machine Learning (cs.LG); Soft Condensed Matter (cond-mat.soft); Adaptation and Self-Organizing Systems (nlin.AO)
Arxiv link: https://arxiv.org/abs/2304.05137
Pdf link: https://arxiv.org/pdf/2304.05137
Abstract Spider webs are incredible biological structures, comprising thin but strong silk filaments arranged into complex hierarchical architectures with striking mechanical properties (e.g., lightweight but high strength, achieving diverse mechanical responses). While simple 2D orb webs can easily be mimicked, the modeling and synthesis of 3D-based web structures remain challenging, partly due to the rich set of design features. Here we provide a detailed analysis of the heterogeneous graph structures of spider webs, and use deep learning as a way to model and then synthesize artificial, bio-inspired 3D web structures. The generative AI models are conditioned based on key geometric parameters (including average edge length, number of nodes, average node degree, and others). To identify graph construction principles, we use inductive representation sampling of large experimentally determined spider web graphs, to yield a dataset that is used to train three conditional generative models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics, with sparse neighbor representation, 2) a discrete diffusion model with full neighbor representation, and 3) an autoregressive transformer architecture with full neighbor representation. All three models are scalable, produce complex, de novo bio-inspired spider web mimics, and successfully construct graphs that meet the design objectives. We further propose an algorithm that assembles web samples produced by the generative models into larger-scale structures based on a series of geometric design targets, including helical and parametric shapes, mimicking and extending natural design principles towards integration with diverging engineering objectives. Several webs are manufactured using 3D printing and tested to assess mechanical properties.
Distributed Event-Triggered Online Learning for Multi-Agent System Control using Gaussian Process Regression
Authors: Xiaobing Dai, Zewen Yang, Mengtian Xu, Sandra Hirche
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05138
Pdf link: https://arxiv.org/pdf/2304.05138
Abstract For the cooperative control of multi-agent systems with unknown dynamics, data-driven methods are commonly employed to infer models from the collected data. Due to the flexibility to model nonlinear functions and the existence of theoretical prediction error bound, Gaussian process (GP) regression is widely used in such control problems. Online learning, i.e. adding newly collected training data to the GP models, promises to improve control performance via improved predictions during the operation. In this paper, we propose a distributed event-triggered online learning algorithm for multi-agent system control. The proposed algorithm only employs locally available information from the neighbors and achieves a guaranteed overall control performance with desired tracking error bound. Moreover, the exclusion of the Zeno behavior for each agent is proved. Finally, the effectiveness of the proposed event-triggered online learning is demonstrated in simulations.
Feed-forward Disturbance Compensation for Station Keeping in Wave-dominated Environments
Authors: Kyle L. Walker, Adam A. Stokes, Aristides Kiprakis, Francesco Giorgio-Serchi
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2304.05222
Pdf link: https://arxiv.org/pdf/2304.05222
Abstract When deploying robots in shallow ocean waters, wave disturbances can be significant, highly dynamic and pose problems when operating near structures; this is a key limitation of current control strategies, restricting the range of conditions in which subsea vehicles can be deployed. To improve dynamic control and offer a higher level of robustness, this work proposes a Cascaded Proportional-Derivative (C-PD) with Feed-forward (FF) control scheme for disturbance mitigation, exploring the concept of explicitly using disturbance estimations to counteract state perturbations. Results demonstrate that the proposed controller is capable of higher performance in contrast to a standard C-PD controller, with an average reduction of ~48% witnessed across various sea states. Additional analysis also investigated performance when considering coarse estimations featuring inaccuracies; average improvements of ~17% demonstrate the effectiveness of the proposed strategy to handle these uncertainties. The proposal in this work shows promise for improved control without a drastic increase in required computing power; if coupled with sufficient sensors, state estimation techniques and prediction algorithms, utilising feed-forward compensating control actions offers a potential solution to improve vehicle control under wave-induced disturbances.
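The benefit of feeding a disturbance estimate forward, even a coarse one, can be illustrated with a toy one-dimensional simulation. The sketch below is not the paper's controller: it uses a single-loop PD law rather than the cascaded structure, a unit point mass, and a sinusoidal force as a stand-in for wave loading. All gains, the disturbance model, and the `estimate_gain` parameter (how much of the true disturbance the estimator captures) are assumptions chosen for illustration.

```python
import math


def simulate(use_feedforward, estimate_gain=0.8, kp=20.0, kd=8.0,
             mass=1.0, amp=5.0, omega=2.0, dt=0.001, t_end=20.0):
    """Return the RMS position error of a point mass held at x = 0.

    The disturbance is d(t) = amp * sin(omega * t) (hypothetical wave force).
    With feed-forward enabled, the controller subtracts estimate_gain * d(t),
    i.e. an imperfect estimate of the disturbance.
    """
    x, v = 0.0, 0.0
    sq_sum = 0.0
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        t = k * dt
        d = amp * math.sin(omega * t)
        u = -kp * x - kd * v              # PD feedback towards the set point
        if use_feedforward:
            u -= estimate_gain * d        # cancel the estimated disturbance
        a = (u + d) / mass
        v += a * dt                       # semi-implicit Euler integration
        x += v * dt
        sq_sum += x * x
    return math.sqrt(sq_sum / n_steps)


rms_pd = simulate(use_feedforward=False)
rms_ff = simulate(use_feedforward=True)
```

In this linear toy model the feed-forward term leaves only the unestimated fraction of the disturbance for the feedback loop to reject, so `rms_ff` comes out smaller than `rms_pd` even though the estimate is inaccurate. The percentages reported in the abstract come from the authors' own experiments and are not reproduced by this sketch.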
Neural Delay Differential Equations: System Reconstruction and Image Classification
Authors: Qunxi Zhu, Yao Guo, Wei Lin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Chaotic Dynamics (nlin.CD)
Arxiv link: https://arxiv.org/abs/2304.05310
Pdf link: https://arxiv.org/pdf/2304.05310
Abstract Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied, showing exceptional efficacy in coping with representative datasets. Recently, an augmented framework has been developed to overcome some limitations that emerged in the application of the original framework. In this paper, we propose a new class of continuous-depth neural networks with delay, named Neural Delay Differential Equations (NDDEs). To compute the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. Differential equations with delays are typically seen as dynamical systems of infinite dimension that possess more fruitful dynamics. Compared to NODEs, NDDEs have a stronger capacity of nonlinear representations. We use several illustrative examples to demonstrate this outstanding capacity. Firstly, we successfully model the delayed dynamics where the trajectories in the lower-dimensional phase space could be mutually intersected and even chaotic in a model-free or model-based manner. Traditional NODEs, without any augmentation, are not directly applicable for such modeling. Secondly, we achieve lower loss and higher accuracy not only for the data produced synthetically by complex models but also for CIFAR10, a well-known image dataset. Our results on the NDDEs demonstrate that appropriately articulating the elements of dynamical systems into the network design is truly beneficial in promoting network performance.
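What distinguishes a delay differential equation from an ordinary one is that the right-hand side depends on a past state, so the solver needs a history function rather than a single initial value. The sketch below integrates a scalar DDE $\dot{x}(t) = f(x(t), x(t-\tau))$ with a fixed-step forward Euler scheme. It is a plain numerical illustration, not the NDDE model or its adjoint method; in an NDDE the function `f` would be a neural network, and the function name `integrate_dde` is hypothetical.

```python
def integrate_dde(f, history, tau, t_end, dt):
    """Forward-Euler integration of x'(t) = f(x(t), x(t - tau)).

    history(t) supplies x(t) for t <= 0. The delay tau and the horizon t_end
    are assumed to be integer multiples of dt (a simplification of this sketch).
    Returns the list [x(0), x(dt), x(2*dt), ...].
    """
    n_delay = int(round(tau / dt))
    n_steps = int(round(t_end / dt))
    xs = [history(0.0)]
    for k in range(n_steps):
        t = k * dt
        j = k - n_delay
        # Before t = tau the delayed state comes from the history function.
        x_delayed = xs[j] if j >= 0 else history(t - tau)
        xs.append(xs[k] + dt * f(xs[k], x_delayed))
    return xs


# Example: x'(t) = -x(t - 1) with constant history x(t) = 1 for t <= 0.
trajectory = integrate_dde(lambda x, x_del: -x_del, lambda t: 1.0,
                           tau=1.0, t_end=2.0, dt=0.01)
```

For this example the exact solution is $x(t) = 1 - t$ on $[0, 1]$ and $x(t) = (t-2)^2/2 - 1/2$ on $[1, 2]$, so the trajectory crosses zero. A scalar autonomous ODE cannot produce that from a constant positive history, which gives a small-scale intuition for the extra representational capacity the abstract attributes to delays.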
Stability/instability study of density systems and control law design
Authors: Igor Furtat
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2304.05313
Pdf link: https://arxiv.org/pdf/2304.05313
Abstract The paper considers a class of dynamical systems called density systems. For such systems the derivative of a quadratic function depends on a so-called density function. The density function is used to set the properties of phase space, therefore, it influences the behaviour of investigated systems. A particular class of such systems was previously considered for the (in)stability study of dynamical systems using the flow and divergence of a phase vector. In this paper, a more general class of such systems is considered, and it is shown that the density function can be used not only to study (in)stability, but also to set the properties of space in order to change the behaviour of dynamical systems. The development of control laws based on the use of the density function for systems with known and unknown parameters is considered. All obtained results are accompanied by simulations illustrating the theoretical conclusions.
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction
Authors: Yunpeng Zhang, Zheng Zhu, Dalong Du
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2304.05316
Pdf link: https://arxiv.org/pdf/2304.05316
Abstract The vision-based perception for autonomous driving has undergone a transformation from the bird's-eye-view (BEV) representations to 3D semantic occupancy. Compared with the BEV planes, the 3D semantic occupancy further provides structural information along the vertical direction. This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. OccFormer achieves a long-range, dynamic, and efficient encoding of the camera-generated 3D voxel features. It is obtained by decomposing the heavy 3D processing into the local and global transformer pathways along the horizontal plane. For the occupancy decoder, we adapt the vanilla Mask2Former for 3D semantic occupancy by proposing preserve-pooling and class-guided sampling, which notably mitigate the sparsity and class imbalance. Experimental results demonstrate that OccFormer significantly outperforms existing methods for semantic scene completion on the SemanticKITTI dataset and for LiDAR semantic segmentation on the nuScenes dataset. Code is available at https://github.com/zhangyp15/OccFormer.
Unified Multi-Modal Image Synthesis for Missing Modality Imputation
Authors: Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, S. Kevin Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2304.05340
Pdf link: https://arxiv.org/pdf/2304.05340
Abstract Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption and various imaging protocols often result in incomplete multi-modal images, thus limiting the usage of multi-modal data for clinical purposes. To address this issue, in this paper, we propose a novel unified multi-modal image synthesis method for missing modality imputation. Our method overall takes a generative adversarial architecture, which aims to synthesize missing modalities from any combination of available ones with a single model. To this end, we specifically design a Commonality- and Discrepancy-Sensitive Encoder for the generator to exploit both modality-invariant and specific information contained in input modalities. The incorporation of both types of information facilitates the generation of images with consistent anatomy and realistic details of the desired distribution. Besides, we propose a Dynamic Feature Unification Module to integrate information from a varying number of available modalities, which enables the network to be robust to random missing modalities. The module performs both hard integration and soft integration, ensuring the effectiveness of feature combination while avoiding information loss. Verified on two public multi-modal magnetic resonance datasets, the proposed method is effective in handling various synthesis tasks and shows superior performance compared to previous methods.
Distributed no-regret edge resource allocation with limited communication
Authors: Saad Kriouile, Dimitrios Tsilimantos, Theodoros Giannakas
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2304.05355
Pdf link: https://arxiv.org/pdf/2304.05355
Abstract To accommodate low-latency and computation-intensive services, such as the Internet-of-Things (IoT), 5G networks are expected to have cloud and edge computing capabilities. To this end, we consider a generic network setup where devices, performing analytics-related tasks, can partially process a task and offload its remainder to base stations, which can then reroute it to cloud and/or edge servers. To account for the potentially unpredictable traffic demands and edge network dynamics, we formulate the resource allocation as an online convex optimization problem with service violation constraints and allow limited communication between neighboring nodes. To address the problem, we propose an online distributed (across the nodes) primal-dual algorithm and prove that it achieves sublinear regret and violation; in fact, the achieved bound is of the same order as the best known centralized alternative. Our results are further supported using the publicly available Milano dataset.
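
The abstract above names the technique but does not spell out the update rules. As a rough illustration of what an online primal-dual method for constrained online convex optimization can look like, here is a minimal single-node sketch. It is not the authors' distributed algorithm: the quadratic per-round loss, the capacity constraint `x <= cap`, the step sizes, and all function and parameter names are assumptions chosen for illustration.

```python
import random


def project(x, lo=0.0, hi=1.0):
    """Project a scalar decision onto the feasible interval [lo, hi]."""
    return min(hi, max(lo, x))


def online_primal_dual(targets, cap=0.6, eta=0.05, mu=0.05):
    """Single-node online primal-dual sketch (illustrative assumptions only).

    Round t: play x_t, then observe the loss f_t(x) = (x - theta_t)^2.
    Long-term constraint: g(x) = x - cap <= 0.
    """
    x, lam = 0.0, 0.0              # primal decision and dual variable
    decisions = []
    total_violation = 0.0
    for theta in targets:
        decisions.append(x)
        total_violation += max(0.0, x - cap)
        # Primal step: gradient of the Lagrangian f_t(x) + lam * g(x) in x.
        grad = 2.0 * (x - theta) + lam
        x = project(x - eta * grad)
        # Dual step: ascent on the constraint value, kept non-negative.
        lam = max(0.0, lam + mu * (x - cap))
    return decisions, lam, total_violation


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical demand signal; the unconstrained optimum lies above cap.
    targets = [rng.uniform(0.7, 1.0) for _ in range(2000)]
    decisions, lam, violation = online_primal_dual(targets)
    print("final dual variable:", lam)
    print("average violation per round:", violation / len(decisions))
```

The dual variable acts as a price on the constraint: it grows while decisions exceed the capacity and pushes later decisions back toward the feasible region, which is the mechanism that typically lets such methods keep cumulative violation small relative to the number of rounds.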

New submissions for Wed, 12 Apr 23 #30

Keyword: efficient

DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies

A new perspective on building efficient and expressive 3D equivariant graph neural networks

An autoencoder compression approach for accelerating large-scale inverse problems

Revisiting Test Time Adaptation under Online Evaluation

Scallop: A Language for Neurosymbolic Programming

Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques

Binary Latent Diffusion

Exact Set-valued Estimation using Constrained Convex Generators for uncertain Linear Systems

A visão da BBChain sobre o contexto tecnológico subjacente à adoção do Real Digital (BBChain's view of the technological context underlying the adoption of the Real Digital)

Human Motion Detection Based on Dual-Graph and Weighted Nuclear Norm Regularizations

DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach

EVKG: An Interlinked and Interoperable Electric Vehicle Knowledge Graph for Smart Transportation System

Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT

Model sparsification can simplify machine unlearning

Stress-hybrid virtual element method on quadrilateral meshes for compressible and nearly-incompressible linear elasticity

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Data-Efficient Image Quality Assessment with Attention-Panel Decoder

AROW: A V2X-based Automated Right-of-Way Algorithm for Distributed Cooperative Intersection Management

PlantDet: A benchmark for Plant Detection in the Three-Rivers-Source Region

Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production

GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning

StageInteractor: Query-based Object Detector with Cross-stage Interaction

Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition

Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories

Bayes correlated equilibria and no-regret dynamics

Privacy Amplification via Shuffling: Unified, Simplified, and Tightened

Habits and goals in synergy: a variational Bayesian framework for behavior

Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

Human-machine cooperation for semantic feature listing

Scalable Real-Time Vehicle Deformation for Interactive Environments

Pointless Global Bundle Adjustment With Relative Motions Hessians

Accelerating Globally Optimal Consensus Maximization in Geometric Vision

From research activities to institutional piloting: the challenges of modernizing interfaces and data interoperability

TinyReptile: TinyML with Federated Meta-Learning

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

Inhomogeneous graph trend filtering via a l2,0 cardinality penalty

OpenAL: Evaluation and Interpretation of Active Learning Strategies

Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

Controllable Textual Inversion for Personalized Text-to-Image Generation

Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning

RRHF: Rank Responses to Align Language Models with Human Feedback without tears

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

SciKGTeX -- A LaTeX Package to Semantically Annotate Contributions in Scientific Publications

TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain

Leo: Lagrange Elementary Optimization

Astroformer: More Data Might Not be All You Need for Classification

Asymmetric Polynomial Loss For Multi-Label Classification

Design and Analysis of Index codes for 3-Group NOMA in Vehicular Adhoc Networks

Keyword: faster

Similarity search in the blink of an eye with compressed indices

RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments

An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

Multi-Sample Consensus Driven Unsupervised Normal Estimation for 3D Point Clouds

Neural Network Predicts Ion Concentration Profiles under Nanoconfinement

Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production

Fast IMU-based Dual Estimation of Human Motion and Kinematic Parameters via Progressive In-Network Computing

PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices

flap: A Deterministic Parser with Fused Lexing

TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training

SciKGTeX -- A LaTeX Package to Semantically Annotate Contributions in Scientific Publications

Keyword: mobile

Robust Body Exposure (RoBE): A Graph-based Dynamics Modeling Approach to Manipulating Blankets over People

MHfit: Mobile Health Data for Predicting Athletics Fitness Using Machine Learning

Bounding Box Annotation with Visible Status

Computer Vision-Aided Intelligent Monitoring of Coffee: Towards Sustainable Coffee Production

Measuring Teachers' Visual Expertise Using the Gaze Relational Index Based on Real-world Eye-tracking Data and Varying Velocity Thresholds

PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices

A user co-designed digital INtervention for Child LangUage DisordEr: The INCLUDE Project Protocol

Keyword: pruning

FINEX: A Fast Index for Exact & Flexible Density-Based Clustering (Extended Version with Proofs)*

Design, Integration, and Field Evaluation of a Robotic Blossom Thinning System for Tree Fruit Crops

Model sparsification can simplify machine unlearning

Keyword: voxel

Weakly Supervised Intracranial Hemorrhage Segmentation using Head-Wise Gradient-Infused Self-Attention Maps from a Swin Transformer in Categorical Learning

EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous Visual Hulls

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Keyword: lidar