【CS-part2】New submissions for Mon, 8 Apr 24

Keyword: webgpu

There is no result

Keyword: webgl

There is no result

Keyword: pre-rendering

There is no result

Keyword: prerendering

There is no result

Keyword: motion prediction

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

Authors: Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03736
Pdf link: https://arxiv.org/pdf/2404.03736
Abstract Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions.
Keyword: incremental learning

There is no result

Keyword: svm incremental

There is no result

Keyword: nerf

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Authors: Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03736
Pdf link: https://arxiv.org/pdf/2404.03736
Abstract Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions.
Robust Gaussian Splatting
Authors: François Darmon, Lorenzo Porzi, Samuel Rota-Bulò, Peter Kontschieder
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04211
Pdf link: https://arxiv.org/pdf/2404.04211
Abstract In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color in-consistencies caused by ambient light, shadows, or due to camera-related factors like varying white balancing settings. Our proposed solutions integrate in a seamless way with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines.
Keyword: multiorgan

There is no result

Keyword: multi-organ

There is no result

Keyword: multi organ

There is no result

Keyword: SAM

Serial Parallel Reliability Redundancy Allocation Optimization for Energy Efficient and Fault Tolerant Cloud Computing
Authors: Gutha Jaya Krishna
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03665
Pdf link: https://arxiv.org/pdf/2404.03665
Abstract Serial-parallel redundancy is a reliable way to ensure service and systems will be available in cloud computing. That method involves making copies of the same system or program, with only one remaining active. When an error occurs, the inactive copy can step in as a backup right away, this provides continuous performance and uninterrupted operation. This approach is called parallel redundancy, otherwise known as active-active redundancy, and its exceptional when it comes to strategy. It creates duplicates of a system or service that are all running at once. By doing this fault tolerance increases since if one copy fails, the workload can be distributed across any replica thats functioning properly. Reliability allocation depends on features in a system and the availability and fault tolerance you want from it. Serial redundancy or parallel redundancies can be applied to increase the dependability of systems and services. To demonstrate how well this concept works, we looked into fixed serial parallel reliability redundancy allocation issues followed by using an innovative hybrid optimization technique to find the best possible allocation for peak dependability. We then measured our findings against other research.
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03673
Pdf link: https://arxiv.org/pdf/2404.03673
Abstract Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL. Our framework, called Reinforcement Learning for Consistency Model (RLCM), frames the iterative inference process of a consistency model as an RL procedure. RLCM improves upon RL fine-tuned diffusion models on text-to-image generation capabilities and trades computation during inference time for sample quality. Experimentally, we show that RLCM can adapt text-to-image consistency models to objectives that are challenging to express with prompting, such as image compressibility, and those derived from human feedback, such as aesthetic quality. Comparing to RL finetuned diffusion models, RLCM trains significantly faster, improves the quality of the generation measured under the reward objectives, and speeds up the inference procedure by generating high quality images with as few as two inference steps. Our code is available at https://rlcm.owenoertell.com
Improve Knowledge Distillation via Label Revision and Data Selection
Authors: Weichao Lan, Yiu-ming Cheung, Qing Xu, Buhua Liu, Zhikai Hu, Mengke Li, Zhenghua Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03693
Pdf link: https://arxiv.org/pdf/2404.03693
Abstract Knowledge distillation (KD) has become a widely used technique in the field of model compression, which aims to transfer knowledge from a large teacher model to a lightweight student model for efficient network development. In addition to the supervision of ground truth, the vanilla KD method regards the predictions of the teacher as soft labels to supervise the training of the student model. Based on vanilla KD, various approaches have been developed to further improve the performance of the student model. However, few of these previous methods have considered the reliability of the supervision from teacher models. Supervision from erroneous predictions may mislead the training of the student model. This paper therefore proposes to tackle this problem from two aspects: Label Revision to rectify the incorrect supervision and Data Selection to select appropriate samples for distillation to reduce the impact of erroneous supervision. In the former, we propose to rectify the teacher's inaccurate predictions using the ground truth. In the latter, we introduce a data selection technique to choose suitable training samples to be supervised by the teacher, thereby reducing the impact of incorrect predictions to some extent. Experiment results demonstrate the effectiveness of our proposed method, and show that our method can be combined with other distillation approaches, improving their performance.
Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning
Authors: Spyridon Chavlis, Panayiota Poirazi
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2404.03708
Pdf link: https://arxiv.org/pdf/2404.03708
Abstract Artificial neural networks (ANNs) are at the core of most Deep learning (DL) algorithms that successfully tackle complex problems like image recognition, autonomous driving, and natural language processing. However, unlike biological brains who tackle similar problems in a very efficient manner, DL algorithms require a large number of trainable parameters, making them energy-intensive and prone to overfitting. Here, we show that a new ANN architecture that incorporates the structured connectivity and restricted sampling properties of biological dendrites counteracts these limitations. We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks while using significantly fewer trainable parameters. This is achieved through the adoption of a different learning strategy, whereby most of the nodes respond to several classes, unlike classical ANNs that strive for class-specificity. These findings suggest that the incorporation of dendrites can make learning in ANNs precise, resilient, and parameter-efficient and shed new light on how biological features can impact the learning strategies of ANNs.
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Authors: Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03736
Pdf link: https://arxiv.org/pdf/2404.03736
Abstract Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions.
Test Time Training for Industrial Anomaly Segmentation
Authors: Alex Costanzino, Pierluigi Zama Ramirez, Mirko Del Moro, Agostino Aiezzo, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03743
Pdf link: https://arxiv.org/pdf/2404.03743
Abstract Anomaly Detection and Segmentation (AD&S) is crucial for industrial quality control. While existing methods excel in generating anomaly scores for each pixel, practical applications require producing a binary segmentation to identify anomalies. Due to the absence of labeled anomalies in many real scenarios, standard practices binarize these maps based on some statistics derived from a validation set containing only nominal samples, resulting in poor segmentation performance. This paper addresses this problem by proposing a test time training strategy to improve the segmentation performance. Indeed, at test time, we can extract rich features directly from anomalous samples to train a classifier that can discriminate defects effectively. Our general approach can work downstream to any AD&S method that provides an anomaly score map as output, even in multimodal settings. We demonstrate the effectiveness of our approach over baselines through extensive experimentation and evaluation on MVTec AD and MVTec 3D-AD.
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers
Authors: Chunxiao Li, Charlie Liu, Jonathan Chung, Zhengyang (John)Lu, Piyush Jha, Vijay Ganesh
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03753
Pdf link: https://arxiv.org/pdf/2404.03753
Abstract Restart policy is an important technique used in modern Conflict-Driven Clause Learning (CDCL) solvers, wherein some parts of the solver state are erased at certain intervals during the run of the solver. In most solvers, variable activities are preserved across restart boundaries, resulting in solvers continuing to search parts of the assignment tree that are not far from the one immediately prior to a restart. To enable the solver to search possibly "distant" parts of the assignment tree, we study the effect of resets, a variant of restarts which not only erases the assignment trail, but also randomizes the activity scores of the variables of the input formula after reset, thus potentially enabling a better global exploration of the search space. In this paper, we model the problem of whether to trigger reset as a multi-armed bandit (MAB) problem, and propose two reinforcement learning (RL) based adaptive reset policies using the Upper Confidence Bound (UCB) and Thompson sampling algorithms. These two algorithms balance the exploration-exploitation tradeoff by adaptively choosing arms (reset vs. no reset) based on their estimated rewards during the solver's run. We implement our reset policies in four baseline SOTA CDCL solvers and compare the baselines against the reset versions on Satcoin benchmarks and SAT Competition instances. Our results show that RL-based reset versions outperform the corresponding baseline solvers on both Satcoin and the SAT competition instances, suggesting that our RL policy helps to dynamically and profitably adapt the reset frequency for any given input instance. We also introduce the concept of a partial reset, where at least a constant number of variable activities are retained across reset boundaries. Building on previous results, we show that there is an exponential separation between O(1) vs. $\Omega(n)$-length partial resets.
Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks
Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03761
Pdf link: https://arxiv.org/pdf/2404.03761
Abstract Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for doing this, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.
A Systems Theoretic Approach to Online Machine Learning
Authors: Anli du Preez, Peter A. Beling, Tyler Cody
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03775
Pdf link: https://arxiv.org/pdf/2404.03775
Abstract The machine learning formulation of online learning is incomplete from a systems theoretic perspective. Typically, machine learning research emphasizes domains and tasks, and a problem solving worldview. It focuses on algorithm parameters, features, and samples, and neglects the perspective offered by considering system structure and system behavior or dynamics. Online learning is an active field of research and has been widely explored in terms of statistical theory and computational algorithms, however, in general, the literature still lacks formal system theoretical frameworks for modeling online learning systems and resolving systems-related concept drift issues. Furthermore, while the machine learning formulation serves to classify methods and literature, the systems theoretic formulation presented herein serves to provide a framework for the top-down design of online learning systems, including a novel definition of online learning and the identification of key design parameters. The framework is formulated in terms of input-output systems and is further divided into system structure and system behavior. Concept drift is a critical challenge faced in online learning, and this work formally approaches it as part of the system behavior characteristics. Healthcare provider fraud detection using machine learning is used as a case study throughout the paper to ground the discussion in a real-world online learning challenge.
Optimization of resources for digital radio transmission over IBOC FM through max-min fairness
Authors: Mónica Rico Martínez, Juan Carlos Vesga Ferreira, Joel Carroll Vargas, María Consuelo Rodríguez Niño, Andrés Alejandro Diaz Toro, William Alexander Cuevas Carrero
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2404.03795
Pdf link: https://arxiv.org/pdf/2404.03795
Abstract The equitable distribution of resources in a network is a complex process, considering that not all nodes have the same requirements, and the In-Band On-Channel (IBOC) hybrid transmission system is no exception. The IBOC system utilizes a hybrid in-band transmission to simultaneously broadcast analog and digital audio over the FM band. This article proposes the use of a Max-Min Fairness (MMF) algorithm, with a strategy to optimize resource allocation for IBOC FM transmission in a multiservice scenario. Additionally, the MMF algorithm offers low computational complexity for implementation in low-cost embedded systems, aiming to achieve fair resource distribution and provide adequate Quality of Service (QoS) levels for each node in the RF network, considering channel conditions and traffic types. The article explores a scenario under saturated traffic conditions to assess the optimization capabilities of the MMF algorithm under well-defined traffic and channel conditions. The evaluation process yielded highly favorable results, indicating that theMMF algorithm can be considered a viable alternative for bandwidth optimization in digital broadcasting over IBOC on FM with 95% confidence, and it holds potential for implementation in other digital broadcasting system.
Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03819
Pdf link: https://arxiv.org/pdf/2404.03819
Abstract Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scatteredly distributed, low-contrast clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previous automatic LN detection works typically yield limited recall and high false positives (FPs) due to adjacent anatomies with similar image intensities, shapes, or textures (vessels, muscles, esophagus, etc). In this work, we propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate performance. By enhancing the 2D backbone with a multi-scale 2.5D feature fusion to incorporate 3D context explicitly, more importantly, we make two main contributions to improve the representation quality of LN queries. 1) Considering that LN boundaries are often unclear, an IoU prediction head and a location debiased query selection are proposed to select LN queries of higher localization accuracy as the decoder query's initialization. 2) To reduce FPs, query contrastive learning is employed to explicitly reinforce LN queries towards their best-matched ground-truth queries over unmatched query predictions. Trained and tested on 3D CT scans of 1067 patients (with 10,000+ labeled LNs) via combining seven LN datasets from different body parts (neck, chest, and abdomen) and pathologies/cancers, our method significantly improves the performance of previous leading methods by > 4-5% average recall at the same FP rates in both internal and external testing. We further evaluate on the universal lesion detection task using NIH DeepLesion benchmark, and our method achieves the top performance of 88.46% averaged recall across 0.5 to 4 FPs per image, compared with other leading reported results.
The Low-Degree Hardness of Finding Large Independent Sets in Sparse Random Hypergraphs
Authors: Abhishek Dhawan, Yuzhou Wang
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.03842
Pdf link: https://arxiv.org/pdf/2404.03842
Abstract We study the algorithmic task of finding large independent sets in Erdos-Renyi $r$-uniform hypergraphs on $n$ vertices having average degree $d$. Krivelevich and Sudakov showed that the maximum independent set has density $\left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}$. We show that the class of low-degree polynomial algorithms can find independent sets of density $\left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}$ but no larger. This extends and generalizes earlier results of Gamarnik and Sudan, Rahman and Virag, and Wein on graphs, and answers a question of Bal and Bennett. We conjecture that this statistical-computational gap holds for this problem. Additionally, we explore the universality of this gap by examining $r$-partite hypergraphs. A hypergraph $H=(V,E)$ is $r$-partite if there is a partition $V=V_1\cup\cdots\cup V_r$ such that each edge contains exactly one vertex from each set $V_i$. We consider the problem of finding large balanced independent sets (independent sets containing the same number of vertices in each partition) in random $r$-partite hypergraphs with $n$ vertices in each partition and average degree $d$. We prove that the maximum balanced independent set has density $\left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}$ asymptotically. Furthermore, we prove an analogous low-degree computational threshold of $\left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}$. Our results recover and generalize recent work of Perkins and the second author on bipartite graphs. While the graph case has been extensively studied, this work is the first to consider statistical-computational gaps of optimization problems on random hypergraphs. Our results suggest that these gaps persist for larger uniformities as well as across many models. A somewhat surprising aspect of the gap for balanced independent sets is that the algorithm achieving the lower bound is a simple degree-1 polynomial.
Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration
Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03869
Pdf link: https://arxiv.org/pdf/2404.03869
Abstract The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, real-world multi-agent systems usually contain agents with different functions and strategies, while the existing scalable MARL methods only have limited heterogeneity. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. we first leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer for decision-making, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity at the same time. We implement our approach based on the state-of-the-art backbone PPO-based algorithm as SHPPO, while our approach is agnostic to the backbone and can be seamlessly plugged into any parameter-shared MARL method. SHPPO exhibits superior performance over the baselines such as MAPPO and HAPPO in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability and offering insights into the learned latent representation's impact on team performance by visualization.
Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning
Authors: Gawon Choi, Hyemin Ahn
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03891
Pdf link: https://arxiv.org/pdf/2404.03891
Abstract In robotics, the use of Large Language Models (LLMs) is becoming prevalent, especially for understanding human commands. In particular, LLMs are utilized as domain-agnostic task planners for high-level human commands. LLMs are capable of Chain-of-Thought (CoT) reasoning, and this allows LLMs to be task planners. However, we need to consider that modern robots still struggle to perform complex actions, and the domains where robots can be deployed are limited in practice. This leads us to pose a question: If small LMs can be trained to reason in chains within a single domain, would even small LMs be good task planners for the robots? To train smaller LMs to reason in chains, we build `COmmand-STeps datasets' (COST) consisting of high-level commands along with corresponding actionable low-level steps, via LLMs. We release not only our datasets but also the prompt templates used to generate them, to allow anyone to build datasets for their domain. We compare GPT3.5 and GPT4 with the finetuned GPT2 for task domains, in tabletop and kitchen environments, and the result shows that GPT2-medium is comparable to GPT3.5 for task planning in a specific domain. Our dataset, code, and more output samples can be found in https://github.com/Gawon-Choi/small-LMs-Task-Planning
SLSM : An Efficient Strategy for Lazy Schema Migration on Shared-Nothing Databases
Authors: Zhilin Zeng, Hui Li, Xiyue Gao, Hui Zhang, Huiquan Zhang, Jiangtao Cui
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2404.03929
Pdf link: https://arxiv.org/pdf/2404.03929
Abstract By introducing intermediate states for metadata changes and ensuring that at most two versions of metadata exist in the cluster at the same time, shared-nothing databases are capable of making online, asynchronous schema changes. However, this method leads to delays in the deployment of new schemas since it requires waiting for massive data backfill. To shorten the service vacuum period before the new schema is available, this paper proposes a strategy named SLSM for zero-downtime schema migration on shared-nothing databases. Based on the lazy migration of stand-alone databases, SLSM keeps the old and new schemas with the same data distribution, reducing the node communication overhead of executing migration transactions for shared-nothing databases. Further, SLSM combines migration transactions with user transactions by extending the distributed execution plan to allow the data involved in migration transactions to directly serve user transactions, greatly reducing the waiting time of user transactions. Experiments demonstrate that our strategy can greatly reduce the latency of user transactions and improve the efficiency of data migration compared to existing schemes.
Deep Learning for Satellite Image Time Series Analysis: A Review
Authors: Lynn Miller, Charlotte Pelletier, Geoffrey I. Webb
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.03936
Pdf link: https://arxiv.org/pdf/2404.03936
Abstract Earth observation (EO) satellite missions have been providing detailed images about the state of the Earth and its land cover for over 50 years. Long term missions, such as NASA's Landsat, Terra, and Aqua satellites, and more recently, the ESA's Sentinel missions, record images of the entire world every few days. Although single images provide point-in-time data, repeated images of the same area, or satellite image time series (SITS) provide information about the changing state of vegetation and land use. These SITS are useful for modeling dynamic processes and seasonal changes such as plant phenology. They have potential benefits for many aspects of land and natural resource management, including applications in agricultural, forest, water, and disaster management, urban planning, and mining. However, the resulting satellite image time series (SITS) are complex, incorporating information from the temporal, spatial, and spectral dimensions. Therefore, deep learning methods are often deployed as they can analyze these complex relationships. This review presents a summary of the state-of-the-art methods of modelling environmental, agricultural, and other Earth observation variables from SITS data using deep learning methods. We aim to provide a resource for remote sensing experts interested in using deep learning techniques to enhance Earth observation models with temporal information.
Stability in Graphs with Matroid Constraints
Authors: Fedor V. Fomin, Petr A. Golovach, Tuukka Korhonen, Saket Saurabh
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2404.03979
Pdf link: https://arxiv.org/pdf/2404.03979
Abstract We study the following Independent Stable Set problem. Let G be an undirected graph and M = (V(G),I) be a matroid whose elements are the vertices of G. For an integer k\geq 1, the task is to decide whether G contains a set S\subseteq V(G) of size at least k which is independent (stable) in G and independent in M. This problem generalizes several well-studied algorithmic problems, including Rainbow Independent Set, Rainbow Matching, and Bipartite Matching with Separation. We show that - When the matroid M is represented by the independence oracle, then for any computable function f, no algorithm can solve Independent Stable Set using f(k)n^{o(k)} calls to the oracle. - On the other hand, when the graph G is of degeneracy d, then the problem is solvable in time O((d+1)^kn), and hence is FPT parameterized by d+k. Moreover, when the degeneracy d is a constant (which is not a part of the input), the problem admits a kernel polynomial in k. More precisely, we prove that for every integer d\geq 0, the problem admits a kernelization algorithm that in time n^{O(d)} outputs an equivalent framework with a graph on dk^{O(d)} vertices. A lower bound complements this when d is part of the input: Independent Stable Set does not admit a polynomial kernel when parameterized by k+d unless NP \subseteq coNP/poly. This lower bound holds even when M is a partition matroid. - Another set of results concerns the scenario when the graph G is chordal. In this case, our computational lower bound excludes an FPT algorithm when the input matroid is given by its independence oracle. However, we demonstrate that Independent Stable Set can be solved in 2^{O(k)}||M||^{O(1)} time when M is a linear matroid given by its representation. In the same setting, Independent Stable Set does not have a polynomial kernel when parameterized by k unless NP\subseteq coNP/poly.
Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study
Authors: Myrthe Reuver, Suzan Verberne, Antske Fokkens
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.03987
Pdf link: https://arxiv.org/pdf/2404.03987
Abstract For a viewpoint-diverse news recommender, identifying whether two news articles express the same viewpoint is essential. One way to determine "same or different" viewpoint is stance detection. In this paper, we investigate the robustness of operationalization choices for few-shot stance detection, with special attention to modelling stance across different topics. Our experiments test pre-registered hypotheses on stance detection. Specifically, we compare two stance task definitions (Pro/Con versus Same Side Stance), two LLM architectures (bi-encoding versus cross-encoding), and adding Natural Language Inference knowledge, with pre-trained RoBERTa models trained with shots of 100 examples from 7 different stance detection datasets. Some of our hypotheses and claims from earlier work can be confirmed, while others give more inconsistent results. The effect of the Same Side Stance definition on performance differs per dataset and is influenced by other modelling choices. We found no relationship between the number of training topics in the training shots and performance. In general, cross-encoding out-performs bi-encoding, and adding NLI training to our models gives considerable improvement, but these results are not consistent across all datasets. Our results indicate that it is essential to include multiple datasets and systematic modelling experiments when aiming to find robust modelling choices for the concept `stance'.
Demonstration Guided Multi-Objective Reinforcement Learning
Authors: Junlin Lu, Patrick Mannion, Karl Mason
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03997
Pdf link: https://arxiv.org/pdf/2404.03997
Abstract Multi-objective reinforcement learning (MORL) is increasingly relevant due to its resemblance to real-world scenarios requiring trade-offs between multiple objectives. Catering to diverse user preferences, traditional reinforcement learning faces amplified challenges in MORL. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound of the algorithm's sample complexity.
Approximate UMAP allows for high-rate online visualization of high-dimensional data streams
Authors: Peter Wassenaar, Pierre Guetschel, Michael Tangermann
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.04001
Pdf link: https://arxiv.org/pdf/2404.04001
Abstract In the BCI field, introspection and interpretation of brain signals are desired for providing feedback or to guide rapid paradigm prototyping but are challenging due to the high noise level and dimensionality of the signals. Deep neural networks are often introspected by transforming their learned feature representations into 2- or 3-dimensional subspace visualizations using projection algorithms like Uniform Manifold Approximation and Projection (UMAP). Unfortunately, these methods are computationally expensive, making the projection of data streams in real-time a non-trivial task. In this study, we introduce a novel variant of UMAP, called approximate UMAP (aUMAP). It aims at generating rapid projections for real-time introspection. To study its suitability for real-time projecting, we benchmark the methods against standard UMAP and its neural network counterpart parametric UMAP. Our results show that approximate UMAP delivers projections that replicate the projection space of standard UMAP while decreasing projection speed by an order of magnitude and maintaining the same training time.
Towards Safe Robot Use with Edged or Pointed Objects: A Surrogate Study Assembling a Human Hand Injury Protection Database
Authors: Robin Jeanne Kirschner, Carina M. Micheler, Yangcan Zhou, Sebastian Siegner, Mazin Hamad, Claudio Glowalla, Jan Neumann, Nader Rajaei, Rainer Burgkart, Sami Haddadin
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.04004
Pdf link: https://arxiv.org/pdf/2404.04004
Abstract The use of pointed or edged tools or objects is one of the most challenging aspects of today's application of physical human-robot interaction (pHRI). One reason for this is that the severity of harm caused by such edged or pointed impactors is less well studied than for blunt impactors. Consequently, the standards specify well-reasoned force and pressure thresholds for blunt impactors and advise avoiding any edges and corners in contacts. Nevertheless, pointed or edged impactor geometries cannot be completely ruled out in real pHRI applications. For example, to allow edged or pointed tools such as screwdrivers near human operators, the knowledge of injury severity needs to be extended so that robot integrators can perform well-reasoned, time-efficient risk assessments. In this paper, we provide the initial datasets on injury prevention for the human hand based on drop tests with surrogates for the human hand, namely pig claws and chicken drumsticks. We then demonstrate the ease and efficiency of robot use using the dataset for contact on two examples. Finally, our experiments provide a set of injuries that may also be expected for human subjects under certain robot mass-velocity constellations in collisions. To extend this work, testing on human samples and a collaborative effort from research institutes worldwide is needed to create a comprehensive human injury avoidance database for any pHRI scenario and thus for safe pHRI applications including edged and pointed geometries.
A Dataset for Physical and Abstract Plausibility and Sources of Human Disagreement
Authors: Annerose Eichel, Sabine Schulte im Walde
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.04035
Pdf link: https://arxiv.org/pdf/2404.04035
Abstract We present a novel dataset for physical and abstract plausibility of events in English. Based on naturally occurring sentences extracted from Wikipedia, we infiltrate degrees of abstractness, and automatically generate perturbed pseudo-implausible events. We annotate a filtered and balanced subset for plausibility using crowd-sourcing, and perform extensive cleansing to ensure annotation quality. In-depth quantitative analyses indicate that annotators favor plausibility over implausibility and disagree more on implausible events. Furthermore, our plausibility dataset is the first to capture abstractness in events to the same extent as concreteness, and we find that event abstractness has an impact on plausibility ratings: more concrete event participants trigger a perception of implausibility.
InstructHumans: Editing Animated 3D Human Textures with Instructions
Authors: Jiayin Zhu, Linlin Yang, Angela Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2404.04037
Pdf link: https://arxiv.org/pdf/2404.04037
Abstract We present InstructHumans, a novel framework for instruction-driven 3D human texture editing. Existing text-based editing methods use Score Distillation Sampling (SDS) to distill guidance from generative models. This work shows that naively using such scores is harmful to editing as they destroy consistency with the source avatar. Instead, we propose an alternate SDS for Editing (SDS-E) that selectively incorporates subterms of SDS across diffusion timesteps. We further enhance SDS-E with spatial smoothness regularization and gradient-based viewpoint sampling to achieve high-quality edits with sharp and high-fidelity detailing. InstructHumans significantly outperforms existing 3D editing methods, consistent with the initial avatar while faithful to the textual instructions. Project page: https://jyzhu.top/instruct-humans .
Refutability as Recursive as Provability
Authors: Paola Cattabriga
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
Arxiv link: https://arxiv.org/abs/2404.04038
Pdf link: https://arxiv.org/pdf/2404.04038
Abstract Godel numbering is an arithmetization of sintax which defines provability by coding a primitive recursive predicate, Pf(x,v). A multiplicity of researches and results all around this well-known recursive predicate are today widespread in many areas of logic and AI. Not equally investigated is the refutability predicate defined by Godel numbering within the same primitive recursive status. Rf(x,v) can be defined as a recursive predicate meaning that x is the Godel number of a refutation in PA of the formula with Godel number v. This article proposes a logical investigation of the interactive links between provability and refutability predicates when defined within the same recursive status. The resulting Lemmas are clarifying and open new perspectives for the incompleteness argument and the codings of its underlying notions.
A single shooting method with approximate Fréchet derivative for computing geodesics on the Stiefel manifold
Authors: Marco Sutti
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2404.04089
Pdf link: https://arxiv.org/pdf/2404.04089
Abstract This paper shows how to use the shooting method, a classical numerical algorithm for solving boundary value problems, to compute the Riemannian distance on the Stiefel manifold $ \mathrm{St}(n,p) $, the set of $ n \times p $ matrices with orthonormal columns. The proposed method is a shooting method in the sense of the classical shooting methods for solving boundary value problems; see, e.g., Stoer and Bulirsch, 1991. The main feature is that we provide an approximate formula for the Fr\'{e}chet derivative of the geodesic involved in our shooting method. Numerical experiments demonstrate the algorithms' accuracy and performance. Comparisons with existing state-of-the-art algorithms for solving the same problem show that our method is competitive and even beats several algorithms in many cases.
Robust Preference Optimization with Provable Noise Tolerance for LLMs
Authors: Xize Liang, Chao Chen, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.04102
Pdf link: https://arxiv.org/pdf/2404.04102
Abstract The preference alignment aims to enable large language models (LLMs) to generate responses that conform to human values, which is essential for developing general AI systems. Ranking-based methods -- a promising class of alignment approaches -- learn human preferences from datasets containing response pairs by optimizing the log-likelihood margins between preferred and dis-preferred responses. However, due to the inherent differences in annotators' preferences, ranking labels of comparisons for response pairs are unavoidably noisy. This seriously hurts the reliability of existing ranking-based methods. To address this problem, we propose a provably noise-tolerant preference alignment method, namely RObust Preference Optimization (ROPO). To the best of our knowledge, ROPO is the first preference alignment method with noise-tolerance guarantees. The key idea of ROPO is to dynamically assign conservative gradient weights to response pairs with high label uncertainty, based on the log-likelihood margins between the responses. By effectively suppressing the gradients of noisy samples, our weighting strategy ensures that the expected risk has the same gradient direction independent of the presence and proportion of noise. Experiments on three open-ended text generation tasks with four base models ranging in size from 2.8B to 13B demonstrate that ROPO significantly outperforms existing ranking-based methods.
3D Facial Expressions through Analysis-by-Neural-Synthesis
Authors: George Retsinas, Panagiotis P. Filntisis, Radek Danecek, Victoria F. Abrevaya, Anastasios Roussos, Timo Bolkart, Petros Maragos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04104
Pdf link: https://arxiv.org/pdf/2404.04104
Abstract While existing methods for 3D face reconstruction from in-the-wild images excel at recovering the overall face shape, they commonly miss subtle, extreme, asymmetric, or rarely observed expressions. We improve upon these methods with SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics), which faithfully reconstructs expressive 3D faces from images. We identify two key limitations in existing methods: shortcomings in their self-supervised training formulation, and a lack of expression diversity in the training images. For training, most methods employ differentiable rendering to compare a predicted face mesh with the input image, along with a plethora of additional loss functions. This differentiable rendering loss not only has to provide supervision to optimize for 3D face geometry, camera, albedo, and lighting, which is an ill-posed optimization problem, but the domain gap between rendering and input image further hinders the learning process. Instead, SMIRK replaces the differentiable rendering with a neural rendering module that, given the rendered predicted mesh geometry, and sparsely sampled pixels of the input image, generates a face image. As the neural rendering gets color information from sampled image pixels, supervising with neural rendering-based reconstruction loss can focus solely on the geometry. Further, it enables us to generate images of the input identity with varying expressions while training. These are then utilized as input to the reconstruction model and used as supervision with ground truth geometry. This effectively augments the training data and enhances the generalization for diverse expressions. Our qualitative, quantitative and particularly our perceptual evaluations demonstrate that SMIRK achieves the new state-of-the art performance on accurate expression reconstruction. Project webpage: https://georgeretsi.github.io/smirk/.
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System
Authors: Yidong Gong, Pradeep Kumar
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.04118
Pdf link: https://arxiv.org/pdf/2404.04118
Abstract We hypothesize that the absence of a standardized benchmark has allowed several fundamental pitfalls in GNN System design and evaluation that the community has overlooked. In this work, we propose GNNBench, a plug-and-play benchmarking platform focused on system innovation. GNNBench presents a new protocol to exchange their captive tensor data, supports custom classes in System APIs, and allows automatic integration of the same system module to many deep learning frameworks, such as PyTorch and TensorFlow. To demonstrate the importance of such a benchmark framework, we integrated several GNN systems. Our results show that integration with GNNBench helped us identify several measurement issues that deserve attention from the community.
Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification
Authors: Rui Wang, Chuanfu Shen, Manuel J. Marin-Jimenez, George Q. Huang, Shiqi Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04120
Pdf link: https://arxiv.org/pdf/2404.04120
Abstract Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors in order to adapt to various environments. A more practical approach should involve cross-modality matching across different sensors. Hence, this paper focuses on investigating the problem of cross-modality gait recognition, with the objective of accurately identifying pedestrians across diverse vision sensors. We present CrossGait inspired by the feature alignment strategy, capable of cross retrieving diverse data modalities. Specifically, we investigate the cross-modality recognition task by initially extracting features within each modality and subsequently aligning these features across modalities. To further enhance the cross-modality performance, we propose a Prototypical Modality-shared Attention Module that learns modality-shared features from two modality-specific features. Additionally, we design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space. Extensive experiments conducted on the SUSTech1K dataset demonstrate the effectiveness of CrossGait: (1) it exhibits promising cross-modality ability in retrieving pedestrians across various modalities from different sensors in diverse scenes, and (2) CrossGait not only learns modality-shared features for cross-modality gait recognition but also maintains modality-specific features for single-modality recognition.
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Authors: Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H.S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04125
Pdf link: https://arxiv.org/pdf/2404.04125
Abstract Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector
Authors: Junbo Li, Keyan Chen, Gengju Tian, Lu Li, Zhenwei Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04155
Pdf link: https://arxiv.org/pdf/2404.04155
Abstract The segmentation and interpretation of the Martian surface play a pivotal role in Mars exploration, providing essential data for the trajectory planning and obstacle avoidance of rovers. However, the complex topography, similar surface features, and the lack of extensive annotated data pose significant challenges to the high-precision semantic segmentation of the Martian surface. To address these challenges, we propose a novel encoder-decoder based Mars segmentation network, termed MarsSeg. Specifically, we employ an encoder-decoder structure with a minimized number of down-sampling layers to preserve local details. To facilitate a high-level semantic understanding across the shadow multi-level feature maps, we introduce a feature enhancement connection layer situated between the encoder and decoder. This layer incorporates Mini Atrous Spatial Pyramid Pooling (Mini-ASPP), Polarized Self-Attention (PSA), and Strip Pyramid Pooling Module (SPPM). The Mini-ASPP and PSA are specifically designed for shadow feature enhancement, thereby enabling the expression of local details and small objects. Conversely, the SPPM is employed for deep feature enhancement, facilitating the extraction of high-level semantic category-related information. Experimental results derived from the Mars-Seg and AI4Mars datasets substantiate that the proposed MarsSeg outperforms other state-of-the-art methods in segmentation performance, validating the efficacy of each proposed component.
Torque-Minimizing Control Allocation for Overactuated Quadrupedal Locomotion
Authors: Mads Erlend Bøe Lysø, Esten Ingar Grøtli, Kristin Ytterstad Pettersen
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.04156
Pdf link: https://arxiv.org/pdf/2404.04156
Abstract In this paper, we improve upon a method for optimal control of quadrupedal robots which utilizes a full-order model of the system. The original method utilizes offline nonlinear optimal control to synthesize a control scheme which exponentially orbitally stabilizes the closed-loop system. However, it is not able to handle the overactuated phases which frequently occur during quadrupedal locomotion as a result of the multi-contact nature of the system. We propose a modified method, which handles overactuated gait phases in a way that utilizes the full range of available actuators to minimize torque expenditure without requiring output trajectories to be modified. It is shown that the system under the proposed controller exhibits the same properties, i.e. exponential orbital stability, with the same or lower point-wise torque magnitude. A simulation study demonstrates that the reduction in torque may in certain cases be substantial.
Exploring Probabilistic Models for Semi-supervised Learning
Authors: Jianfeng Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04199
Pdf link: https://arxiv.org/pdf/2404.04199
Abstract This thesis studies advanced probabilistic models, including both their theoretical foundations and practical applications, for different semi-supervised learning (SSL) tasks. The proposed probabilistic methods are able to improve the safety of AI systems in real applications by providing reliable uncertainty estimates quickly, and at the same time, achieve competitive performance compared to their deterministic counterparts. The experimental results indicate that the methods proposed in the thesis have great value in safety-critical areas, such as the autonomous driving or medical imaging analysis domain, and pave the way for the future discovery of highly effective and efficient probabilistic approaches in the SSL sector.
V-Star: Learning Visibly Pushdown Grammars from Program Inputs
Authors: Xiaodong Jia, Gang Tan
Subjects: Programming Languages (cs.PL); Formal Languages and Automata Theory (cs.FL)
Arxiv link: https://arxiv.org/abs/2404.04201
Pdf link: https://arxiv.org/pdf/2404.04201
Abstract Accurate description of program inputs remains a critical challenge in the field of programming languages. Active learning, as a well-established field, achieves exact learning for regular languages. We offer an innovative grammar inference tool, V-Star, based on the active learning of visibly pushdown automata. V-Star deduces nesting structures of program input languages from sample inputs, employing a novel inference mechanism based on nested patterns. This mechanism identifies token boundaries and converts languages such as XML documents into VPLs. We then adapted Angluin's L-Star, an exact learning algorithm, for VPA learning, which improves the precision of our tool. Our evaluation demonstrates that V-Star effectively and efficiently learns a variety of practical grammars, including S-Expressions, JSON, and XML, and outperforms other state-of-the-art tools.
Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions
Authors: Zachary R. Fox, Ayana Ghosh
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2404.04224
Pdf link: https://arxiv.org/pdf/2404.04224
Abstract Predicting and enhancing inherent properties based on molecular structures is paramount to design tasks in medicine, materials science, and environmental management. Most of the current machine learning and deep learning approaches have become standard for predictions, but they face challenges when applied across different datasets due to reliance on correlations between molecular representation and target properties. These approaches typically depend on large datasets to capture the diversity within the chemical space, facilitating a more accurate approximation, interpolation, or extrapolation of the chemical behavior of molecules. In our research, we introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling with the use of a graph loss function. This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space. The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously. While our implementation focused on the QM9 quantum-chemical dataset for a specific design task-finding molecules with a large dipole moment-our active causal learning approach, driven by intelligent sampling and interventions, holds potential for broader applications in molecular, materials design and discovery.

Yukeaaa / arxiv-daily

【CS-part2】New submissions for Mon, 8 Apr 24 #1357

Keyword: webgpu

Keyword: webgl

Keyword: pre-rendering

Keyword: prerendering

Keyword: motion prediction

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

Keyword: incremental learning

Keyword: svm incremental

Keyword: nerf

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

Robust Gaussian Splatting

Keyword: multiorgan

Keyword: multi-organ

Keyword: multi organ

Keyword: SAM

Serial Parallel Reliability Redundancy Allocation Optimization for Energy Efficient and Fault Tolerant Cloud Computing

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Improve Knowledge Distillation via Label Revision and Data Selection

Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

Test Time Training for Industrial Anomaly Segmentation

A Reinforcement Learning based Reset Policy for CDCL SAT Solvers

Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks

A Systems Theoretic Approach to Online Machine Learning

Optimization of resources for digital radio transmission over IBOC FM through max-min fairness

Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

The Low-Degree Hardness of Finding Large Independent Sets in Sparse Random Hypergraphs

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Can only LLMs do Reasoning?: Potential of Small Language Models in Task Planning

SLSM : An Efficient Strategy for Lazy Schema Migration on Shared-Nothing Databases

Deep Learning for Satellite Image Time Series Analysis: A Review

Stability in Graphs with Matroid Constraints

Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study

Demonstration Guided Multi-Objective Reinforcement Learning

Approximate UMAP allows for high-rate online visualization of high-dimensional data streams

Towards Safe Robot Use with Edged or Pointed Objects: A Surrogate Study Assembling a Human Hand Injury Protection Database

A Dataset for Physical and Abstract Plausibility and Sources of Human Disagreement

InstructHumans: Editing Animated 3D Human Textures with Instructions

Refutability as Recursive as Provability

A single shooting method with approximate Fréchet derivative for computing geodesics on the Stiefel manifold

Robust Preference Optimization with Provable Noise Tolerance for LLMs

3D Facial Expressions through Analysis-by-Neural-Synthesis

GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System

Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector

Torque-Minimizing Control Allocation for Overactuated Quadrupedal Locomotion

Exploring Probabilistic Models for Semi-supervised Learning

V-Star: Learning Visibly Pushdown Grammars from Program Inputs

Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions