New submissions for Fri, 5 May 23

Keyword: efficient

MaskSearch: Querying Image Masks at Scale

Authors: Dong He, Jieyu Zhang, Maureen Daum, Alexander Ratner, Magdalena Balazinska
Subjects: Databases (cs.DB); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2305.02375
Pdf link: https://arxiv.org/pdf/2305.02375
Abstract Machine learning tasks over image databases often generate masks that annotate image content (e.g., saliency maps, segmentation maps) and enable a variety of applications (e.g., determine if a model is learning spurious correlations or if an image was maliciously modified to mislead a model). While queries that retrieve examples based on mask properties are valuable to practitioners, existing systems do not support such queries efficiently. In this paper, we formalize the problem and propose a system, MaskSearch, that focuses on accelerating queries over databases of image masks. MaskSearch leverages a novel indexing technique and an efficient filter-verification query execution framework. Experiments on real-world datasets with our prototype show that MaskSearch, using indexes approximately 5% the size of the data, accelerates individual queries by up to two orders of magnitude and consistently outperforms existing methods on various multi-query workloads that simulate dataset exploration and analysis processes.
Privacy in Population Protocols with Probabilistic Scheduling
Authors: Talley Amir, James Aspnes
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2305.02377
Pdf link: https://arxiv.org/pdf/2305.02377
Abstract The population protocol model introduced by Angluin et al. in 2006 offers a theoretical framework for designing and analyzing distributed algorithms among limited-resource mobile agents. While the original population protocol model considers the concept of anonymity, the issue of privacy is not investigated thoroughly. However, there is a need for time- and space-efficient privacy-preserving techniques in the population protocol model if these algorithms are to be implemented in settings handling sensitive data, such as sensor networks, IoT devices, and drones. In this work, we introduce several formal definitions of privacy, ranging from assuring only plausible deniability of the population input vector to having a full information-theoretic guarantee that knowledge beyond an agent's input and output bear no influence on the probability of a particular input vector. We then apply these definitions to both existing and novel protocols. We show that the Remainder-computing protocol given by Delporte-Gallet et al. in 2007 (which is proven to satisfy output independent privacy under adversarial scheduling) is not information-theoretically private under probabilistic scheduling. In contrast, we provide a new algorithm and demonstrate that it correctly and information-theoretically privately computes Remainder under probabilistic scheduling.
Discovering Communication Pattern Shifts in Large-Scale Networks using Encoder Embedding and Vertex Dynamics
Authors: Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe
Subjects: Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.02381
Pdf link: https://arxiv.org/pdf/2305.02381
Abstract The analysis of large-scale time-series network data, such as social media and email communications, remains a significant challenge for graph analysis methodology. In particular, the scalability of graph analysis is a critical issue hindering further progress in large-scale downstream inference. In this paper, we introduce a novel approach called "temporal encoder embedding" that can efficiently embed large amounts of graph data with linear complexity. We apply this method to an anonymized time-series communication network from a large organization spanning 2019-2020, consisting of over 100 thousand vertices and 80 million edges. Our method embeds the data within 10 seconds on a standard computer and enables the detection of communication pattern shifts for individual vertices, vertex communities, and the overall graph structure. Through supporting theory and synthesis studies, we demonstrate the theoretical soundness of our approach under random graph models and its numerical effectiveness through simulation studies.
ADPDM: Accelerating Distributed Pointer-Traversals on Disaggregated Memory
Authors: Yupeng Tang, Seung-seob Lee, Anurag Khandelwal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2305.02388
Pdf link: https://arxiv.org/pdf/2305.02388
Abstract Caches at CPU nodes in disaggregated memory architectures amortize the high data access latency over the network. However, such caches are fundamentally unable to improve performance for workloads requiring pointer traversals across linked data structures. We argue for accelerating these pointer traversals closer to disaggregated memory, in a manner that preserves expressiveness for supporting various linked structures, ensures energy efficiency and performance, and supports distributed execution. We design ADPDM to meet all the above requirements for pointer-traversal workloads on rack-scale disaggregated memory through the principled use of FPGAbased SmartNICs and programmable network switches. Our evaluation of ADPDM shows that it enables low-latency, highthroughput, and energy-efficient execution for a wide range of common pointer traversal workloads on disaggregated memory that fare poorly with caching alone.
Defending against Insertion-based Textual Backdoor Attacks via Attribution
Authors: Jiazhao Li, Zhuofeng Wu, Wei Ping, Chaowei Xiao, V.G. Vinod Vydiswaran
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02394
Pdf link: https://arxiv.org/pdf/2305.02394
Abstract Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks, BadNL and InSent. Specifically, we regard the tokens with larger attribution scores as potential triggers since larger attribution words contribute more to the false prediction results and therefore are more likely to be poison triggers. Additionally, we further utilize an external pre-trained language model to distinguish whether input is poisoned or not. We show that our proposed method can generalize sufficiently well in two common attack scenarios (poisoning training data and testing data), which consistently improves previous methods. For instance, AttDef can successfully mitigate both attacks with an average accuracy of 79.97% (56.59% up) and 48.34% (3.99% up) under pre-training and post-training attack defense respectively, achieving the new state-of-the-art performance on prediction recovery over four benchmark datasets.
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Authors: Deepak Narayanan, Keshav Santhanam, Peter Henderson, Rishi Bommasani, Tony Lee, Percy Liang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02440
Pdf link: https://arxiv.org/pdf/2305.02440
Abstract Large language models (LLMs) power many state-of-the-art systems in natural language processing. However, these models are extremely computationally expensive, even at inference time, raising the natural question: when is the extra cost of deploying a larger model worth the anticipated boost in capabilities? Better understanding this tradeoff fundamentally could benefit from an inference efficiency metric that is both (i) easily comparable across models from different providers, and (ii) representative of the true cost of running queries in an isolated performance environment. Unfortunately, access to LLMs today is largely restricted to black-box text generation APIs and raw runtimes measured through this interface do not satisfy these desiderata: model providers can apply various software and hardware optimizations orthogonal to the model, and models served on shared infrastructure are susceptible to performance contention. To circumvent these problems, we propose a new metric for comparing inference efficiency across models. This metric puts models on equal footing as though they were served (i) on uniform hardware and software, and (ii) without performance contention. We call this metric the \emph{idealized runtime}, and we propose a methodology to efficiently estimate this metric for autoregressive Transformer models. We also propose cost-aware variants that incorporate the number of accelerators needed to serve the model. Using these metrics, we compare ten state-of-the-art LLMs to provide the first analysis of inference efficiency-capability tradeoffs; we make several observations from this analysis, including the fact that the superior inference runtime performance of certain APIs is often a byproduct of optimizations within the API rather than the underlying model. Our methodology also facilitates the efficient comparison of different software and hardware stacks.
Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks
Authors: Sara Riva, Jean-Marie Lagniez, Gustavo Magaña López, Loïc Paulevé
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Systems and Control (eess.SY); Molecular Networks (q-bio.MN)
Arxiv link: https://arxiv.org/abs/2305.02442
Pdf link: https://arxiv.org/pdf/2305.02442
Abstract Minimal trap spaces (MTSs) capture subspaces in which the Boolean dynamics is trapped, whatever the update mode. They correspond to the attractors of the most permissive mode. Due to their versatility, the computation of MTSs has recently gained traction, essentially by focusing on their enumeration. In this paper, we address the logical reasoning on universal properties of MTSs in the scope of two problems: the reprogramming of Boolean networks for identifying the permanent freeze of Boolean variables that enforce a given property on all the MTSs, and the synthesis of Boolean networks from universal properties on their MTSs. Both problems reduce to solving the satisfiability of quantified propositional logic formula with 3 levels of quantifiers ($\exists\forall\exists$). In this paper, we introduce a Counter-Example Guided Refinement Abstraction (CEGAR) to efficiently solve these problems by coupling the resolution of two simpler formulas. We provide a prototype relying on Answer-Set Programming for each formula and show its tractability on a wide range of Boolean models of biological networks.
Bayesian Safety Validation for Black-Box Systems
Authors: Robert J. Moss, Mykel J. Kochenderfer, Maxime Gariel, Arthur Dubois
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2305.02449
Pdf link: https://arxiv.org/pdf/2305.02449
Abstract Accurately estimating the probability of failure for safety-critical systems is important for certification. Estimation is often challenging due to high-dimensional input spaces, dangerous test scenarios, and computationally expensive simulators; thus, efficient estimation techniques are important to study. This work reframes the problem of black-box safety validation as a Bayesian optimization problem and introduces an algorithm, Bayesian safety validation, that iteratively fits a probabilistic surrogate model to efficiently predict failures. The algorithm is designed to search for failures, compute the most-likely failure, and estimate the failure probability over an operating domain using importance sampling. We introduce a set of three acquisition functions that focus on reducing uncertainty by covering the design space, optimizing the analytically derived failure boundaries, and sampling the predicted failure regions. Mainly concerned with systems that only output a binary indication of failure, we show that our method also works well in cases where more output information is available. Results show that Bayesian safety validation achieves a better estimate of the probability of failure using orders of magnitude fewer samples and performs well across various safety validation metrics. We demonstrate the algorithm on three test problems with access to ground truth and on a real-world safety-critical subsystem common in autonomous flight: a neural network-based runway detection system. This work is open sourced and currently being used to supplement the FAA certification process of the machine learning components for an autonomous cargo aircraft.
Perfect Sampling for Hard Spheres from Strong Spatial Mixing
Authors: Konrad Anand, Andreas Göbel, Marcus Pappik, Will Perkins
Subjects: Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2305.02450
Pdf link: https://arxiv.org/pdf/2305.02450
Abstract We provide a perfect sampling algorithm for the hard-sphere model on subsets of $\mathbb{R}^d$ with expected running time linear in the volume under the assumption of strong spatial mixing. A large number of perfect and approximate sampling algorithms have been devised to sample from the hard-sphere model, and our perfect sampling algorithm is efficient for a range of parameters for which only efficient approximate samplers were previously known and is faster than these known approximate approaches. Our methods also extend to the more general setting of Gibbs point processes interacting via finite-range, repulsive potentials.
MLHOps: Machine Learning for Healthcare Operations
Authors: Faiza Khan Khattak, Vallijah Subasri, Amrit Krishnan, Elham Dolatabadi, Deval Pandya, Laleh Seyyed-Kalantari, Frank Rudzicz
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02474
Pdf link: https://arxiv.org/pdf/2305.02474
Abstract Machine Learning Health Operations (MLHOps) is the combination of processes for reliable, efficient, usable, and ethical deployment and maintenance of machine learning models in healthcare settings. This paper provides both a survey of work in this area and guidelines for developers and clinicians to deploy and maintain their own models in clinical practice. We cover the foundational concepts of general machine learning operations, describe the initial setup of MLHOps pipelines (including data sources, preparation, engineering, and tools). We then describe long-term monitoring and updating (including data distribution shifts and model updating) and ethical considerations (including bias, fairness, interpretability, and privacy). This work therefore provides guidance across the full pipeline of MLHOps from conception to initial and ongoing deployment.
Generalizing Frobenius Inversion to Quaternion Matrices
Authors: Qiyuan Chen, J. Uhlmann, Ke Ye
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.02477
Pdf link: https://arxiv.org/pdf/2305.02477
Abstract In this paper we derive and analyze an algorithm for inverting quaternion matrices. The algorithm is an analogue of the Frobenius algorithm for the complex matrix inversion. On the theory side, we prove that our algorithm is more efficient that other existing methods. Moreover, our algorithm is optimal in the sense of the least number of complex inversions. On the practice side, our algorithm outperforms existing algorithms on randomly generated matrices. We argue that this algorithm can be used to improve the practical utility of recursive Strassen-type algorithms by providing the fastest possible base case for the recursive decomposition process when applied to quaternion matrices.
A Deterministic Construction of a Large Distance Code from the Wozencraft Ensemble
Authors: Venkatesan Guruswami, Shilun Li
Subjects: Information Theory (cs.IT); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2305.02484
Pdf link: https://arxiv.org/pdf/2305.02484
Abstract We present an explicit construction of a sequence of rate $1/2$ Wozencraft ensemble codes (over any fixed finite field $\mathbb{F}q$) that achieve minimum distance $\Omega(\sqrt{k})$ where $k$ is the message length. The coefficients of the Wozencraft ensemble codes are constructed using Sidon Sets and the cyclic structure of $\mathbb{F}{q^{k}}$ where $k+1$ is prime with $q$ a primitive root modulo $k+1$. Assuming Artin's conjecture, there are infinitely many such $k$ for any prime power $q$.
Directional Antenna Based Scheduling Protocol for IoT Networks
Authors: Anil Carie, Abdur Rashid Sangi, Satish Anamalamudi, Murali Krishna Enduri, Baha Ihnaini, Hemn Barzan Abdalla, Mohammed Saeed Alkatheiri
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.02511
Pdf link: https://arxiv.org/pdf/2305.02511
Abstract Scheduling and Channel Access at the MAC layer of the IoT network plays a pivotal role in enhancing the performance of IoT networks. State-of-the-art Omni-directional antenna based application data transmission has relatively less achievable throughput in comparison with directional antenna based scheduling protocols. To enhance the performance of the IoT networks, this paper propose a distributed one-hop scheduling algorithm called Directional Scheduling protocol for constrained deterministic 6TiSCH-IoT network. With this, in-creased number of IoT nodes can have concurrent application data transmission with efficient spatial reuse. This in-turn results in higher number of cell allocation to the one-hop IoT nodes during data transmission. The proposed algorithm makes use of through directional transmissions avoids head of line blocking.
Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA
Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.02544
Pdf link: https://arxiv.org/pdf/2305.02544
Abstract We study principal component analysis (PCA), where given a dataset in $\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
Madvex: Instrumentation-based Adversarial Attacks on Machine Learning Malware Detection
Authors: Nils Loose, Felix Mächtle, Claudius Pott, Volodymyr Bezsmertnyi, Thomas Eisenbarth
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.02559
Pdf link: https://arxiv.org/pdf/2305.02559
Abstract WebAssembly (Wasm) is a low-level binary format for web applications, which has found widespread adoption due to its improved performance and compatibility with existing software. However, the popularity of Wasm has also led to its exploitation for malicious purposes, such as cryptojacking, where malicious actors use a victim's computing resources to mine cryptocurrencies without their consent. To counteract this threat, machine learning-based detection methods aiming to identify cryptojacking activities within Wasm code have emerged. It is well-known that neural networks are susceptible to adversarial attacks, where inputs to a classifier are perturbed with minimal changes that result in a crass misclassification. While applying changes in image classification is easy, manipulating binaries in an automated fashion to evade malware classification without changing functionality is non-trivial. In this work, we propose a new approach to include adversarial examples in the code section of binaries via instrumentation. The introduced gadgets allow for the inclusion of arbitrary bytes, enabling efficient adversarial attacks that reliably bypass state-of-the-art machine learning classifiers such as the CNN-based Minos recently proposed at NDSS 2021. We analyze the cost and reliability of instrumentation-based adversarial example generation and show that the approach works reliably at minimal size and performance overheads.
Prompt-ICM: A Unified Framework towards Image Coding for Machines with Task-driven Prompts
Authors: Ruoyu Feng, Jinming Liu, Xin Jin, Xiaohan Pan, Heming Sun, Zhibo Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02578
Pdf link: https://arxiv.org/pdf/2305.02578
Abstract Image coding for machines (ICM) aims to compress images to support downstream AI analysis instead of human perception. For ICM, developing a unified codec to reduce information redundancy while empowering the compressed features to support various vision tasks is very important, which inevitably faces two core challenges: 1) How should the compression strategy be adjusted based on the downstream tasks? 2) How to well adapt the compressed features to different downstream tasks? Inspired by recent advances in transferring large-scale pre-trained models to downstream tasks via prompting, in this work, we explore a new ICM framework, termed Prompt-ICM. To address both challenges by carefully learning task-driven prompts to coordinate well the compression process and downstream analysis. Specifically, our method is composed of two core designs: a) compression prompts, which are implemented as importance maps predicted by an information selector, and used to achieve different content-weighted bit allocations during compression according to different downstream tasks; b) task-adaptive prompts, which are instantiated as a few learnable parameters specifically for tuning compressed features for the specific intelligent task. Extensive experiments demonstrate that with a single feature codec and a few extra parameters, our proposed framework could efficiently support different kinds of intelligent tasks with much higher coding efficiency.
Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model
Authors: Chao Xu, Shaoting Zhu, Junwei Zhu, Tianxin Huang, Jiangning Zhang, Ying Tai, Yong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02594
Pdf link: https://arxiv.org/pdf/2305.02594
Abstract Multimodal-driven talking face generation refers to animating a portrait with the given pose, expression, and gaze transferred from the driving image and video, or estimated from the text and audio. However, existing methods ignore the potential of text modal, and their generators mainly follow the source-oriented feature rearrange paradigm coupled with unstable GAN frameworks. In this work, we first represent the emotion in the text prompt, which could inherit rich semantics from the CLIP, allowing flexible and generalized emotion control. We further reorganize these tasks as the target-oriented texture transfer and adopt the Diffusion Models. More specifically, given a textured face as the source and the rendered face projected from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model decomposes the complex transfer problem into multi-conditional denoising process, where a Texture Attention-based module accurately models the correspondences between appearance and geometry cues contained in source and target conditions, and incorporate extra implicit information for high-fidelity talking face generation. Additionally, TGDM can be gracefully tailored for face swapping. We derive a novel paradigm free of unstable seesaw-style optimization, resulting in simple, stable, and effective training and inference schemes. Extensive experiments demonstrate the superiority of our method.
IMAP: Intrinsically Motivated Adversarial Policy
Authors: Xiang Zheng, Xingjun Ma, Shengjie Wang, Xinyu Wang, Chao Shen, Cong Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02605
Pdf link: https://arxiv.org/pdf/2305.02605
Abstract Reinforcement learning (RL) agents are known to be vulnerable to evasion attacks during deployment. In single-agent environments, attackers can inject imperceptible perturbations on the policy or value network's inputs or outputs; in multi-agent environments, attackers can control an adversarial opponent to indirectly influence the victim's observation. Adversarial policies offer a promising solution to craft such attacks. Still, current approaches either require perfect or partial knowledge of the victim policy or suffer from sample inefficiency due to the sparsity of task-related rewards. To overcome these limitations, we propose the Intrinsically Motivated Adversarial Policy (IMAP) for efficient black-box evasion attacks in single- and multi-agent environments without any knowledge of the victim policy. IMAP uses four intrinsic objectives based on state coverage, policy coverage, risk, and policy divergence to encourage exploration and discover stronger attacking skills. We also design a novel Bias-Reduction (BR) method to boost IMAP further. Our experiments demonstrate the effectiveness of these intrinsic objectives and BR in improving adversarial policy learning in the black-box setting against multiple types of victim agents in various single- and multi-agent MuJoCo environments. Notably, our IMAP reduces the performance of the state-of-the-art robust WocaR-PPO agents by 34\%-54\% and achieves a SOTA attacking success rate of 83.91\% in the two-player zero-sum game YouShallNotPass.
Re$^3$Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training
Authors: Jiaxin Wen, Hao Zhou, Minlie Huang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.02606
Pdf link: https://arxiv.org/pdf/2305.02606
Abstract Large-scale open-domain dialogue data crawled from public social media has greatly improved the performance of dialogue models. However, long-turn dialogues are still highly scarce. Specifically, most dialogue sessions in existing corpora have less than three turns. To alleviate this issue, we propose the Retrieve, Reorganize and Rescale framework (Re$^3$Dial), which can automatically construct a billion-scale long-turn dialogue corpus from existing short-turn dialogue data. Re$^3$Dial first trains an Unsupervised Dense Session Retriever (UDSR) to capture semantic and discourse relationships within multi-turn dialogues for retrieving relevant and coherent sessions. It then reorganizes the short-turn dialogues into long-turn sessions via recursively retrieving and selecting the consecutive sessions with our proposed diversity sampling strategy. Extensive evaluations on multiple multi-turn dialogue benchmarks demonstrate that Re$^3$Dial consistently and significantly improves the dialogue model's ability to utilize long-term context for modeling multi-turn dialogues across different pre-training settings. Finally, we build a toolkit for efficiently rescaling dialogue corpus with Re$^3$Dial, which enables us to construct a corpus containing 1B Chinese dialogue sessions with 11.3 turns on average (5X longer than the original EVA corpus). We will release our UDSR model, toolkit, and data for public use.
Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval
Authors: Tan Pan, Furong Xu, Xudong Yang, Sifeng He, Chen Jiang, Qingpei Guo, Feng Qian Xiaobo Zhang, Yuan Cheng, Lei Yang, Wei Chu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02610
Pdf link: https://arxiv.org/pdf/2305.02610
Abstract Image retrieval plays an important role in the Internet world. Usually, the core parts of mainstream visual retrieval systems include an online service of the embedding model and a large-scale vector database. For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data. Recently, backward-compatible training (BCT) enables the new model to be immediately deployed online by making the new embeddings directly comparable to the old ones. For BCT, improving the compatibility of two models with less negative impact on retrieval performance is the key challenge. In this paper, we introduce AdvBCT, an Adversarial Backward-Compatible Training method with an elastic boundary constraint that takes both compatibility and discrimination into consideration. We first employ adversarial learning to minimize the distribution disparity between embeddings of the new model and the old model. Meanwhile, we add an elastic boundary constraint during training to improve compatibility and discrimination efficiently. Extensive experiments on GLDv2, Revisited Oxford (ROxford), and Revisited Paris (RParis) demonstrate that our method outperforms other BCT methods on both compatibility and discrimination. The implementation of AdvBCT will be publicly available at https://github.com/Ashespt/AdvBCT.
Real-Time Spatial Trajectory Planning for Urban Environments Using Dynamic Optimization
Authors: Jona Ruof, Max Bastian Mertens, Michael Buchholz, Klaus Dietmayer
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.02621
Pdf link: https://arxiv.org/pdf/2305.02621
Abstract Planning trajectories for automated vehicles in urban environments requires methods with high generality, long planning horizons, and fast update rates. Using a path-velocity decomposition, we contribute a novel planning framework, which generates foresighted trajectories and can handle a wide variety of state and control constraints effectively. In contrast to related work, the proposed optimal control problems are formulated over space rather than time. This spatial formulation decouples environmental constraints from the optimization variables, which allows the application of simple, yet efficient shooting methods. To this end, we present a tailored solution strategy based on ILQR, in the Augmented Lagrangian framework, to rapidly minimize the trajectory objective costs, even under infeasible initial solutions. Evaluations in simulation and on a full-sized automated vehicle in real-world urban traffic show the real-time capability and versatility of the proposed approach.
Variations on a Theme by Blahut and Arimoto
Authors: Lingyi Chen, Shitong Wu, Wenhao Ye, Huihui Wu, Wenyi Zhang, Hao Wu, Bo Bai
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.02650
Pdf link: https://arxiv.org/pdf/2305.02650
Abstract The Blahut-Arimoto (BA) algorithm has played a fundamental role in the numerical computation of rate-distortion (RD) functions. This algorithm possesses a desirable monotonic convergence property by alternatively minimizing its Lagrangian with a fixed multiplier. In this paper, we propose a novel modification of the BA algorithm, letting the multiplier be updated in each iteration via a one-dimensional root-finding step with respect to a monotonic univariate function, which can be efficiently implemented by Newton's method. This allows the multiplier to be updated in a flexible and efficient manner, overcoming a major drawback of the original BA algorithm wherein the multiplier is fixed throughout iterations. Consequently, the modified algorithm is capable of directly computing the RD function for a given target distortion, without exploring the entire RD curve as in the original BA algorithm. A theoretical analysis shows that the modified algorithm still converges to the RD function and the convergence rate is $\Theta(1/n)$, where $n$ denotes the number of iterations. Numerical experiments demonstrate that the modified algorithm directly computes the RD function with a given target distortion, and it significantly accelerates the original BA algorithm.
LatentAugment: Dynamically Optimized Latent Probabilities of Data Augmentation
Authors: Koichi Kuriyama
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02668
Pdf link: https://arxiv.org/pdf/2305.02668
Abstract Although data augmentation is a powerful technique for improving the performance of image classification tasks, it is difficult to identify the best augmentation policy. The optimal augmentation policy, which is the latent variable, cannot be directly observed. To address this problem, this study proposes $\textit{LatentAugment}$, which estimates the latent probability of optimal augmentation. The proposed method is appealing in that it can dynamically optimize the augmentation strategies for each input and model parameter in learning iterations. Theoretical analysis shows that LatentAugment is a general model that includes other augmentation methods as special cases, and it is simple and computationally efficient in comparison with existing augmentation methods. Experimental results show that the proposed LatentAugment has higher test accuracy than previous augmentation methods on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.
Real-Time Neural Appearance Models
Authors: Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.02678
Pdf link: https://arxiv.org/pdf/2305.02678
Abstract We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations. Our appearance model utilizes learned hierarchical textures that are interpreted using neural decoders, which produce reflectance values and importance-sampled directions. To best utilize the modeling capacity of the decoders, we equip the decoders with two graphics priors. The first prior -- transformation of directions into learned shading frames -- facilitates accurate reconstruction of mesoscale effects. The second prior -- a microfacet sampling distribution -- allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows baking deeply layered material graphs into a compact unified neural representation. By exposing hardware accelerated tensor operations to ray tracing shaders, we show that it is possible to inline and execute the neural decoders efficiently inside a real-time path tracer. We analyze scalability with increasing number of neural materials and propose to improve performance using code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens up the door for using film-quality visuals in real-time applications such as games and live previews.
Mixed Max-and-Min Fractional Programming for Wireless Networks
Authors: Yannan Chen, Licheng Zhao, Kaiming Shen
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2305.02704
Pdf link: https://arxiv.org/pdf/2305.02704
Abstract Fractional programming (FP) plays a crucial role in wireless network design because many relevant problems involve maximizing or minimizing ratio terms. Notice that the maximization case and the minimization case of FP cannot be converted to each other in general, so they have to be dealt with separately in most of the previous studies. Thus, an existing FP method for maximizing ratios typically does not work for the minimization case, and vice versa. However, the FP objective can be mixed max-and-min, e.g., one may wish to maximize the signal-to-interference-plus-noise ratio (SINR) of the legitimate receiver while minimizing that of the eavesdropper. We aim to fill the gap between max-FP and min-FP by devising a unified optimization framework. The main results are three-fold. First, we extend the existing max-FP technique called quadratic transform to the min-FP, and further develop a full generalization for the mixed case. Second. we provide a minorization-maximization (MM) interpretation of the proposed unified approach, thereby establishing its convergence and also obtaining a matrix extension; another result we obtain is a generalized Lagrangian dual transform which facilitates the solving of the logarithmic FP. Finally, we present three typical applications: the age-of-information (AoI) minimization, the Cramer-Rao bound minimization for sensing, and the secure data rate maximization, none of which can be efficiently addressed by the previous FP methods.
Guidance & Control Networks for Time-Optimal Quadcopter Flight
Authors: Sebastien Origer, Christophe De Wagter, Robin Ferede, Guido C.H.E. de Croon, Dario Izzo
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.02705
Pdf link: https://arxiv.org/pdf/2305.02705
Abstract Reaching fast and autonomous flight requires computationally efficient and robust algorithms. To this end, we train Guidance & Control Networks to approximate optimal control policies ranging from energy-optimal to time-optimal flight. We show that the policies become more difficult to learn the closer we get to the time-optimal 'bang-bang' control profile. We also assess the importance of knowing the maximum angular rotor velocity of the quadcopter and show that over- or underestimating this limit leads to less robust flight. We propose an algorithm to identify the current maximum angular rotor velocity onboard and a network that adapts its policy based on the identified limit. Finally, we extend previous work on Guidance & Control Networks by learning to take consecutive waypoints into account. We fly a 4x3m track in similar lap times as the differential-flatness-based minimum snap benchmark controller while benefiting from the flexibility that Guidance & Control Networks offer.
Uncertainty Aware Deep Learning Model for Secure and Trustworthy Channel Estimation in 5G Networks
Authors: Ferhat Ozgur Catak, Umit Cali, Murat Kuzlu, Salih Sarp
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.02741
Pdf link: https://arxiv.org/pdf/2305.02741
Abstract With the rise of intelligent applications, such as self-driving cars and augmented reality, the security and reliability of wireless communication systems have become increasingly crucial. One of the most critical components of ensuring a high-quality experience is channel estimation, which is fundamental for efficient transmission and interference management in wireless networks. However, using deep neural networks (DNNs) in channel estimation raises security and trust concerns due to their complexity and the need for more transparency in decision-making. This paper proposes a Monte Carlo Dropout (MCDO)-based approach for secure and trustworthy channel estimation in 5G networks. Our approach combines the advantages of traditional and deep learning techniques by incorporating conventional pilot-based channel estimation as a prior in the deep learning model. Additionally, we use MCDO to obtain uncertainty-aware predictions, enhancing the model's security and trustworthiness. Our experiments demonstrate that our proposed approach outperforms traditional and deep learning-based approaches regarding security, trustworthiness, and performance in 5G scenarios.
Efficient Personalized Federated Learning via Sparse Model-Adaptation
Authors: Daoyuan Chen, Liuyi Yao, Dawei Gao, Bolin Ding, Yaliang Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02776
Pdf link: https://arxiv.org/pdf/2305.02776
Abstract Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data. Due to the heterogeneity of clients' local data distribution, recent studies explore the personalized FL that learns and deploys distinct local models with the help of auxiliary global models. However, the clients can be heterogeneous in terms of not only local data distribution, but also their computation and communication resources. The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. With a lightweight trainable gating layer, pFedGate enables clients to reach their full potential in model capacity by generating different sparse models accounting for both the heterogeneous data distributions and resource constraints. Meanwhile, the computation and communication efficiency are both improved thanks to the adaptability between the model sparsity and clients' resources. Further, we theoretically show that the proposed pFedGate has superior complexity with guaranteed convergence and generalization error. Extensive experiments show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods. We also demonstrate that pFedGate performs better than competitors in the novel clients participation and partial clients participation scenarios, and can learn meaningful sparse local models adapted to different data distributions.
ItoV: Efficiently Adapting Deep Learning-based Image Watermarking to Video Watermarking
Authors: Guanhui Ye, Jiashi Gao, Yuchen Wang, Liyan Song, Xuetao Wei
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2305.02781
Pdf link: https://arxiv.org/pdf/2305.02781
Abstract Robust watermarking tries to conceal information within a cover image/video imperceptibly that is resistant to various distortions. Recently, deep learning-based approaches for image watermarking have made significant advancements in robustness and invisibility. However, few studies focused on video watermarking using deep neural networks due to the high complexity and computational costs. Our paper aims to answer this research question: Can well-designed deep learning-based image watermarking be efficiently adapted to video watermarking? Our answer is positive. First, we revisit the workflow of deep learning-based watermarking methods that leads to a critical insight: temporal information in the video may be essential for general computer vision tasks but not for specific video watermarking. Inspired by this insight, we propose a method named ItoV for efficiently adapting deep learning-based Image watermarking to Video watermarking. Specifically, ItoV merges the temporal dimension of the video with the channel dimension to enable deep neural networks to treat videos as images. We further explore the effects of different convolutional blocks in video watermarking. We find that spatial convolution is the primary influential component in video watermarking and depthwise convolutions significantly reduce computational cost with negligible impact on performance. In addition, we propose a new frame loss to constrain that the watermark intensity in each video clip frame is consistent, significantly improving the invisibility. Extensive experiments show the superior performance of the adapted video watermarking method compared with the state-of-the-art methods on Kinetics-600 and Inter4K datasets, which demonstrate the efficacy of our method ItoV.
A Momentum-Incorporated Non-Negative Latent Factorization of Tensors Model for Dynamic Network Representation
Authors: Aoling Zeng
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2305.02782
Pdf link: https://arxiv.org/pdf/2305.02782
Abstract A large-scale dynamic network (LDN) is a source of data in many big data-related applications due to their large number of entities and large-scale dynamic interactions. They can be modeled as a high-dimensional incomplete (HDI) tensor that contains a wealth of knowledge about time patterns. A Latent factorization of tensors (LFT) model efficiently extracts this time pattern, which can be established using stochastic gradient descent (SGD) solvers. However, LFT models based on SGD are often limited by training schemes and have poor tail convergence. To solve this problem, this paper proposes a novel nonlinear LFT model (MNNL) based on momentum-incorporated SGD, which extracts non-negative latent factors from HDI tensors to make training unconstrained and compatible with general training schemes, while improving convergence accuracy and speed. Empirical studies on two LDN datasets show that compared to existing models, the MNNL model has higher prediction accuracy and convergence speed.
A numerically efficient output-only system-identification framework for stochastically forced self-sustained oscillators
Authors: Minwoo Lee, Kyu Tae Kim, Jongho Park
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2305.02801
Pdf link: https://arxiv.org/pdf/2305.02801
Abstract Self-sustained oscillations are ubiquitous in nature and engineering. In this paper, we propose a novel output-only system-identification framework for identifying the system parameters of a self-sustained oscillator affected by Gaussian white noise. A Langevin model that characterizes the self-sustained oscillator is postulated, and the corresponding Fokker--Planck equation is derived from stochastic averaging. From the drift and diffusion terms of the Fokker--Planck equation, unknown parameters of the system are identified. We develop a numerically efficient algorithm for enhancing the accuracy of parameter identification. In particular, a modified Levenberg--Marquardt optimization algorithm tailored to output-only system identification is introduced. The proposed framework is demonstrated on both numerical and experimental oscillators with varying system parameters that develop into self-sustained oscillations. The results show that the computational cost required for performing the system identification is dramatically reduced by using the proposed framework. Also, system parameters that were difficult to be extracted with the existing method could be efficiently computed with the system identification method developed in this study. Pertaining to the robustness and computational efficiency of the presented framework, this study can contribute to an accurate and fast diagnosis of dynamical systems under stochastic forcing.
Dual-Quaternion Fourier Transform
Authors: Benjamin Kenwright
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.02802
Pdf link: https://arxiv.org/pdf/2305.02802
Abstract Fourier transform (FT) plays a crucial role in a broad range of applications, from enhancement, restoration and analysis through to security, compression and manipulation. The Fourier transform (FT) is a process that converts a function into a form that describes the frequencies. This process has been extended to many domains and numerical representations (including quaternions). However, in this article, we present a new approach using dual-quaternions. As dual-quaternions offer an efficient and compact symbolic form with unique mathematical properties. While dual-quaternions have established themselves in many fields of science and computing as an efficient mathematical model for providing an unambiguous, un-cumbersome, computationally effective means of representing multi-component data, not much research has been done to combine them with Fourier processes. Dual-quaternions are simply the unification of dual-number theory with hypercomplex numbers; a mathematical concept that allows multi-variable data sets to be transformed, combined, manipulated and interpolated in a unified non-linear manner. We define a Dual-Quaternion Fourier transform (DQFT) for dual-quaternion valued data over dual-quaternion domains. This opens the door to new types of analysis, manipulation and filtering techniques. We also present the Inverse Dual-Quaternion Fourier Transform (IDQFT). The DQFT unlocks the potential provided by hypercomplex algebra in higher dimensions useful for solving dual-quaternion partial differential equations or functional equations (e.g., for multicomponent data analysis)
An asymptotic preserving kinetic scheme for the M1 model of linear transport
Authors: Feugeas Jean-Luc, Mathiaud Julien, Mieussens Luc, Vigier Thomas
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2305.02804
Pdf link: https://arxiv.org/pdf/2305.02804
Abstract Moment models with suitable closure can lead to accurate and computationally efficient solvers for particle transport. Hence, we propose a new asymptotic preserving scheme for the M1 model of linear transport that works uniformly for any Knudsen number. Our idea is to apply the M1 closure at the numerical level to an existing asymptotic preserving scheme for the corresponding kinetic equation, namely the Unified Gas Kinetic scheme (UGKS) originally proposed in [27] and extended to linear transport in [24]. In order to ensure the moments realizability in this new scheme, the UGKS positivity needs to be maintained. We propose a new density reconstruction in time to obtain this property. A second order extension is also suggested and validated. Several test cases show the performances of this new scheme.
Local Optima Correlation Assisted Adaptive Operator Selection
Authors: Jiyuan Pei, Hao Tong, Jialin Liu, Yi Mei, Xin Yao
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2305.02805
Pdf link: https://arxiv.org/pdf/2305.02805
Abstract For solving combinatorial optimisation problems with metaheuristics, different search operators are applied for sampling new solutions in the neighbourhood of a given solution. It is important to understand the relationship between operators for various purposes, e.g., adaptively deciding when to use which operator to find optimal solutions efficiently. However, it is difficult to theoretically analyse this relationship, especially in the complex solution space of combinatorial optimisation problems. In this paper, we propose to empirically analyse the relationship between operators in terms of the correlation between their local optima and develop a measure for quantifying their relationship. The comprehensive analyses on a wide range of capacitated vehicle routing problem benchmark instances show that there is a consistent pattern in the correlation between commonly used operators. Based on this newly proposed local optima correlation metric, we propose a novel approach for adaptively selecting among the operators during the search process. The core intention is to improve search efficiency by preventing wasting computational resources on exploring neighbourhoods where the local optima have already been reached. Experiments on randomly generated instances and commonly used benchmark datasets are conducted. Results show that the proposed approach outperforms commonly used adaptive operator selection methods.
Interpretable Sentence Representation with Variational Autoencoders and Attention
Authors: Ghazi Felhi
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02810
Pdf link: https://arxiv.org/pdf/2305.02810
Abstract In this thesis, we develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP) while accounting for the unavailability of annotated data. We choose to leverage Variational Autoencoders (VAEs) due to their efficiency in relating observations to latent generative factors and their effectiveness in data-efficient learning and interpretable representation learning. As a first contribution, we identify and remove unnecessary components in the functioning scheme of semi-supervised VAEs making them faster, smaller and easier to design. Our second and main contribution is to use VAEs and Transformers to build two models with inductive bias to separate information in latent representations into understandable concepts without annotated data. The first model, Attention-Driven VAE (ADVAE), is able to separately represent and control information about syntactic roles in sentences. The second model, QKVAE, uses separate latent variables to form keys and values for its Transformer decoder and is able to separate syntactic and semantic information in its neural representations. In transfer experiments, QKVAE has competitive performance compared to supervised models and equivalent performance to a supervised model using 50K annotated samples. Additionally, QKVAE displays improved syntactic role disentanglement capabilities compared to ADVAE. Overall, we demonstrate that it is possible to enhance the interpretability of state-of-the-art deep learning architectures for language modeling with unannotated data in situations where text data is abundant but annotations are scarce.
Shannon meets Gray: Noise-robust, Low-sensitivity Codes with Applications in Differential Privacy
Authors: David Rasmussen Lolck, Rasmus Pagh
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.02816
Pdf link: https://arxiv.org/pdf/2305.02816
Abstract Integer data is typically made differentially private by adding noise from a Discrete Laplace (or Discrete Gaussian) distribution. We study the setting where differential privacy of a counting query is achieved using bit-wise randomized response, i.e., independent, random bit flips on the encoding of the query answer. Binary error-correcting codes transmitted through noisy channels with independent bit flips are well-studied in information theory. However, such codes are unsuitable for differential privacy since they have (by design) high sensitivity, i.e., neighboring integers have encodings with a large Hamming distance. Gray codes show that it is possible to create an efficient sensitivity 1 encoding, but are also not suitable for differential privacy due to lack of noise-robustness. Our main result is that it is possible, with a constant rate code, to simultaneously achieve the sensitivity of Gray codes and the noise-robustness of error-correcting codes (down to the noise level required for differential privacy). An application of this new encoding of the integers is a faster, space-optimal differentially private data structure for histograms.
MEDIC: A Multimodal Empathy Dataset in Counseling
Authors: Zhou'an_Zhu, Xin Li, Jicai Pan, Yufei Xiao, Yanan Chang, Feiyi Zheng, Shangfei Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2305.02842
Pdf link: https://arxiv.org/pdf/2305.02842
Abstract Although empathic interaction between counselor and client is fundamental to success in the psychotherapeutic process, there are currently few datasets to aid a computational approach to empathy understanding. In this paper, we construct a multimodal empathy dataset collected from face-to-face psychological counseling sessions. The dataset consists of 771 video clips. We also propose three labels (i.e., expression of experience, emotional reaction, and cognitive reaction) to describe the degree of empathy between counselors and their clients. Expression of experience describes whether the client has expressed experiences that can trigger empathy, and emotional and cognitive reactions indicate the counselor's empathic reactions. As an elementary assessment of the usability of the constructed multimodal empathy dataset, an interrater reliability analysis of annotators' subjective evaluations for video clips is conducted using the intraclass correlation coefficient and Fleiss' Kappa. Results prove that our data annotation is reliable. Furthermore, we conduct empathy prediction using three typical methods, including the tensor fusion network, the sentimental words aware fusion network, and a simple concatenation model. The experimental results show that empathy can be well predicted on our dataset. Our dataset is available for research purposes.
Fundamental Detection Probability vs. Achievable Rate Tradeoff in Integrated Sensing and Communication Systems
Authors: Jiancheng An, Hongbin Li, Derrick Wing Kwan Ng, Chau Yuen
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.02847
Pdf link: https://arxiv.org/pdf/2305.02847
Abstract Integrating sensing functionalities is envisioned as a distinguishing feature of next-generation mobile networks, which has given rise to the development of a novel enabling technology -- \emph{Integrated Sensing and Communication (ISAC)}. Portraying the theoretical performance bounds of ISAC systems is fundamentally important to understand how sensing and communication functionalities interact (e.g., competitively or cooperatively) in terms of resource utilization, while revealing insights and guidelines for the development of effective physical-layer techniques. In this paper, we characterize the fundamental performance tradeoff between the detection probability for target monitoring and the user's achievable rate in ISAC systems. To this end, we first discuss the achievable rate of the user under sensing-free and sensing-interfered communication scenarios. Furthermore, we derive closed-form expressions for the probability of false alarm (PFA) and the successful probability of detection (PD) for monitoring the target of interest, where we consider both communication-assisted and communication-interfered sensing scenarios. In addition, the effects of the unknown channel coefficient are also taken into account in our theoretical analysis. Based on our analytical results, we then carry out a comprehensive assessment of the performance tradeoff between sensing and communication functionalities. Specifically, we formulate a power allocation problem to minimize the transmit power at the base station (BS) under the constraints of ensuring a required PD for perception as well as the communication user's quality of service requirement in terms of achievable rate. Finally, simulation results corroborate the accuracy of our theoretical analysis and the effectiveness of the proposed power allocation solutions.
Hierarchical Transformer for Scalable Graph Learning
Authors: Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2305.02866
Pdf link: https://arxiv.org/pdf/2305.02866
Abstract Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning. However, as current implementations of Graph Transformer primarily focus on learning representations of small-scale graphs, the quadratic complexity of the global self-attention mechanism presents a challenge for full-batch training when applied to larger graphs. Additionally, conventional sampling-based methods fail to capture necessary high-level contextual information, resulting in a significant loss of performance. In this paper, we introduce the Hierarchical Scalable Graph Transformer (HSGT) as a solution to these challenges. HSGT successfully scales the Transformer architecture to node representation learning tasks on large-scale graphs, while maintaining high performance. By utilizing graph hierarchies constructed through coarsening techniques, HSGT efficiently updates and stores multi-scale information in node embeddings at different levels. Together with sampling-based training methods, HSGT effectively captures and aggregates multi-level information on the hierarchical graph using only Transformer blocks. Empirical evaluations demonstrate that HSGT achieves state-of-the-art performance on large-scale benchmarks with graphs containing millions of nodes with high efficiency.
Input Layer Binarization with Bit-Plane Encoding
Authors: Lorenzo Vorabbi, Davide Maltoni, Stefano Santi
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02885
Pdf link: https://arxiv.org/pdf/2305.02885
Abstract Binary Neural Networks (BNNs) use 1-bit weights and activations to efficiently execute deep convolutional neural networks on edge devices. Nevertheless, the binarization of the first layer is conventionally excluded, as it leads to a large accuracy loss. The few works addressing the first layer binarization, typically increase the number of input channels to enhance data representation; such data expansion raises the amount of operations needed and it is feasible only on systems with enough computational resources. In this work, we present a new method to binarize the first layer using directly the 8-bit representation of input data; we exploit the standard bit-planes encoding to extract features bit-wise (using depth-wise convolutions); after a re-weighting stage, features are fused again. The resulting model is fully binarized and our first layer binarization approach is model independent. The concept is evaluated on three classification datasets (CIFAR10, SVHN and CIFAR100) for different model architectures (VGG and ResNet) and, the proposed technique outperforms state of the art methods both in accuracy and BMACs reduction.
UPDExplainer: an Interpretable Transformer-based Framework for Urban Physical Disorder Detection Using Street View Imagery
Authors: Chuanbo Hu, Shan Jia, Fan Zhang, Changjiang Xiao, Mindi Ruan, Jacob Thrasher, Xin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02911
Pdf link: https://arxiv.org/pdf/2305.02911
Abstract Urban Physical Disorder (UPD), such as old or abandoned buildings, broken sidewalks, litter, and graffiti, has a negative impact on residents' quality of life. They can also increase crime rates, cause social disorder, and pose a public health risk. Currently, there is a lack of efficient and reliable methods for detecting and understanding UPD. To bridge this gap, we propose UPDExplainer, an interpretable transformer-based framework for UPD detection. We first develop a UPD detection model based on the Swin Transformer architecture, which leverages readily accessible street view images to learn discriminative representations. In order to provide clear and comprehensible evidence and analysis, we subsequently introduce a UPD factor identification and ranking module that combines visual explanation maps with semantic segmentation maps. This novel integrated approach enables us to identify the exact objects within street view images that are responsible for physical disorders and gain insights into the underlying causes. Experimental results on the re-annotated Place Pulse 2.0 dataset demonstrate promising detection performance of the proposed method, with an accuracy of 79.9%. For a comprehensive evaluation of the method's ranking performance, we report the mean Average Precision (mAP), R-Precision (RPrec), and Normalized Discounted Cumulative Gain (NDCG), with success rates of 75.51%, 80.61%, and 82.58%, respectively. We also present a case study of detecting and ranking physical disorders in the southern region of downtown Los Angeles, California, to demonstrate the practicality and effectiveness of our framework.
Flow Correlator: A Flow Table Cache Management Strategy
Authors: Luke McHale, Paul V Gratz, Alex Sprintson
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2305.02918
Pdf link: https://arxiv.org/pdf/2305.02918
Abstract Switching, routing, and security functions are the backbone of packet processing networks. Fast and efficient processing of packets requires maintaining the state of a large number of transient network connections. In particular, modern stateful firewalls, security monitoring devices, and software-defined networking (SDN) programmable dataplanes require maintaining stateful flow tables. These flow tables often grow much larger than can be expected to fit within on-chip memory, requiring a managed caching layer to maintain performance. This paper focuses on improving the efficiency of caching, an important architectural component of the packet processing data planes. We present a novel predictive approach to network flow table cache management. Our approach leverages a Hashed Perceptron binary classifier as well as an iterative approach to feature selection and ranking to improve the reliability and performance of the data plane caches. We validate the efficiency of the proposed techniques through extensive experimentation using real-world data sets. Our numerical results demonstrate that our techniques improve the reliability and performance of flow-centric packet processing architectures.
Coloring tournaments with few colors: Algorithms and complexity
Authors: Felix Klingelhoefer, Alantha Newman
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.02922
Pdf link: https://arxiv.org/pdf/2305.02922
Abstract A k-coloring of a tournament is a partition of its vertices into k acyclic sets. Deciding if a tournament is 2-colorable is NP-hard. A natural problem, akin to that of coloring a 3-colorable graph with few colors, is to color a 2-colorable tournament with few colors. This problem does not seem to have been addressed before, although it is a special case of coloring a 2-colorable 3-uniform hypergraph with few colors, which is a well-studied problem with super-constant lower bounds. We present an efficient decomposition lemma for tournaments and show that it can be used to design polynomial-time algorithms to color various classes of tournaments with few colors, including an algorithm to color a 2-colorable tournament with ten colors. For the classes of tournaments considered, we complement our upper bounds with strengthened lower bounds, painting a comprehensive picture of the algorithmic and complexity aspects of coloring tournaments.
Rethinking Population-assisted Off-policy Reinforcement Learning
Authors: Bowen Zheng, Ran Cheng
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2305.02949
Pdf link: https://arxiv.org/pdf/2305.02949
Abstract While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they struggle with convergence to local optima due to limited exploration. On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from the OpenAI gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
Majorizing Measures, Codes, and Information
Authors: Yifeng Chu, Maxim Raginsky
Subjects: Information Theory (cs.IT); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.02960
Pdf link: https://arxiv.org/pdf/2305.02960
Abstract The majorizing measure theorem of Fernique and Talagrand is a fundamental result in the theory of random processes. It relates the boundedness of random processes indexed by elements of a metric space to complexity measures arising from certain multiscale combinatorial structures, such as packing and covering trees. This paper builds on the ideas first outlined in a little-noticed preprint of Andreas Maurer to present an information-theoretic perspective on the majorizing measure theorem, according to which the boundedness of random processes is phrased in terms of the existence of efficient variable-length codes for the elements of the indexing metric space.
FUSegNet: A Deep Convolutional Neural Network for Foot Ulcer Segmentation
Authors: Mrinal Kanti Dhar, Taiyu Zhang, Yash Patel, Zeyun Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02961
Pdf link: https://arxiv.org/pdf/2305.02961
Abstract This paper presents FUSegNet, a new model for foot ulcer segmentation in diabetes patients, which uses the pre-trained EfficientNet-b7 as a backbone to address the issue of limited training samples. A modified spatial and channel squeeze-and-excitation (scSE) module called parallel scSE or P-scSE is proposed that combines additive and max-out scSE. A new arrangement is introduced for the module by fusing it in the middle of each decoder stage. As the top decoder stage carries a limited number of feature maps, max-out scSE is bypassed there to form a shorted P-scSE. A set of augmentations, comprising geometric, morphological, and intensity-based augmentations, is applied before feeding the data into the network. The proposed model is first evaluated on a publicly available chronic wound dataset where it achieves a data-based dice score of 92.70%, which is the highest score among the reported approaches. The model outperforms other scSE-based UNet models in terms of Pratt's figure of merits (PFOM) scores in most categories, which evaluates the accuracy of edge localization. The model is then tested in the MICCAI 2021 FUSeg challenge, where a variation of FUSegNet called x-FUSegNet is submitted. The x-FUSegNet model, which takes the average of outputs obtained by FUSegNet using 5-fold cross-validation, achieves a dice score of 89.23%, placing it at the top of the FUSeg Challenge leaderboard. The source code for the model is available on https://github.com/mrinal054/FUSegNet.
Adaptive Selection of Anchor Items for CUR-based k-NN search with Cross-Encoders
Authors: Nishant Yadav, Nicholas Monath, Manzil Zaheer, Andrew McCallum
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02996
Pdf link: https://arxiv.org/pdf/2305.02996
Abstract Cross-encoder models, which jointly encode and score a query-item pair, are typically prohibitively expensive for k-nearest neighbor search. Consequently, k-NN search is performed not with a cross-encoder, but with a heuristic retrieve (e.g., using BM25 or dual-encoder) and re-rank approach. Recent work proposes ANNCUR (Yadav et al., 2022) which uses CUR matrix factorization to produce an embedding space for efficient vector-based search that directly approximates the cross-encoder without the need for dual-encoders. ANNCUR defines this shared query-item embedding space by scoring the test query against anchor items which are sampled uniformly at random. While this minimizes average approximation error over all items, unsuitably high approximation error on top-k items remains and leads to poor recall of top-k (and especially top-1) items. Increasing the number of anchor items is a straightforward way of improving the approximation error and hence k-NN recall of ANNCUR but at the cost of increased inference latency. In this paper, we propose a new method for adaptively choosing anchor items that minimizes the approximation error for the practically important top-k neighbors for a query with minimal computational overhead. Our proposed method incrementally selects a suitable set of anchor items for a given test query over several rounds, using anchors chosen in previous rounds to inform selection of more anchor items. Empirically, our method consistently improves k-NN recall as compared to both ANNCUR and the widely-used dual-encoder-based retrieve-and-rerank approach.
FastAMI -- a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics
Authors: Kai Klede, Leo Schwinn, Dario Zanca, Björn Eskofier
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.03022
Pdf link: https://arxiv.org/pdf/2305.03022
Abstract Clustering is at the very core of machine learning, and its applications proliferate with the increasing availability of data. However, as datasets grow, comparing clusterings with an adjustment for chance becomes computationally difficult, preventing unbiased ground-truth comparisons and solution selection. We propose FastAMI, a Monte Carlo-based method to efficiently approximate the Adjusted Mutual Information (AMI) and extend it to the Standardized Mutual Information (SMI). The approach is compared with the exact calculation and a recently developed variant of the AMI based on pairwise permutations, using both synthetic and real data. In contrast to the exact calculation our method is fast enough to enable these adjusted information-theoretic comparisons for large datasets while maintaining considerably more accurate results than the pairwise approach.
Decentralized and Compositional Interconnection Topology Synthesis for Linear Networked Systems
Authors: Shirantha Welikala, Hai Lin, Panos J. Antsaklis
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.03030
Pdf link: https://arxiv.org/pdf/2305.03030
Abstract In this paper, we consider networked systems comprised of interconnected sets of linear subsystems and propose a decentralized and compositional approach to stabilize or dissipativate such linear networked systems via optimally modifying some existing interconnections and/or creating entirely new interconnections. We also extend this interconnection topology synthesis approach to ensure the ability to stabilize or dissipativate such linear networked systems under distributed (local) feedback control. To the best of the authors' knowledge, this is the first work that attempts to address the optimal interconnection topology synthesis problem for linear networked systems. The proposed approach in this paper only involves solving a sequence of linear matrix inequality problems (one at each subsystem). Thus, using standard convex optimization toolboxes, it can be implemented efficiently and scalably in a decentralized and compositional manner. Apart from many generic linear networked systems applications (e.g., power grid control), a unique application for the proposed interconnection topology synthesis approach is in generating random stable (or dissipative, stabilizable, dissipativate-able) linear networked systems for simulation purposes. We also include an interesting case study where the proposed interconnection topology synthesis approach is compared with an alternative approach that only uses dissipativity information of the involved subsystems.
TUVF: Learning Generalizable Texture UV Radiance Fields
Authors: An-Chieh Cheng, Xueting Li, Sifei Liu, Xiaolong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.03040
Pdf link: https://arxiv.org/pdf/2305.03040
Abstract Textures are a vital aspect of creating visually appealing and realistic 3D models. In this paper, we study the problem of generating high-fidelity texture given shapes of 3D assets, which has been relatively less explored compared with generic 3D shape modeling. Our goal is to facilitate a controllable texture generation process, such that one texture code can correspond to a particular appearance style independent of any input shapes from a category. We introduce Texture UV Radiance Fields (TUVF) that generate textures in a learnable UV sphere space rather than directly on the 3D shape. This allows the texture to be disentangled from the underlying shape and transferable to other shapes that share the same UV space, i.e., from the same category. We integrate the UV sphere space with the radiance field, which provides a more efficient and accurate representation of textures than traditional texture maps. We perform our experiments on real-world object datasets where we achieve not only realistic synthesis but also substantial improvements over state-of-the-arts on texture controlling and editing. Project Page: https://www.anjiecheng.me/TUVF
OctFormer: Octree-based Transformers for 3D Point Clouds
Authors: Peng-Shuai Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.03045
Pdf link: https://arxiv.org/pdf/2305.03045
Abstract OctFormer can not only serve as a general and effective backbone for 3D point cloud segmentation and object detection but also have linear complexity and is scalable for large-scale point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, thus overwhelming, computation complexity of attentions. To combat this issue, several works divide point clouds into non-overlapping windows and constrain attentions in each local window. However, the point number in each window varies greatly, impeding the efficient execution on GPU. Observing that attentions are robust to the shapes of local windows, we propose a novel octree attention, which leverages sorted shuffled keys of octrees to partition point clouds into local windows containing a fixed number of points while permitting shapes of windows to change freely. And we also introduce dilated octree attention to expand the receptive field further. Our octree attention can be implemented in 10 lines of code with open-sourced libraries and runs 17 times faster than other point cloud attentions when the point number exceeds 200k. Built upon the octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performances on a series of 3D segmentation and detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in terms of both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 in mIoU. Our code and trained models are available at https://wang-ps.github.io/octformer.
Personalize Segment Anything Model with One Shot
Authors: Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Peng Gao, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2305.03048
Pdf link: https://arxiv.org/pdf/2305.03048
Abstract Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance. Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at https://github.com/ZrrSkywalker/Personalize-SAM
Keyword: faster

Using Language Models on Low-end Hardware
Authors: Fabian Ziegner, Janos Borst, Andreas Niekler, Martin Potthast
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02350
Pdf link: https://arxiv.org/pdf/2305.02350
Abstract This paper evaluates the viability of using fixed language models for training text classification networks on low-end hardware. We combine language models with a CNN architecture and put together a comprehensive benchmark with 8 datasets covering single-label and multi-label classification of topic, sentiment, and genre. Our observations are distilled into a list of trade-offs, concluding that there are scenarios, where not fine-tuning a language model yields competitive effectiveness at faster training, requiring only a quarter of the memory compared to fine-tuning.
Approximating CKY with Transformers
Authors: Ghazal Khalighinejad, Ollie Liu, Sam Wiseman
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02386
Pdf link: https://arxiv.org/pdf/2305.02386
Abstract We investigate the ability of transformer models to approximate the CKY algorithm, using them to directly predict a parse and thus avoid the CKY algorithm's cubic dependence on sentence length. We find that on standard constituency parsing benchmarks this approach achieves competitive or better performance than comparable parsers that make use of CKY, while being faster. We also evaluate the viability of this approach for parsing under random PCFGs. Here we find that performance declines as the grammar becomes more ambiguous, suggesting that the transformer is not fully capturing the CKY computation. However, we also find that incorporating additional inductive bias is helpful, and we propose a novel approach that makes use of gradients with respect to chart representations in predicting the parse, in analogy with the CKY algorithm being the subgradient of a partition function variant with respect to the chart.
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs
Authors: Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2305.02444
Pdf link: https://arxiv.org/pdf/2305.02444
Abstract General matrix/matrix multiplication (GEMM) is crucial for scientific computing and machine learning. However, the increased scale of the computing platforms raises concerns about hardware and software reliability. In this poster, we present FT-GEMM, a high-performance GEMM being capable of tolerating soft errors on-the-fly. We incorporate the fault tolerant functionality at algorithmic level by fusing the memory-intensive operations into the GEMM assembly kernels. We design a cache-friendly scheme for parallel FT-GEMM. Experimental results on Intel Cascade Lake demonstrate that FT-GEMM offers high reliability and performance -- faster than Intel MKL, OpenBLAS, and BLIS by 3.50\%$\sim$ 22.14\% for both serial and parallel GEMM, even under hundreds of errors injected per minute.
Perfect Sampling for Hard Spheres from Strong Spatial Mixing
Authors: Konrad Anand, Andreas Göbel, Marcus Pappik, Will Perkins
Subjects: Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2305.02450
Pdf link: https://arxiv.org/pdf/2305.02450
Abstract We provide a perfect sampling algorithm for the hard-sphere model on subsets of $\mathbb{R}^d$ with expected running time linear in the volume under the assumption of strong spatial mixing. A large number of perfect and approximate sampling algorithms have been devised to sample from the hard-sphere model, and our perfect sampling algorithm is efficient for a range of parameters for which only efficient approximate samplers were previously known and is faster than these known approximate approaches. Our methods also extend to the more general setting of Gibbs point processes interacting via finite-range, repulsive potentials.
Shap-E: Generating Conditional 3D Implicit Functions
Authors: Heewoo Jun, Alex Nichol
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02463
Pdf link: https://arxiv.org/pdf/2305.02463
Abstract We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at https://github.com/openai/shap-e.
Breast Cancer Diagnosis Using Machine Learning Techniques
Authors: Juan Zuluaga-Gomez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2305.02482
Pdf link: https://arxiv.org/pdf/2305.02482
Abstract Breast cancer is one of the most threatening diseases in women's life; thus, the early and accurate diagnosis plays a key role in reducing the risk of death in a patient's life. Mammography stands as the reference technique for breast cancer screening; nevertheless, many countries still lack access to mammograms due to economic, social, and cultural issues. Latest advances in computational tools, infrared cameras and devices for bio-impedance quantification, have given a chance to emerge other reference techniques like thermography, infrared thermography, electrical impedance tomography and biomarkers found in blood tests, therefore being faster, reliable and cheaper than other methods. In the last two decades, the techniques mentioned above have been considered as parallel and extended approaches for breast cancer diagnosis, as well many authors concluded that false positives and false negatives rates are significantly reduced. Moreover, when a screening method works together with a computational technique, it generates a "computer-aided diagnosis" system. The present work aims to review the last breakthroughs about the three techniques mentioned earlier, suggested machine learning techniques to breast cancer diagnosis, thus, describing the benefits of some methods in relation with other ones, such as, logistic regression, decision trees, random forest, deep and convolutional neural networks. With this, we studied several hyperparameters optimization approaches with parzen tree optimizers to improve the performance of baseline models. An exploratory data analysis for each database and a benchmark of convolutional neural networks for the database of thermal images are presented. The benchmark process, reviews image classification techniques with convolutional neural networks, like, Resnet50, NasNetmobile, InceptionResnet and Xception.
SuperNeuro: A Fast and Scalable Simulator for Neuromorphic Computing
Authors: Prasanna Date, Chathika Gunaratne, Shruti Kulkarni, Robert Patton, Mark Coletti, Thomas Potok
Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2305.02510
Pdf link: https://arxiv.org/pdf/2305.02510
Abstract In many neuromorphic workflows, simulators play a vital role for important tasks such as training spiking neural networks (SNNs), running neuroscience simulations, and designing, implementing and testing neuromorphic algorithms. Currently available simulators are catered to either neuroscience workflows (such as NEST and Brian2) or deep learning workflows (such as BindsNET). While the neuroscience-based simulators are slow and not very scalable, the deep learning-based simulators do not support certain functionalities such as synaptic delay that are typical of neuromorphic workloads. In this paper, we address this gap in the literature and present SuperNeuro, which is a fast and scalable simulator for neuromorphic computing, capable of both homogeneous and heterogeneous simulations as well as GPU acceleration. We also present preliminary results comparing SuperNeuro to widely used neuromorphic simulators such as NEST, Brian2 and BindsNET in terms of computation times. We demonstrate that SuperNeuro can be approximately 10--300 times faster than some of the other simulators for small sparse networks. On large sparse and large dense networks, SuperNeuro can be approximately 2.2 and 3.4 times faster than the other simulators respectively.
Towards a Scalable Proof Engine: A Performant Prototype Rewriting Primitive for Coq
Authors: Jason Gross, Andres Erbsen, Jade Philipoom, Rajashree Agrawal, Adam Chlipala
Subjects: Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2305.02521
Pdf link: https://arxiv.org/pdf/2305.02521
Abstract We address the challenges of scaling verification efforts to match the increasing complexity and size of systems. We propose a research agenda aimed at building a performant proof engine by studying the asymptotic performance of proof engines and redesigning their building blocks. As a case study, we explore equational rewriting and introduce a novel prototype proof engine building block for rewriting in Coq, utilizing proof by reflection for enhanced performance. Our prototype implementation can significantly improve the development of verified compilers, as demonstrated in a case study with the Fiat Cryptography toolchain. The resulting extracted command-line compiler is about 1000$\times$ faster while featuring simpler compiler-specific proofs. This work lays some foundation for scaling verification efforts and contributes to the broader goal of developing a proof engine with good asymptotic performance, ultimately aimed at enabling the verification of larger and more complex systems.
Cuttlefish: Low-rank Model Training without All The Tuning
Authors: Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02538
Pdf link: https://arxiv.org/pdf/2305.02538
Abstract Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training necessitates adjusting several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challenge by introducing Cuttlefish, an automated low-rank training approach that eliminates the need for tuning factorization hyperparameters. Cuttlefish leverages the observation that after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer stabilizes at a constant value. Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged, setting the dimension of each factorization to its corresponding stable rank. Our results show that Cuttlefish generates models up to 5.6 times smaller than full-rank models, and attains up to a 1.2 times faster end-to-end training process while preserving comparable accuracy. Moreover, Cuttlefish outperforms state-of-the-art low-rank model training methods and other prominent baselines. The source code for our implementation can be found at: https://github.com/hwang595/Cuttlefish.
UrbanBIS: a Large-scale Benchmark for Fine-grained Urban Building Instance Segmentation
Authors: Guoqing Yang, Fuyou Xue, Qi Zhang, Ke Xie, Chi-Wing Fu, Hui Huang
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02627
Pdf link: https://arxiv.org/pdf/2305.02627
Abstract We present the UrbanBIS benchmark for large-scale 3D urban understanding, supporting practical urban-level semantic and building-level instance segmentation. UrbanBIS comprises six real urban scenes, with 2.5 billion points, covering a vast area of 10.78 square kilometers and 3,370 buildings, captured by 113,346 views of aerial photogrammetry. Particularly, UrbanBIS provides not only semantic-level annotations on a rich set of urban objects, including buildings, vehicles, vegetation, roads, and bridges, but also instance-level annotations on the buildings. Further, UrbanBIS is the first 3D dataset that introduces fine-grained building sub-categories, considering a wide variety of shapes for different building types. Besides, we propose B-Seg, a building instance segmentation method to establish UrbanBIS. B-Seg adopts an end-to-end framework with a simple yet effective strategy for handling large-scale point clouds. Compared with mainstream methods, B-Seg achieves better accuracy with faster inference speed on UrbanBIS. In addition to the carefully-annotated point clouds, UrbanBIS provides high-resolution aerial-acquisition photos and high-quality large-scale 3D reconstruction models, which shall facilitate a wide range of studies such as multi-view stereo, urban LOD generation, aerial path planning, autonomous navigation, road network extraction, and so on, thus serving as an important platform for many intelligent city applications.
Real-Time Neural Appearance Models
Authors: Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.02678
Pdf link: https://arxiv.org/pdf/2305.02678
Abstract We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations. Our appearance model utilizes learned hierarchical textures that are interpreted using neural decoders, which produce reflectance values and importance-sampled directions. To best utilize the modeling capacity of the decoders, we equip the decoders with two graphics priors. The first prior -- transformation of directions into learned shading frames -- facilitates accurate reconstruction of mesoscale effects. The second prior -- a microfacet sampling distribution -- allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows baking deeply layered material graphs into a compact unified neural representation. By exposing hardware accelerated tensor operations to ray tracing shaders, we show that it is possible to inline and execute the neural decoders efficiently inside a real-time path tracer. We analyze scalability with increasing number of neural materials and propose to improve performance using code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens up the door for using film-quality visuals in real-time applications such as games and live previews.
Interpretable Sentence Representation with Variational Autoencoders and Attention
Authors: Ghazi Felhi
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02810
Pdf link: https://arxiv.org/pdf/2305.02810
Abstract In this thesis, we develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP) while accounting for the unavailability of annotated data. We choose to leverage Variational Autoencoders (VAEs) due to their efficiency in relating observations to latent generative factors and their effectiveness in data-efficient learning and interpretable representation learning. As a first contribution, we identify and remove unnecessary components in the functioning scheme of semi-supervised VAEs making them faster, smaller and easier to design. Our second and main contribution is to use VAEs and Transformers to build two models with inductive bias to separate information in latent representations into understandable concepts without annotated data. The first model, Attention-Driven VAE (ADVAE), is able to separately represent and control information about syntactic roles in sentences. The second model, QKVAE, uses separate latent variables to form keys and values for its Transformer decoder and is able to separate syntactic and semantic information in its neural representations. In transfer experiments, QKVAE has competitive performance compared to supervised models and equivalent performance to a supervised model using 50K annotated samples. Additionally, QKVAE displays improved syntactic role disentanglement capabilities compared to ADVAE. Overall, we demonstrate that it is possible to enhance the interpretability of state-of-the-art deep learning architectures for language modeling with unannotated data in situations where text data is abundant but annotations are scarce.
Shannon meets Gray: Noise-robust, Low-sensitivity Codes with Applications in Differential Privacy
Authors: David Rasmussen Lolck, Rasmus Pagh
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.02816
Pdf link: https://arxiv.org/pdf/2305.02816
Abstract Integer data is typically made differentially private by adding noise from a Discrete Laplace (or Discrete Gaussian) distribution. We study the setting where differential privacy of a counting query is achieved using bit-wise randomized response, i.e., independent, random bit flips on the encoding of the query answer. Binary error-correcting codes transmitted through noisy channels with independent bit flips are well-studied in information theory. However, such codes are unsuitable for differential privacy since they have (by design) high sensitivity, i.e., neighboring integers have encodings with a large Hamming distance. Gray codes show that it is possible to create an efficient sensitivity 1 encoding, but are also not suitable for differential privacy due to lack of noise-robustness. Our main result is that it is possible, with a constant rate code, to simultaneously achieve the sensitivity of Gray codes and the noise-robustness of error-correcting codes (down to the noise level required for differential privacy). An application of this new encoding of the integers is a faster, space-optimal differentially private data structure for histograms.
2x Faster Language Model Pre-training via Masked Structural Growth
Authors: Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.02869
Pdf link: https://arxiv.org/pdf/2305.02869
Abstract Acceleration of large language model pre-training is a critical issue in present NLP research. In this paper, we focus on speeding up pre-training by progressively growing from a small Transformer structure to a large one. There are two main research problems related to progressive growth: growth schedule and growth operator. For growth schedule, existing work has explored multi-stage expansion of depth and feedforward layers. However, the impact of each dimension on the schedule's efficiency is still an open question. For growth operator, existing work relies on the initialization of new weights to inherit knowledge, and achieve only non-strict function preservation, limiting further optimization of training dynamics. To address these issues, we propose Masked Structural Growth (MSG), including growth schedules involving all possible dimensions and strictly function-preserving growth operators that is independent of the initialization of new weights. Experiments show that MSG is significantly faster than related work: we achieve a speed-up of 80% for Bert-base and 120% for Bert-large pre-training. Moreover, MSG is able to improve fine-tuning performances at the same time.
What Else Can Voronoi Diagrams Do For Diameter In Planar Graphs?
Authors: Amir Abboud, Shay Mozes, Oren Weimann
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.02946
Pdf link: https://arxiv.org/pdf/2305.02946
Abstract The Voronoi diagrams technique was introduced by Cabello to compute the diameter of planar graphs in subquadratic time. We present novel applications of this technique in static, fault-tolerant, and partially-dynamic undirected unweighted planar graphs, as well as some new limitations. 1. In the static case, we give $n^{3+o(1)}/D^2$ and $\tilde{O}(n\cdot D^2)$ time algorithms for computing the diameter of a planar graph $G$ with diameter $D$. These are faster than the state of the art $\tilde{O}(n^{5/3})$ when $D<n^{1/3}$ or $D>n^{2/3}$. 2. In the fault-tolerant setting, we give an $n^{7/3+o(1)}$ time algorithm for computing the diameter of $G\setminus {e}$ for every edge $e$ in $G$ (the replacement diameter problem). Compared to the naive $\tilde{O}(n^{8/3})$ time algorithm that runs the static algorithm for every edge. 3. In the incremental setting, where we wish to maintain the diameter while while adding edges, we present an algorithm with total running time $n^{7/3+o(1)}$. Compared to the naive $\tilde{O}(n^{8/3})$ time algorithm that runs the static algorithm after every update. 4. We give a lower bound (conditioned on the SETH) ruling out an amortized $O(n^{1-\varepsilon})$ update time for maintaining the diameter in {\em weighted} planar graph. The lower bound holds even for incremental or decremental updates. Our upper bounds are obtained by novel uses and manipulations of Voronoi diagrams. These include maintaining the Voronoi diagram when edges of the graph are deleted, allowing the sites of the Voronoi diagram to lie on a BFS tree level (rather than on boundaries of $r$-division), and a new reduction from incremental diameter to incremental {\em distance oracles} that could be of interest beyond planar graphs. Our lower bound is the first lower bound for a dynamic planar graph problem that is conditioned on the SETH.
Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study
Authors: Sajjad Rahmani, AmirHossein Naghshzan, Latifa Guerrouj
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.03017
Pdf link: https://arxiv.org/pdf/2305.03017
Abstract The study of code example recommendation has been conducted extensively in the past and recently in order to assist developers in their software development tasks. This is because developers often spend significant time searching for relevant code examples on the internet, utilizing open-source projects and informal documentation. For finding useful code examples, informal documentation, such as Stack Overflow discussions and forums, can be invaluable. We have focused our research on Stack Overflow, which is a popular resource for discussing different topics among software developers. For increasing the quality of the recommended code examples, we have collected and recommended the best code examples in the Java programming language. We have utilized BERT in our approach, which is a Large Language Model (LLM) for text representation that can effectively extract semantic information from textual data. Our first step involved using BERT to convert code examples into numerical vectors. Subsequently, we applied LSH to identify Approximate Nearest Neighbors (ANN). Our research involved the implementation of two variants of this approach, namely the Random Hyperplane-based LSH and the Query-Aware LSH. Our study compared two algorithms using four parameters: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. The results of our analysis revealed that the Query- Aware (QA) approach outperformed the Random Hyperplane-based (RH) approach in terms of HitRate. Specifically, the QA approach achieved a HitRate improvement of 20% to 35% for query pairs compared to the RH approach. Creating hashing tables and assigning data samples to buckets using the QA approach is at least four times faster than the RH approach. The QA approach returns code examples within milliseconds, while it takes several seconds (sec) for the RH approach to recommend code examples.
OctFormer: Octree-based Transformers for 3D Point Clouds
Authors: Peng-Shuai Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.03045
Pdf link: https://arxiv.org/pdf/2305.03045
Abstract OctFormer can not only serve as a general and effective backbone for 3D point cloud segmentation and object detection but also have linear complexity and is scalable for large-scale point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, thus overwhelming, computation complexity of attentions. To combat this issue, several works divide point clouds into non-overlapping windows and constrain attentions in each local window. However, the point number in each window varies greatly, impeding the efficient execution on GPU. Observing that attentions are robust to the shapes of local windows, we propose a novel octree attention, which leverages sorted shuffled keys of octrees to partition point clouds into local windows containing a fixed number of points while permitting shapes of windows to change freely. And we also introduce dilated octree attention to expand the receptive field further. Our octree attention can be implemented in 10 lines of code with open-sourced libraries and runs 17 times faster than other point cloud attentions when the point number exceeds 200k. Built upon the octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performances on a series of 3D segmentation and detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in terms of both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 in mIoU. Our code and trained models are available at https://wang-ps.github.io/octformer.
Keyword: mobile

Privacy in Population Protocols with Probabilistic Scheduling
Authors: Talley Amir, James Aspnes
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2305.02377
Pdf link: https://arxiv.org/pdf/2305.02377
Abstract The population protocol model introduced by Angluin et al. in 2006 offers a theoretical framework for designing and analyzing distributed algorithms among limited-resource mobile agents. While the original population protocol model considers the concept of anonymity, the issue of privacy is not investigated thoroughly. However, there is a need for time- and space-efficient privacy-preserving techniques in the population protocol model if these algorithms are to be implemented in settings handling sensitive data, such as sensor networks, IoT devices, and drones. In this work, we introduce several formal definitions of privacy, ranging from assuring only plausible deniability of the population input vector to having a full information-theoretic guarantee that knowledge beyond an agent's input and output bear no influence on the probability of a particular input vector. We then apply these definitions to both existing and novel protocols. We show that the Remainder-computing protocol given by Delporte-Gallet et al. in 2007 (which is proven to satisfy output independent privacy under adversarial scheduling) is not information-theoretically private under probabilistic scheduling. In contrast, we provide a new algorithm and demonstrate that it correctly and information-theoretically privately computes Remainder under probabilistic scheduling.
Breast Cancer Diagnosis Using Machine Learning Techniques
Authors: Juan Zuluaga-Gomez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2305.02482
Pdf link: https://arxiv.org/pdf/2305.02482
Abstract Breast cancer is one of the most threatening diseases in women's life; thus, the early and accurate diagnosis plays a key role in reducing the risk of death in a patient's life. Mammography stands as the reference technique for breast cancer screening; nevertheless, many countries still lack access to mammograms due to economic, social, and cultural issues. Latest advances in computational tools, infrared cameras and devices for bio-impedance quantification, have given a chance to emerge other reference techniques like thermography, infrared thermography, electrical impedance tomography and biomarkers found in blood tests, therefore being faster, reliable and cheaper than other methods. In the last two decades, the techniques mentioned above have been considered as parallel and extended approaches for breast cancer diagnosis, as well many authors concluded that false positives and false negatives rates are significantly reduced. Moreover, when a screening method works together with a computational technique, it generates a "computer-aided diagnosis" system. The present work aims to review the last breakthroughs about the three techniques mentioned earlier, suggested machine learning techniques to breast cancer diagnosis, thus, describing the benefits of some methods in relation with other ones, such as, logistic regression, decision trees, random forest, deep and convolutional neural networks. With this, we studied several hyperparameters optimization approaches with parzen tree optimizers to improve the performance of baseline models. An exploratory data analysis for each database and a benchmark of convolutional neural networks for the database of thermal images are presented. The benchmark process, reviews image classification techniques with convolutional neural networks, like, Resnet50, NasNetmobile, InceptionResnet and Xception.
Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation
Authors: Renshen Wang, Yasuhisa Fujii, Alessandro Bissacco
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02577
Pdf link: https://arxiv.org/pdf/2305.02577
Abstract Text reading order is a crucial aspect in the output of an OCR engine, with a large impact on downstream tasks. Its difficulty lies in the large variation of domain specific layout structures, and is further exacerbated by real-world image degradations such as perspective distortions. We propose a lightweight, scalable and generalizable approach to identify text reading order with a multi-modal, multi-task graph convolutional network (GCN) running on a sparse layout based graph. Predictions from the model provide hints of bidimensional relations among text lines and layout region structures, upon which a post-processing cluster-and-sort algorithm generates an ordered sequence of all the text lines. The model is language-agnostic and runs effectively across multi-language datasets that contain various types of images taken in uncontrolled conditions, and it is small enough to be deployed on virtually any platform including mobile devices.
Fundamental Detection Probability vs. Achievable Rate Tradeoff in Integrated Sensing and Communication Systems
Authors: Jiancheng An, Hongbin Li, Derrick Wing Kwan Ng, Chau Yuen
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2305.02847
Pdf link: https://arxiv.org/pdf/2305.02847
Abstract Integrating sensing functionalities is envisioned as a distinguishing feature of next-generation mobile networks, which has given rise to the development of a novel enabling technology -- \emph{Integrated Sensing and Communication (ISAC)}. Portraying the theoretical performance bounds of ISAC systems is fundamentally important to understand how sensing and communication functionalities interact (e.g., competitively or cooperatively) in terms of resource utilization, while revealing insights and guidelines for the development of effective physical-layer techniques. In this paper, we characterize the fundamental performance tradeoff between the detection probability for target monitoring and the user's achievable rate in ISAC systems. To this end, we first discuss the achievable rate of the user under sensing-free and sensing-interfered communication scenarios. Furthermore, we derive closed-form expressions for the probability of false alarm (PFA) and the successful probability of detection (PD) for monitoring the target of interest, where we consider both communication-assisted and communication-interfered sensing scenarios. In addition, the effects of the unknown channel coefficient are also taken into account in our theoretical analysis. Based on our analytical results, we then carry out a comprehensive assessment of the performance tradeoff between sensing and communication functionalities. Specifically, we formulate a power allocation problem to minimize the transmit power at the base station (BS) under the constraints of ensuring a required PD for perception as well as the communication user's quality of service requirement in terms of achievable rate. Finally, simulation results corroborate the accuracy of our theoretical analysis and the effectiveness of the proposed power allocation solutions.
Keyword: pruning

There is no result

Keyword: voxel

OctFormer: Octree-based Transformers for 3D Point Clouds
Authors: Peng-Shuai Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2305.03045
Pdf link: https://arxiv.org/pdf/2305.03045
Abstract OctFormer can not only serve as a general and effective backbone for 3D point cloud segmentation and object detection but also have linear complexity and is scalable for large-scale point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, thus overwhelming, computation complexity of attentions. To combat this issue, several works divide point clouds into non-overlapping windows and constrain attentions in each local window. However, the point number in each window varies greatly, impeding the efficient execution on GPU. Observing that attentions are robust to the shapes of local windows, we propose a novel octree attention, which leverages sorted shuffled keys of octrees to partition point clouds into local windows containing a fixed number of points while permitting shapes of windows to change freely. And we also introduce dilated octree attention to expand the receptive field further. Our octree attention can be implemented in 10 lines of code with open-sourced libraries and runs 17 times faster than other point cloud attentions when the point number exceeds 200k. Built upon the octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performances on a series of 3D segmentation and detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in terms of both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 in mIoU. Our code and trained models are available at https://wang-ps.github.io/octformer.
NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds
Authors: Jun-Kun Chen, Jipeng Lyu, Yu-Xiong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.03049
Pdf link: https://arxiv.org/pdf/2305.03049
Abstract This paper proposes NeuralEditor that enables neural radiance fields (NeRFs) natively editable for general shape editing tasks. Despite their impressive results on novel-view synthesis, it remains a fundamental challenge for NeRFs to edit the shape of the scene. Our key insight is to exploit the explicit point cloud representation as the underlying structure to construct NeRFs, inspired by the intuitive interpretation of NeRF rendering as a process that projects or "plots" the associated 3D point cloud to a 2D image plane. To this end, NeuralEditor introduces a novel rendering scheme based on deterministic integration within K-D tree-guided density-adaptive voxels, which produces both high-quality rendering results and precise point clouds through optimization. NeuralEditor then performs shape editing via mapping associated points between point clouds. Extensive evaluation shows that NeuralEditor achieves state-of-the-art performance in both shape deformation and scene morphing tasks. Notably, NeuralEditor supports both zero-shot inference and further fine-tuning over the edited scene. Our code, benchmark, and demo video are available at https://immortalco.github.io/NeuralEditor.
Keyword: lidar

APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction
Authors: Quan Liu, Yunsong Zhou, Hongzi Zhu, Shan Chang, Minyi Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02893
Pdf link: https://arxiv.org/pdf/2305.02893
Abstract For many driving safety applications, it is of great importance to accurately register LiDAR point clouds generated on distant moving vehicles. However, such point clouds have extremely different point density and sensor perspective on the same object, making registration on such point clouds very hard. In this paper, we propose a novel feature extraction framework, called APR, for online distant point cloud registration. Specifically, APR leverages an autoencoder design, where the autoencoder reconstructs a denser aggregated point cloud with several frames instead of the original single input point cloud. Our design forces the encoder to extract features with rich local geometry information based on one single input point cloud. Such features are then used for online distant point cloud registration. We conduct extensive experiments against state-of-the-art (SOTA) feature extractors on KITTI and nuScenes datasets. Results show that APR outperforms all other extractors by a large margin, increasing average registration recall of SOTA extractors by 7.1% on LoKITTI and 4.6% on LoNuScenes.
OSDaR23: Open Sensor Data for Rail 2023
Authors: Rustam Tagiew, Martin Köppel, Karsten Schwalbe, Patrick Denzler, Philipp Neumaier, Tobias Klockau, Martin Boekhoff, Pavel Klasek, Roman Tilly
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.03001
Pdf link: https://arxiv.org/pdf/2305.03001
Abstract For driverless train operation on mainline railways, several tasks need to be implemented by technical systems. One of the most challenging tasks is to monitor the train's driveway and its surroundings for potential obstacles due to long braking distances. Machine learning algorithms can be used to analyze data from vision sensors such as infrared (IR) and visual (RGB) cameras, lidars, and radars to detect objects. Such algorithms require large amounts of annotated data from objects in the rail environment that may pose potential obstacles, as well as rail-specific objects such as tracks or catenary poles, as training data. However, only very few datasets are publicly available and these available datasets typically involve only a limited number of sensors. Datasets and trained models from other domains, such as automotive, are useful but insufficient for object detection in the railway context. Therefore, this publication presents OSDaR23, a multi-sensor dataset of 21 sequences captured in Hamburg, Germany, in September 2021. The sensor setup consisted of multiple calibrated and synchronized IR/RGB cameras, lidars, a radar, and position and acceleration sensors front-mounted on a railway vehicle. In addition to raw data, the dataset contains 204091 polyline, polygonal, rectangle and cuboid annotations for 20 different object classes. This dataset can also be used for tasks going beyond collision prediction, which are listed in this paper.
Keyword: diffusion

Spectral cyclicality of networks
Authors: Nizar Riane
Subjects: Social and Information Networks (cs.SI); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2305.02371
Pdf link: https://arxiv.org/pdf/2305.02371
Abstract We introduce the spectral influence and spectral cyclicality based on the largest eigenvalue of a graph adjacency matrix, two novel concepts of centrality capturing diffusion and interdependence from a local and a global point of view respectively. We define a new clustering algorithm to distinguish communities with high cyclicality and interdependence, allowing overlapping, and we conclude our study with an application to the input-output analysis in the case of the Moroccan economy.
Shap-E: Generating Conditional 3D Implicit Functions
Authors: Heewoo Jun, Alex Nichol
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2305.02463
Pdf link: https://arxiv.org/pdf/2305.02463
Abstract We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space. We release model weights, inference code, and samples at https://github.com/openai/shap-e.
LayoutDM: Transformer-based Diffusion Model for Layout Generation
Authors: Shang Chai, Liansheng Zhuang, Fengying Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02567
Pdf link: https://arxiv.org/pdf/2305.02567
Abstract Automatic layout generation that can synthesize high-quality layouts is an important tool for graphic design in many applications. Though existing methods based on generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) have progressed, they still leave much room for improving the quality and diversity of the results. Inspired by the recent success of diffusion models in generating high-quality images, this paper explores their potential for conditional layout generation and proposes Transformer-based Layout Diffusion Model (LayoutDM) by instantiating the conditional denoising diffusion probabilistic model (DDPM) with a purely transformer-based architecture. Instead of using convolutional neural networks, a transformer-based conditional Layout Denoiser is proposed to learn the reverse diffusion process to generate samples from noised layout data. Benefitting from both transformer and DDPM, our LayoutDM is of desired properties such as high-quality generation, strong sample diversity, faithful distribution coverage, and stationary training in comparison to GANs and VAEs. Quantitative and qualitative experimental results show that our method outperforms state-of-the-art generative models in terms of quality and diversity.
Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model
Authors: Chao Xu, Shaoting Zhu, Junwei Zhu, Tianxin Huang, Jiangning Zhang, Ying Tai, Yong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02594
Pdf link: https://arxiv.org/pdf/2305.02594
Abstract Multimodal-driven talking face generation refers to animating a portrait with the given pose, expression, and gaze transferred from the driving image and video, or estimated from the text and audio. However, existing methods ignore the potential of text modal, and their generators mainly follow the source-oriented feature rearrange paradigm coupled with unstable GAN frameworks. In this work, we first represent the emotion in the text prompt, which could inherit rich semantics from the CLIP, allowing flexible and generalized emotion control. We further reorganize these tasks as the target-oriented texture transfer and adopt the Diffusion Models. More specifically, given a textured face as the source and the rendered face projected from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model decomposes the complex transfer problem into multi-conditional denoising process, where a Texture Attention-based module accurately models the correspondences between appearance and geometry cues contained in source and target conditions, and incorporate extra implicit information for high-fidelity talking face generation. Additionally, TGDM can be gracefully tailored for face swapping. We derive a novel paradigm free of unstable seesaw-style optimization, resulting in simple, stable, and effective training and inference schemes. Extensive experiments demonstrate the superiority of our method.
A numerically efficient output-only system-identification framework for stochastically forced self-sustained oscillators
Authors: Minwoo Lee, Kyu Tae Kim, Jongho Park
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2305.02801
Pdf link: https://arxiv.org/pdf/2305.02801
Abstract Self-sustained oscillations are ubiquitous in nature and engineering. In this paper, we propose a novel output-only system-identification framework for identifying the system parameters of a self-sustained oscillator affected by Gaussian white noise. A Langevin model that characterizes the self-sustained oscillator is postulated, and the corresponding Fokker--Planck equation is derived from stochastic averaging. From the drift and diffusion terms of the Fokker--Planck equation, unknown parameters of the system are identified. We develop a numerically efficient algorithm for enhancing the accuracy of parameter identification. In particular, a modified Levenberg--Marquardt optimization algorithm tailored to output-only system identification is introduced. The proposed framework is demonstrated on both numerical and experimental oscillators with varying system parameters that develop into self-sustained oscillations. The results show that the computational cost required for performing the system identification is dramatically reduced by using the proposed framework. Also, system parameters that were difficult to be extracted with the existing method could be efficiently computed with the system identification method developed in this study. Pertaining to the robustness and computational efficiency of the presented framework, this study can contribute to an accurate and fast diagnosis of dynamical systems under stochastic forcing.
Personalize Segment Anything Model with One Shot
Authors: Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Peng Gao, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2305.03048
Pdf link: https://arxiv.org/pdf/2305.03048
Abstract Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-powered prompting is under explored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the entire SAM, we introduce two learnable weights for multi-scale masks, only training 2 parameters within 10 seconds for improved performance. To demonstrate our efficacy, we construct a new segmentation dataset, PerSeg, for personalized evaluation, and test our methods on video object segmentation with competitive performance. Besides, our approach can also enhance DreamBooth to personalize Stable Diffusion for text-to-image generation, which discards the background disturbance for better target appearance learning. Code is released at https://github.com/ZrrSkywalker/Personalize-SAM
Keyword: dynamic

Equation-Free Computations as DDDAS Protocols for Bifurcation Studies: A Granular Chain Example
Authors: M.O. Williams, Y.M. Psarellis, D. Pozharskiy, C. Chong, F. Li, J. Yang, P.G. Kevrekidis, I.G. Kevrekidis
Subjects: Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2305.02404
Pdf link: https://arxiv.org/pdf/2305.02404
Abstract This chapter discusses the development and implementation of algorithms based on Equation-Free/Dynamic Data Driven Applications Systems (EF/DDDAS) protocols for the computer-assisted study of the bifurcation structure of complex dynamical systems, such as those that arise in biology (neuronal networks, cell populations), multiscale systems in physics, chemistry and engineering, and system modeling in the social sciences. An illustrative example demonstrates the experimental realization of a chain of granular particles (a so-called engineered granular chain). In particular, the focus is on the detection/stability analysis of time-periodic, spatially localized structures referred to as "dark breathers". Results in this chapter highlight, both experimentally and numerically, that the number of breathers can be controlled by varying the frequency as well as the amplitude of an "out of phase" actuation, and that a "snaking" structure in the bifurcation diagram (computed through standard, model-based numerical methods for dynamical systems) is also recovered through the EF/DDDAS methods operating on a black-box simulator. The EF/DDDAS protocols presented here are, therefore, a step towards general purpose protocols for performing detailed bifurcation analyses directly on laboratory experiments, not only on their mathematical models, but also on measured data.
Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks
Authors: Sara Riva, Jean-Marie Lagniez, Gustavo Magaña López, Loïc Paulevé
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Systems and Control (eess.SY); Molecular Networks (q-bio.MN)
Arxiv link: https://arxiv.org/abs/2305.02442
Pdf link: https://arxiv.org/pdf/2305.02442
Abstract Minimal trap spaces (MTSs) capture subspaces in which the Boolean dynamics is trapped, whatever the update mode. They correspond to the attractors of the most permissive mode. Due to their versatility, the computation of MTSs has recently gained traction, essentially by focusing on their enumeration. In this paper, we address the logical reasoning on universal properties of MTSs in the scope of two problems: the reprogramming of Boolean networks for identifying the permanent freeze of Boolean variables that enforce a given property on all the MTSs, and the synthesis of Boolean networks from universal properties on their MTSs. Both problems reduce to solving the satisfiability of quantified propositional logic formula with 3 levels of quantifiers ($\exists\forall\exists$). In this paper, we introduce a Counter-Example Guided Refinement Abstraction (CEGAR) to efficiently solve these problems by coupling the resolution of two simpler formulas. We provide a prototype relying on Answer-Set Programming for each formula and show its tractability on a wide range of Boolean models of biological networks.
Model-based and Data-based Dynamic Output Feedback for Externally Positive Systems
Authors: Abed AlRahman Al Makdah, Fabio Pasqualetti
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2305.02472
Pdf link: https://arxiv.org/pdf/2305.02472
Abstract In this work, we derive dynamic output-feedback controllers that render the closed-loop system externally positive. We begin by expressing the class of discrete-time, linear, time-invariant systems and the class of dynamic controllers in the space of input-output behaviors, where a dynamic controller can be expressed as a static behavioral feedback gain. We leverage the static form of the controller to derive output-feedback controllers that achieve monotonic output tracking of a constant non-negative reference output. Further, we provide a direct data-driven approach to derive monotonic tracking output-feedback controllers for single-input-single-output (SISO) systems. Our approaches, model-based and data-based, allow us to obtain output-feedback controllers that render the closed-loop system externally positive. Finally, we validate our results numerically in a drone landing control problem.
AutoML-GPT: Automatic Machine Learning with GPT
Authors: Shujian Zhang, Chengyue Gong, Lemeng Wu, Xingchao Liu, Mingyuan Zhou
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2305.02499
Pdf link: https://arxiv.org/pdf/2305.02499
Abstract AI tasks encompass a wide range of domains and fields. While numerous AI models have been designed for specific tasks and applications, they often require considerable human efforts in finding the right model architecture, optimization algorithm, and hyperparameters. Recent advances in large language models (LLMs) like ChatGPT show remarkable capabilities in various aspects of reasoning, comprehension, and interaction. Consequently, we propose developing task-oriented prompts and automatically utilizing LLMs to automate the training pipeline. To implement this concept, we present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters. AutoML-GPT dynamically takes user requests from the model and data cards and composes the corresponding prompt paragraph. Ultimately, with this prompt paragraph, AutoML-GPT will automatically conduct the experiments from data processing to model architecture, hyperparameter tuning, and predicted training log. By leveraging {\ours}'s robust language capabilities and the available AI models, AutoML-GPT can tackle numerous intricate AI tasks across various tasks and datasets. This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas. Extensive experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many AI tasks.
Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
Authors: Yaqi Shen, Le Hui, Jin Xie, Jian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02528
Pdf link: https://arxiv.org/pdf/2305.02528
Abstract 3D scene flow estimation aims to estimate point-wise motions between two consecutive frames of point clouds. Superpoints, i.e., points with similar geometric features, are usually employed to capture similar motions of local regions in 3D scenes for scene flow estimation. However, in existing methods, superpoints are generated with the offline clustering methods, which cannot characterize local regions with similar motions for complex 3D scenes well, leading to inaccurate scene flow estimation. To this end, we propose an iterative end-to-end superpoint based scene flow estimation framework, where the superpoints can be dynamically updated to guide the point-level flow prediction. Specifically, our framework consists of a flow guided superpoint generation module and a superpoint guided flow refinement module. In our superpoint generation module, we utilize the bidirectional flow information at the previous iteration to obtain the matching points of points and superpoint centers for soft point-to-superpoint association construction, in which the superpoints are generated for pairwise point clouds. With the generated superpoints, we first reconstruct the flow for each point by adaptively aggregating the superpoint-level flow, and then encode the consistency between the reconstructed flow of pairwise point clouds. Finally, we feed the consistency encoding along with the reconstructed flow into GRU to refine point-level flow. Extensive experiments on several different datasets show that our method can achieve promising performance.
Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
Authors: Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2305.02541
Pdf link: https://arxiv.org/pdf/2305.02541
Abstract The popular VQ-VAE models reconstruct images through learning a discrete codebook but suffer from a significant issue in the rapid quality degradation of image reconstruction as the compression rate rises. One major reason is that a higher compression rate induces more loss of visual signals on the higher frequency spectrum which reflect the details on pixel space. In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information for enhancing reconstruction quality. The FCM can be easily incorporated into the VQ-VAE structure, and we refer to the new model as Frequency Augmented VAE (FA-VAE). In addition, a Dynamic Spectrum Loss (DSL) is introduced to guide the FCMs to balance between various frequencies dynamically for optimal reconstruction. FA-VAE is further extended to the text-to-image synthesis task, and a Cross-attention Autoregressive Transformer (CAT) is proposed to obtain more precise semantic attributes in texts. Extensive reconstruction experiments with different compression rates are conducted on several benchmark datasets, and the results demonstrate that the proposed FA-VAE is able to restore more faithfully the details compared to SOTA methods. CAT also shows improved generation quality with better image-text semantic alignment.
Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning
Authors: Sen Zhao1, Wei Wei, Yifan Liu, Ziyang Wang, Wendi Li, Xian-Ling Mao, Shuai Zhu, Minghui Yang, Zujie Wen
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2305.02575
Pdf link: https://arxiv.org/pdf/2305.02575
Abstract Conversational recommendation systems (CRS) aim to timely and proactively acquire user dynamic preferred attributes through conversations for item recommendation. In each turn of CRS, there naturally have two decision-making processes with different roles that influence each other: 1) director, which is to select the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) actor, which is to accordingly choose primitive actions (i.e., asked attribute or recommended item) that satisfy user preferences and give feedback to estimate the effectiveness of the director's option. However, existing methods heavily rely on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the bad effect of model bias on the mutual influence between the director and actor, we model the director's option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
Distributed System Fuzzing
Authors: Ruijie Meng, George Pîrlea, Abhik Roychoudhury, Ilya Sergey
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2305.02601
Pdf link: https://arxiv.org/pdf/2305.02601
Abstract Grey-box fuzzing is the lightweight approach of choice for finding bugs in sequential programs. It provides a balance between efficiency and effectiveness by conducting a biased random search over the domain of program inputs using a feedback function from observed test executions. For distributed system testing, however, the state-of-practice is represented today by only black-box tools that do not attempt to infer and exploit any knowledge of the system's past behaviours to guide the search for bugs. In this work, we present Mallory: the first framework for grey-box fuzz-testing of distributed systems. Unlike popular black-box distributed system fuzzers, such as Jepsen, that search for bugs by randomly injecting network partitions and node faults or by following human-defined schedules, Mallory is adaptive. It exercises a novel metric to learn how to maximize the number of observed system behaviors by choosing different sequences of faults, thus increasing the likelihood of finding new bugs. The key enablers for our approach are the new ideas of timeline-driven testing and timeline abstraction that provide the feedback function guiding a biased random search for failures. Mallory dynamically constructs Lamport timelines of the system behaviour, abstracts these timelines into happens-before summaries, and introduces faults guided by its real-time observation of the summaries. We have evaluated Mallory on a diverse set of widely-used industrial distributed systems. Compared to the start-of-the-art black-box fuzzer Jepsen, Mallory explores more behaviours and takes less time to find bugs. Mallory discovered 22 zero-day bugs (of which 18 were confirmed by developers), including 10 new vulnerabilities, in rigorously-tested distributed systems such as Braft, Dqlite, and Redis. 6 new CVEs have been assigned.
High-dimensional Bayesian Optimization via Semi-supervised Learning with Optimized Unlabeled Data Sampling
Authors: Yuxuan Yin, Yu Wang, Peng Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.02614
Pdf link: https://arxiv.org/pdf/2305.02614
Abstract Bayesian optimization (BO) is a powerful tool for seeking the global optimum of black-box functions. While evaluations of the black-box functions can be highly costly, it is desirable to reduce the use of expensive labeled data. For the first time, we introduce a teacher-student model to exploit semi-supervised learning that can make use of large amounts of unlabelled data under the context of BO. Importantly, we show that the selection of the validation and unlabeled data is key to the performance of BO. To optimize the sampling of unlabeled data, we employ a black-box parameterized sampling distribution optimized as part of the employed bi-level optimization framework. Taking one step further, we demonstrate that the performance of BO can be further improved by selecting unlabeled data from a dynamically fitted extreme value distribution. Our BO method operates in a learned latent space with reduced dimensionality, making it scalable to high-dimensional problems. The proposed approach outperforms significantly the existing BO methods on several synthetic and real-world optimization tasks.
LatentAugment: Dynamically Optimized Latent Probabilities of Data Augmentation
Authors: Koichi Kuriyama
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.02668
Pdf link: https://arxiv.org/pdf/2305.02668
Abstract Although data augmentation is a powerful technique for improving the performance of image classification tasks, it is difficult to identify the best augmentation policy. The optimal augmentation policy, which is the latent variable, cannot be directly observed. To address this problem, this study proposes $\textit{LatentAugment}$, which estimates the latent probability of optimal augmentation. The proposed method is appealing in that it can dynamically optimize the augmentation strategies for each input and model parameter in learning iterations. Theoretical analysis shows that LatentAugment is a general model that includes other augmentation methods as special cases, and it is simple and computationally efficient in comparison with existing augmentation methods. Experimental results show that the proposed LatentAugment has higher test accuracy than previous augmentation methods on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.
Efficient and Robust Time-Optimal Trajectory Planning and Control for Agile Quadrotor Flight
Authors: Ziyu Zhou, Gang Wang, Jian Sun, Jikai Wang, Jie Chen
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.02772
Pdf link: https://arxiv.org/pdf/2305.02772
Abstract Agile quadrotor flight relies on rapidly planning and accurately tracking time-optimal trajectories, a technology critical to their application in the wild. However, the computational burden of computing time-optimal trajectories based on the full quadrotor dynamics (typically on the order of minutes or even hours) can hinder its ability to respond quickly to changing scenarios. Additionally, modeling errors and external disturbances can lead to deviations from the desired trajectory during tracking in real time. This letter proposes a novel approach to computing time-optimal trajectories, by fixing the nodes with waypoint constraints and adopting separate sampling intervals for trajectories between waypoints, which significantly accelerates trajectory planning. Furthermore, the planned paths are tracked via a time-adaptive model predictive control scheme whose allocated tracking time can be adaptively adjusted on-the-fly, therefore enhancing the tracking accuracy and robustness. We evaluate our approach through simulations and experimentally validate its performance in dynamic waypoint scenarios for time-optimal trajectory replanning and trajectory tracking.
A Momentum-Incorporated Non-Negative Latent Factorization of Tensors Model for Dynamic Network Representation
Authors: Aoling Zeng
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2305.02782
Pdf link: https://arxiv.org/pdf/2305.02782
Abstract A large-scale dynamic network (LDN) is a source of data in many big data-related applications due to their large number of entities and large-scale dynamic interactions. They can be modeled as a high-dimensional incomplete (HDI) tensor that contains a wealth of knowledge about time patterns. A Latent factorization of tensors (LFT) model efficiently extracts this time pattern, which can be established using stochastic gradient descent (SGD) solvers. However, LFT models based on SGD are often limited by training schemes and have poor tail convergence. To solve this problem, this paper proposes a novel nonlinear LFT model (MNNL) based on momentum-incorporated SGD, which extracts non-negative latent factors from HDI tensors to make training unconstrained and compatible with general training schemes, while improving convergence accuracy and speed. Empirical studies on two LDN datasets show that compared to existing models, the MNNL model has higher prediction accuracy and convergence speed.
BranchNorm: Robustly Scaling Extremely Deep Transformers
Authors: Yijin Liu, Xianfeng Zeng, Fandong Meng, Jie Zhou
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.02790
Pdf link: https://arxiv.org/pdf/2305.02790
Abstract Recently, DeepNorm scales Transformers into extremely deep (i.e., 1000 layers) and reveals the promising potential of deep scaling. To stabilize the training of deep models, DeepNorm (Wang et al., 2022) attempts to constrain the model update to a constant value. Although applying such a constraint can benefit the early stage of model training, it may lead to undertrained models during the whole training procedure. In this paper, we propose BranchNorm, which dynamically rescales the non-residual branch of Transformer in accordance with the training period. BranchNorm not only theoretically stabilizes the training with smooth gradient norms at the early stage, but also encourages better convergence in the subsequent training stage. Experiment results on multiple translation tasks demonstrate that BranchNorm achieves a better trade-off between training stability and converge performance.
A numerically efficient output-only system-identification framework for stochastically forced self-sustained oscillators
Authors: Minwoo Lee, Kyu Tae Kim, Jongho Park
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2305.02801
Pdf link: https://arxiv.org/pdf/2305.02801
Abstract Self-sustained oscillations are ubiquitous in nature and engineering. In this paper, we propose a novel output-only system-identification framework for identifying the system parameters of a self-sustained oscillator affected by Gaussian white noise. A Langevin model that characterizes the self-sustained oscillator is postulated, and the corresponding Fokker--Planck equation is derived from stochastic averaging. From the drift and diffusion terms of the Fokker--Planck equation, unknown parameters of the system are identified. We develop a numerically efficient algorithm for enhancing the accuracy of parameter identification. In particular, a modified Levenberg--Marquardt optimization algorithm tailored to output-only system identification is introduced. The proposed framework is demonstrated on both numerical and experimental oscillators with varying system parameters that develop into self-sustained oscillations. The results show that the computational cost required for performing the system identification is dramatically reduced by using the proposed framework. Also, system parameters that were difficult to be extracted with the existing method could be efficiently computed with the system identification method developed in this study. Pertaining to the robustness and computational efficiency of the presented framework, this study can contribute to an accurate and fast diagnosis of dynamical systems under stochastic forcing.
ALADIN-based Distributed Model Predictive Control with dynamic partitioning: An application to Solar Parabolic Trough Plants
Authors: P. Chanfreut, J. M. Maestre, D. Krishnamoorthy, E. F. Camacho
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2305.02821
Pdf link: https://arxiv.org/pdf/2305.02821
Abstract This article presents a distributed model predictive controller with time-varying partitioning based on the augmented Lagrangian alternating direction inexact Newton method (ALADIN). In particular, we address the problem of controlling the temperature of a heat transfer fluid (HTF) in a set of loops of solar parabolic collectors by adjusting its flow rate. The control problem involves a nonlinear prediction model, decoupled inequality constraints, and coupled affine constraints on the system inputs. The application of ALADIN to address such a problem is combined with a dynamic clustering-based partitioning approach that aims at reducing, with minimum performance losses, the number of variables to be coordinated. Numerical results on a 10-loop plant are presented.
Maximum Causal Entropy Inverse Constrained Reinforcement Learning
Authors: Mattijs Baert, Pietro Mazzaglia, Sam Leroux, Pieter Simoens
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.02857
Pdf link: https://arxiv.org/pdf/2305.02857
Abstract When deploying artificial agents in real-world environments where they interact with humans, it is crucial that their behavior is aligned with the values, social norms or other requirements of that environment. However, many environments have implicit constraints that are difficult to specify and transfer to a learning agent. To address this challenge, we propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy that adheres to these constraints, using demonstrations of agents that abide by the constraints. We prove convergence in a tabular setting and provide an approximation which scales to complex environments. We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations, and we evaluate the learned cost function based on its transferability to other agents. Our method has been shown to outperform state-of-the-art approaches across a variety of tasks and environments, and it is able to handle problems with stochastic dynamics and a continuous state-action space.
2x Faster Language Model Pre-training via Masked Structural Growth
Authors: Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2305.02869
Pdf link: https://arxiv.org/pdf/2305.02869
Abstract Acceleration of large language model pre-training is a critical issue in present NLP research. In this paper, we focus on speeding up pre-training by progressively growing from a small Transformer structure to a large one. There are two main research problems related to progressive growth: growth schedule and growth operator. For growth schedule, existing work has explored multi-stage expansion of depth and feedforward layers. However, the impact of each dimension on the schedule's efficiency is still an open question. For growth operator, existing work relies on the initialization of new weights to inherit knowledge, and achieve only non-strict function preservation, limiting further optimization of training dynamics. To address these issues, we propose Masked Structural Growth (MSG), including growth schedules involving all possible dimensions and strictly function-preserving growth operators that is independent of the initialization of new weights. Experiments show that MSG is significantly faster than related work: we achieve a speed-up of 80% for Bert-base and 120% for Bert-large pre-training. Moreover, MSG is able to improve fine-tuning performances at the same time.
Simple Noisy Environment Augmentation for Reinforcement Learning
Authors: Raad Khraishi, Ramin Okhrati
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.02882
Pdf link: https://arxiv.org/pdf/2305.02882
Abstract Data augmentation is a widely used technique for improving model performance in machine learning, particularly in computer vision and natural language processing. Recently, there has been increasing interest in applying augmentation techniques to reinforcement learning (RL) problems, with a focus on image-based augmentation. In this paper, we explore a set of generic wrappers designed to augment RL environments with noise and encourage agent exploration and improve training data diversity which are applicable to a broad spectrum of RL algorithms and environments. Specifically, we concentrate on augmentations concerning states, rewards, and transition dynamics and introduce two novel augmentation techniques. In addition, we introduce a noise rate hyperparameter for control over the frequency of noise injection. We present experimental results on the impact of these wrappers on return using three popular RL algorithms, Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), and Proximal Policy Optimization (PPO), across five MuJoCo environments. To support the choice of augmentation technique in practice, we also present analysis that explores the performance these techniques across environments. Lastly, we publish the wrappers in our noisyenv repository for use with gym environments.
SlipCover: Near Zero-Overhead Code Coverage for Python
Authors: Juan Altmayer Pizzorno, Emery D Berger
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2305.02886
Pdf link: https://arxiv.org/pdf/2305.02886
Abstract Coverage analysis is widely used but can suffer from high overhead. This overhead is especially acute in the context of Python, which is already notoriously slow (a recent study observes a roughly 30x slowdown vs. native code). We find that the state-of-the-art coverage tool for Python, coverage.py, leads to slowdowns of 1.3x--3.6x (median: 2.8x) for the standard Python interpreter. Slowdowns are even more extreme when using PyPy, a JIT-compiled Python implementation, where coverage.py slows execution by 2.4x--325x (median: 14x). This performance degradation reduces the utility of coverage analysis in most use cases, including testing and fuzzing, and precludes its use in deployment. This paper presents SlipCover, a novel, near-zero overhead coverage analyzer for Python. SlipCover works without modifications to either the Python interpreter or PyPy. It first processes a program's AST to accurately identify all branches and lines. SlipCover then dynamically rewrites Python bytecodes to add lightweight instrumentation to each identified branch and line. At run time, SlipCover periodically de-instruments already-covered lines and branches. The result is extremely low overheads -- a median of just 5% -- making SlipCover suitable for use in deployment. We show its efficiency can translate to significant increases in the speed of coverage-based clients. As a proof of concept, we integrate SlipCover into TPBT, a targeted property-based testing system, and observe a 22x speedup.
Aligning Bird-Eye View Representation of Point Cloud Sequences using Scene Flow
Authors: Minh-Quan Dao, Vincent Frémont, Elwan Héry
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.02909
Pdf link: https://arxiv.org/pdf/2305.02909
Abstract Low-resolution point clouds are challenging for object detection methods due to their sparsity. Densifying the present point cloud by concatenating it with its predecessors is a popular solution to this challenge. Such concatenation is possible thanks to the removal of ego vehicle motion using its odometry. This method is called Ego Motion Compensation (EMC). Thanks to the added points, EMC significantly improves the performance of single-frame detectors. However, it suffers from the shadow effect that manifests in dynamic objects' points scattering along their trajectories. This effect results in a misalignment between feature maps and objects' locations, thus limiting performance improvement to stationary and slow-moving objects only. Scene flow allows aligning point clouds in 3D space, thus naturally resolving the misalignment in feature spaces. By observing that scene flow computation shares several components with 3D object detection pipelines, we develop a plug-in module that enables single-frame detectors to compute scene flow to rectify their Bird-Eye View representation. Experiments on the NuScenes dataset show that our module leads to a significant increase (up to 16%) in the Average Precision of large vehicles, which interestingly demonstrates the most severe shadow effect. The code is published at https://github.com/quan-dao/pc-corrector.
Switched max-plus linear-dual inequalities: cycle time analysis and applications
Authors: Davide Zorzenon, Jan Komenda, Jörg Raisch
Subjects: Systems and Control (eess.SY); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2305.02934
Pdf link: https://arxiv.org/pdf/2305.02934
Abstract P-time event graphs are discrete event systems suitable for modeling processes in which tasks must be executed in predefined time windows. Their dynamics can be represented by max-plus linear-dual inequalities (LDIs), i.e., systems of linear dynamical inequalities in the primal and dual operations of the max-plus algebra. We define a new class of models called switched LDIs (SLDIs), which allow to switch between different modes of operation, each corresponding to a set of LDIs, according to a sequence of modes called schedule. In this paper, we focus on the analysis of SLDIs when the considered schedule is fixed and either periodic or intermittently periodic. We show that SLDIs can model a wide range of applications including single-robot multi-product processing networks, in which every product has different processing requirements and corresponds to a specific mode of operation. Based on the analysis of SLDIs, we propose algorithms to compute: i. minimum and maximum cycle times for these processes, improving the time complexity of other existing approaches; ii. a complete trajectory of the robot including start-up and shut-down transients.
What Else Can Voronoi Diagrams Do For Diameter In Planar Graphs?
Authors: Amir Abboud, Shay Mozes, Oren Weimann
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2305.02946
Pdf link: https://arxiv.org/pdf/2305.02946
Abstract The Voronoi diagrams technique was introduced by Cabello to compute the diameter of planar graphs in subquadratic time. We present novel applications of this technique in static, fault-tolerant, and partially-dynamic undirected unweighted planar graphs, as well as some new limitations. 1. In the static case, we give $n^{3+o(1)}/D^2$ and $\tilde{O}(n\cdot D^2)$ time algorithms for computing the diameter of a planar graph $G$ with diameter $D$. These are faster than the state of the art $\tilde{O}(n^{5/3})$ when $D<n^{1/3}$ or $D>n^{2/3}$. 2. In the fault-tolerant setting, we give an $n^{7/3+o(1)}$ time algorithm for computing the diameter of $G\setminus {e}$ for every edge $e$ in $G$ (the replacement diameter problem). Compared to the naive $\tilde{O}(n^{8/3})$ time algorithm that runs the static algorithm for every edge. 3. In the incremental setting, where we wish to maintain the diameter while while adding edges, we present an algorithm with total running time $n^{7/3+o(1)}$. Compared to the naive $\tilde{O}(n^{8/3})$ time algorithm that runs the static algorithm after every update. 4. We give a lower bound (conditioned on the SETH) ruling out an amortized $O(n^{1-\varepsilon})$ update time for maintaining the diameter in {\em weighted} planar graph. The lower bound holds even for incremental or decremental updates. Our upper bounds are obtained by novel uses and manipulations of Voronoi diagrams. These include maintaining the Voronoi diagram when edges of the graph are deleted, allowing the sites of the Voronoi diagram to lie on a BFS tree level (rather than on boundaries of $r$-division), and a new reduction from incremental diameter to incremental {\em distance oracles} that could be of interest beyond planar graphs. Our lower bound is the first lower bound for a dynamic planar graph problem that is conditioned on the SETH.
Masked Trajectory Models for Prediction, Representation, and Control
Authors: Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2305.02968
Pdf link: https://arxiv.org/pdf/2305.02968
Abstract We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct the trajectory conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities, by simply choosing appropriate masks at inference time. For example, the same MTM network can be used as a forward dynamics model, inverse dynamics model, or even an offline RL agent. Through extensive experiments in several continuous control tasks, we show that the same MTM network -- i.e. same weights -- can match or outperform specialized networks trained for the aforementioned capabilities. Additionally, we find that state representations learned by MTM can significantly accelerate the learning speed of traditional RL algorithms. Finally, in offline RL benchmarks, we find that MTM is competitive with specialized offline RL algorithms, despite MTM being a generic self-supervised learning method without any explicit RL components. Code is available at https://github.com/facebookresearch/mtm
NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
Authors: Tobias Kirschstein, Shenhan Qian, Simon Giebenhain, Tim Walter, Matthias Nießner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2305.03027
Pdf link: https://arxiv.org/pdf/2305.03027
Abstract We focus on reconstructing high-fidelity radiance fields of human heads, capturing their animations over time, and synthesizing re-renderings from novel viewpoints at arbitrary time steps. To this end, we propose a new multi-view capture setup composed of 16 calibrated machine vision cameras that record time-synchronized images at 7.1 MP resolution and 73 frames per second. With our setup, we collect a new dataset of over 4700 high-resolution, high-framerate sequences of more than 220 human heads, from which we introduce a new human head reconstruction benchmark. The recorded sequences cover a wide range of facial dynamics, including head motions, natural expressions, emotions, and spoken language. In order to reconstruct high-fidelity human heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles (NeRSemble). We represent scene dynamics by combining a deformation field and an ensemble of 3D multi-resolution hash encodings. The deformation field allows for precise modeling of simple scene movements, while the ensemble of hash encodings helps to represent complex dynamics. As a result, we obtain radiance field representations of human heads that capture motion over time and facilitate re-rendering of arbitrary novel viewpoints. In a series of experiments, we explore the design choices of our method and demonstrate that our approach outperforms state-of-the-art dynamic radiance field approaches by a significant margin.
Tracking through Containers and Occluders in the Wild
Authors: Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2305.03052
Pdf link: https://arxiv.org/pdf/2305.03052
Abstract Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce $\textbf{TCOW}$, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is to, given a video sequence, segment both the projected extent of the target object, as well as the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.

A-suozhang / GetArxivDaily

New submissions for Fri, 5 May 23 #52

Keyword: efficient

MaskSearch: Querying Image Masks at Scale

Privacy in Population Protocols with Probabilistic Scheduling

Discovering Communication Pattern Shifts in Large-Scale Networks using Encoder Embedding and Vertex Dynamics

ADPDM: Accelerating Distributed Pointer-Traversals on Disaggregated Memory

Defending against Insertion-based Textual Backdoor Attacks via Attribution

Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs

Tackling Universal Properties of Minimal Trap Spaces of Boolean Networks

Bayesian Safety Validation for Black-Box Systems

Perfect Sampling for Hard Spheres from Strong Spatial Mixing

MLHOps: Machine Learning for Healthcare Operations

Generalizing Frobenius Inversion to Quaternion Matrices

A Deterministic Construction of a Large Distance Code from the Wozencraft Ensemble

Directional Antenna Based Scheduling Protocol for IoT Networks

Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA

Madvex: Instrumentation-based Adversarial Attacks on Machine Learning Malware Detection

Prompt-ICM: A Unified Framework towards Image Coding for Machines with Task-driven Prompts

Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model

IMAP: Intrinsically Motivated Adversarial Policy

Re$^3$Dial: Retrieve, Reorganize and Rescale Dialogue Corpus for Long-Turn Open-Domain Dialogue Pre-training

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

Real-Time Spatial Trajectory Planning for Urban Environments Using Dynamic Optimization

Variations on a Theme by Blahut and Arimoto

LatentAugment: Dynamically Optimized Latent Probabilities of Data Augmentation

Real-Time Neural Appearance Models

Mixed Max-and-Min Fractional Programming for Wireless Networks

Guidance & Control Networks for Time-Optimal Quadcopter Flight

Uncertainty Aware Deep Learning Model for Secure and Trustworthy Channel Estimation in 5G Networks

Efficient Personalized Federated Learning via Sparse Model-Adaptation

ItoV: Efficiently Adapting Deep Learning-based Image Watermarking to Video Watermarking

A Momentum-Incorporated Non-Negative Latent Factorization of Tensors Model for Dynamic Network Representation

A numerically efficient output-only system-identification framework for stochastically forced self-sustained oscillators

Dual-Quaternion Fourier Transform

An asymptotic preserving kinetic scheme for the M1 model of linear transport

Local Optima Correlation Assisted Adaptive Operator Selection

Interpretable Sentence Representation with Variational Autoencoders and Attention

Shannon meets Gray: Noise-robust, Low-sensitivity Codes with Applications in Differential Privacy

MEDIC: A Multimodal Empathy Dataset in Counseling

Fundamental Detection Probability vs. Achievable Rate Tradeoff in Integrated Sensing and Communication Systems

Hierarchical Transformer for Scalable Graph Learning

Input Layer Binarization with Bit-Plane Encoding

UPDExplainer: an Interpretable Transformer-based Framework for Urban Physical Disorder Detection Using Street View Imagery

Flow Correlator: A Flow Table Cache Management Strategy

Coloring tournaments with few colors: Algorithms and complexity

Rethinking Population-assisted Off-policy Reinforcement Learning

Majorizing Measures, Codes, and Information

FUSegNet: A Deep Convolutional Neural Network for Foot Ulcer Segmentation

Adaptive Selection of Anchor Items for CUR-based k-NN search with Cross-Encoders

FastAMI -- a Monte Carlo Approach to the Adjustment for Chance in Clustering Comparison Metrics

Decentralized and Compositional Interconnection Topology Synthesis for Linear Networked Systems

TUVF: Learning Generalizable Texture UV Radiance Fields

OctFormer: Octree-based Transformers for 3D Point Clouds

Personalize Segment Anything Model with One Shot

Keyword: faster

Using Language Models on Low-end Hardware

Approximating CKY with Transformers

FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs

Perfect Sampling for Hard Spheres from Strong Spatial Mixing

Shap-E: Generating Conditional 3D Implicit Functions

Breast Cancer Diagnosis Using Machine Learning Techniques

SuperNeuro: A Fast and Scalable Simulator for Neuromorphic Computing

Towards a Scalable Proof Engine: A Performant Prototype Rewriting Primitive for Coq

Cuttlefish: Low-rank Model Training without All The Tuning

UrbanBIS: a Large-scale Benchmark for Fine-grained Urban Building Instance Segmentation

Real-Time Neural Appearance Models

Interpretable Sentence Representation with Variational Autoencoders and Attention

Shannon meets Gray: Noise-robust, Low-sensitivity Codes with Applications in Differential Privacy

2x Faster Language Model Pre-training via Masked Structural Growth

What Else Can Voronoi Diagrams Do For Diameter In Planar Graphs?

Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study

OctFormer: Octree-based Transformers for 3D Point Clouds

Keyword: mobile

Privacy in Population Protocols with Probabilistic Scheduling

Breast Cancer Diagnosis Using Machine Learning Techniques

Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation

Fundamental Detection Probability vs. Achievable Rate Tradeoff in Integrated Sensing and Communication Systems

Keyword: pruning

Keyword: voxel