New submissions for Thu, 8 Jun 23

Keyword: efficient

Finding Counterfactually Optimal Action Sequences in Continuous State Spaces

Authors: Stratis Tsirtsis, Manuel Gomez-Rodriguez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2306.03929
Pdf link: https://arxiv.org/pdf/2306.03929
Abstract Humans performing tasks that involve taking a series of multiple dependent actions over time often learn from experience by reflecting on specific cases and points in time, where different actions could have led to significantly better outcomes. While recent machine learning methods to retrospectively analyze sequential decision making processes promise to aid decision makers in identifying such cases, they have focused on environments with finitely many discrete states. However, in many practical applications, the state of the environment is inherently continuous in nature. In this paper, we aim to fill this gap. We start by formally characterizing a sequence of discrete actions and continuous states using finite horizon Markov decision processes and a broad class of bijective structural causal models. Building upon this characterization, we formalize the problem of finding counterfactually optimal action sequences and show that, in general, we cannot expect to solve it in polynomial time. Then, we develop a search method based on the $A^*$ algorithm that, under a natural form of Lipschitz continuity of the environment's dynamics, is guaranteed to return the optimal solution to the problem. Experiments on real clinical data show that our method is very efficient in practice, and it has the potential to offer interesting insights for sequential decision making tasks.
Guiding The Last Layer in Federated Learning with Pre-Trained Models
Authors: Gwen Legate, Nicolas Bernier, Lucas Caccia, Edouard Oyallon, Eugene Belilovsky
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.03937
Pdf link: https://arxiv.org/pdf/2306.03937
Abstract Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data. Recent works have begun to consider the effects of using pre-trained models as an initialization point for existing FL algorithms; however, these approaches ignore the vast body of efficient transfer learning literature from the centralized learning setting. Here we revisit the problem of FL from a pre-trained model considered in prior work and expand it to a set of computer vision transfer learning problems. We first observe that simply fitting a linear classification head can be efficient and effective in many cases. We then show that in the FL setting, fitting a classifier using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals, while obtaining strong performance. Finally, we demonstrate that using a two-phase approach of obtaining the classifier and then fine-tuning the model can yield rapid convergence and improved generalization in the federated setting. We demonstrate the potential our method has to reduce communication and compute costs while achieving better model performance.
Rao-Blackwellized Particle Smoothing for Simultaneous Localization and Mapping
Authors: Manon Kok, Arno Solin, Thomas B. Schön
Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.03953
Pdf link: https://arxiv.org/pdf/2306.03953
Abstract Simultaneous localization and mapping (SLAM) is the task of building a map representation of an unknown environment while it at the same time is used for positioning. A probabilistic interpretation of the SLAM task allows for incorporating prior knowledge and for operation under uncertainty. Contrary to the common practice of computing point estimates of the system states, we capture the full posterior density through approximate Bayesian inference. This dynamic learning task falls under state estimation, where the state-of-the-art is in sequential Monte Carlo methods that tackle the forward filtering problem. In this paper, we introduce a framework for probabilistic SLAM using particle smoothing that does not only incorporate observed data in current state estimates, but it also back-tracks the updated knowledge to correct for past drift and ambiguities in both the map and in the states. Our solution can efficiently handle both dense and sparse map representations by Rao-Blackwellization of conditionally linear and conditionally linearized models. We show through simulations and real-world experiments how the principles apply to radio (BLE/Wi-Fi), magnetic field, and visual SLAM. The proposed solution is general, efficient, and works well under confounding noise.
Leveraging Explicit Procedural Instructions for Data-Efficient Action Prediction
Authors: Julia White, Arushi Raghuvanshi, Yada Pruksachatkun
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.03959
Pdf link: https://arxiv.org/pdf/2306.03959
Abstract Task-oriented dialogues often require agents to enact complex, multi-step procedures in order to meet user requests. While large language models have found success automating these dialogues in constrained environments, their widespread deployment is limited by the substantial quantities of task-specific data required for training. The following paper presents a data-efficient solution to constructing dialogue systems, leveraging explicit instructions derived from agent guidelines, such as company policies or customer service manuals. Our proposed Knowledge-Augmented Dialogue System (KADS) combines a large language model with a knowledge retrieval module that pulls documents outlining relevant procedures from a predefined set of policies, given a user-agent interaction. To train this system, we introduce a semi-supervised pre-training scheme that employs dialogue-document matching and action-oriented masked language modeling with partial parameter freezing. We evaluate the effectiveness of our approach on prominent task-oriented dialogue datasets, Action-Based Conversations Dataset and Schema-Guided Dialogue, for two dialogue tasks: action state tracking and workflow discovery. Our results demonstrate that procedural knowledge augmentation improves accuracy predicting in- and out-of-distribution actions while preserving high performance in settings with low or sparse data.
PILLAR: How to make semi-private learning more effective
Authors: Francesco Pinto, Yaxi Hu, Fanny Yang, Amartya Sanyal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.03962
Pdf link: https://arxiv.org/pdf/2306.03962
Abstract In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networks pre-trained on public (labelled or unlabelled) data, whose distribution can significantly differ from the one on which SP learning is performed. To validate its empirical effectiveness, we propose a wide variety of experiments under tight privacy constraints ($\epsilon = 0.1$) and with a focus on low-data regimes. In all of these settings, our algorithm exhibits significantly improved performance over available baselines that use similar amounts of public data.
ECQED: Emotion-Cause Quadruple Extraction in Dialogs
Authors: Li Zheng, Donghong Ji, Fei Li, Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Chong Teng
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.03969
Pdf link: https://arxiv.org/pdf/2306.03969
Abstract The existing emotion-cause pair extraction (ECPE) task, unfortunately, ignores extracting the emotion type and cause type, while these fine-grained meta-information can be practically useful in real-world applications, i.e., chat robots and empathic dialog generation. Also the current ECPE is limited to the scenario of single text piece, while neglecting the studies at dialog level that should have more realistic values. In this paper, we extend the ECPE task with a broader definition and scenario, presenting a new task, Emotion-Cause Quadruple Extraction in Dialogs (ECQED), which requires detecting emotion-cause utterance pairs and emotion and cause types. We present an ECQED model based on a structural and semantic heterogeneous graph as well as a parallel grid tagging scheme, which advances in effectively incorporating the dialog context structure, meanwhile solving the challenging overlapped quadruple issue. Via experiments we show that introducing the fine-grained emotion and cause features evidently helps better dialog generation. Also our proposed ECQED system shows exceptional superiority over baselines on both the emotion-cause quadruple or pair extraction tasks, meanwhile being highly efficient.
Explainable AI using expressive Boolean formulas
Authors: Gili Rosenberg, J. Kyle Brubaker, Martin J. A. Schuetz, Grant Salton, Zhihuai Zhu, Elton Yechao Zhu, Serdar Kadıoğlu, Sima E. Borujeni, Helmut G. Katzgraber
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2306.03976
Pdf link: https://arxiv.org/pdf/2306.03976
Abstract We propose and implement an interpretable machine learning classification model for Explainable AI (XAI) based on expressive Boolean formulas. Potential applications include credit scoring and diagnosis of medical conditions. The Boolean formula defines a rule with tunable complexity (or interpretability), according to which input data are classified. Such a formula can include any operator that can be applied to one or more Boolean variables, thus providing higher expressivity compared to more rigid rule-based and tree-based approaches. The classifier is trained using native local optimization techniques, efficiently searching the space of feasible formulas. Shallow rules can be determined by fast Integer Linear Programming (ILP) or Quadratic Unconstrained Binary Optimization (QUBO) solvers, potentially powered by special purpose hardware or quantum devices. We combine the expressivity and efficiency of the native local optimizer with the fast operation of these devices by executing non-local moves that optimize over subtrees of the full Boolean formula. We provide extensive numerical benchmarking results featuring several baselines on well-known public datasets. Based on the results, we find that the native local rule classifier is generally competitive with the other classifiers. The addition of non-local moves achieves similar results with fewer iterations, and therefore using specialized or quantum hardware could lead to a speedup by fast proposal of non-local moves.
Elephant Flows Detection Using Deep Neural Network, Convolutional Neural Network, Long Short Term Memory and Autoencoder
Authors: Getahun Wassie Geremew, Jianguo Ding
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2306.03995
Pdf link: https://arxiv.org/pdf/2306.03995
Abstract Currently, the wide spreading of real-time applications such as VoIP and videos-based applications require more data rates and reduced latency to ensure better quality of service (QoS). A well-designed traffic classification mechanism plays a major role for good QoS provision and network security verification. Port-based approaches and deep packet inspections (DPI) techniques have been used to classify and analyze network traffic flows. However, none of these methods can cope with the rapid growth of network traffic due to the increasing number of Internet users and the growth of real time applications. As a result, these methods lead to network congestion, resulting in packet loss, delay and inadequate QoS delivery. Recently, a deep learning approach has been explored to address the time-consumption and impracticality gaps of the above methods and maintain existing and future traffics of real-time applications. The aim of this research is then to design a dynamic traffic classifier that can detect elephant flows to prevent network congestion. Thus, we are motivated to provide efficient bandwidth and fast transmision requirements to many Internet users using SDN capability and the potential of Deep Learning. Specifically, DNN, CNN, LSTM and Deep autoencoder are used to build elephant detection models that achieve an average accuracy of 99.12%, 98.17%, and 98.78%, respectively. Deep autoencoder is also one of the promising algorithms that does not require human class labeler. It achieves an accuracy of 97.95% with a loss of 0.13 . Since the loss value is closer to zero, the performance of the model is good. Therefore, the study has a great importance to Internet service providers, Internet subscribers, as well as for future researchers in this area.
Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)
Authors: Maryan Rizinski, Hristijan Peshov, Kostadin Mishev, Milos Jovanovik, Dimitar Trajanov
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.03997
Pdf link: https://arxiv.org/pdf/2306.03997
Abstract Lexicon-based sentiment analysis (SA) in finance leverages specialized, manually annotated lexicons created by human experts to extract sentiment from financial texts. Although lexicon-based methods are simple to implement and fast to operate on textual data, they require considerable manual annotation efforts to create, maintain, and update the lexicons. These methods are also considered inferior to the deep learning-based approaches, such as transformer models, which have become dominant in various NLP tasks due to their remarkable performance. However, transformers require extensive data and computational resources for both training and testing. Additionally, they involve significant prediction times, making them unsuitable for real-time production environments or systems with limited processing capabilities. In this paper, we introduce a novel methodology named eXplainable Lexicons (XLex) that combines the advantages of both lexicon-based methods and transformer models. We propose an approach that utilizes transformers and SHapley Additive exPlanations (SHAP) for explainability to learn financial lexicons. Our study presents four main contributions. Firstly, we demonstrate that transformer-aided explainable lexicons can enhance the vocabulary coverage of the benchmark Loughran-McDonald (LM) lexicon, reducing the human involvement in annotating, maintaining, and updating the lexicons. Secondly, we show that the resulting lexicon outperforms the standard LM lexicon in SA of financial datasets. Thirdly, we illustrate that the lexicon-based approach is significantly more efficient in terms of model speed and size compared to transformers. Lastly, the XLex approach is inherently more interpretable than transformer models as lexicon models rely on predefined rules, allowing for better insights into the results of SA and making the XLex approach a viable tool for financial decision-making.
A Novel Implementation Methodology for Error Correction Codes on a Neuromorphic Architecture
Authors: Sahil Hassan, Parker Dattilo, Ali Akoglu
Subjects: Neural and Evolutionary Computing (cs.NE); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2306.04010
Pdf link: https://arxiv.org/pdf/2306.04010
Abstract The Internet of Things infrastructure connects a massive number of edge devices with an increasing demand for intelligent sensing and inferencing capability. Such data-sensitive functions necessitate energy-efficient and programmable implementations of Error Correction Codes (ECC) and decoders. The algorithmic flow of ECCs with concurrent accumulation and comparison types of operations are innately exploitable by neuromorphic architectures for energy efficient execution -- an area that is relatively unexplored outside of machine learning applications. For the first time, we propose a methodology to map the hard-decision class of decoder algorithms on a neuromorphic architecture. We present the implementation of the Gallager B (GaB) decoding algorithm on a TrueNorth-inspired architecture that is emulated on the Xilinx Zynq ZCU102 MPSoC. Over this reference implementation, we propose architectural modifications at the neuron block level that result in a reduction of energy consumption by 31% with a negligible increase in resource usage while achieving the same error correction performance.
Revisiting Neural Retrieval on Accelerators
Authors: Jiaqi Zhai, Zhaojie Gong, Yueming Wang, Xiao Sun, Zheng Yan, Fu Li, Xing Liu
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.04039
Pdf link: https://arxiv.org/pdf/2306.04039
Abstract Retrieval finds a small number of relevant candidates from a large corpus for information retrieval and recommendation applications. A key component of retrieval is to model (user, item) similarity, which is commonly represented as the dot product of two learned embeddings. This formulation permits efficient inference, commonly known as Maximum Inner Product Search (MIPS). Despite its popularity, dot products cannot capture complex user-item interactions, which are multifaceted and likely high rank. We hence examine non-dot-product retrieval settings on accelerators, and propose \textit{mixture of logits} (MoL), which models (user, item) similarity as an adaptive composition of elementary similarity functions. This new formulation is expressive, capable of modeling high rank (user, item) interactions, and further generalizes to the long tail. When combined with a hierarchical retrieval strategy, \textit{h-indexer}, we are able to scale up MoL to 100M corpus on a single GPU with latency comparable to MIPS baselines. On public datasets, our approach leads to uplifts of up to 77.3\% in hit rate (HR). Experiments on a large recommendation surface at Meta showed strong metric gains and reduced popularity bias, validating the proposed approach's performance and improved generalization.
Active Sparse Conversations for Improved Audio-Visual Embodied Navigation
Authors: Xiulong Liu, Sudipta Paul, Moitreya Chatterjee, Anoop Cherian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04047
Pdf link: https://arxiv.org/pdf/2306.04047
Abstract Efficient navigation towards an audio-goal necessitates an embodied agent to not only possess the ability to use audio-visual cues effectively, but also be equipped to actively (but occasionally) seek human/oracle assistance without sacrificing autonomy, e.g., when it is uncertain of where to navigate towards locating a noisy or sporadic audio goal. To this end, we present CAVEN -- a conversational audio-visual embodied navigation agent that is capable of posing navigation questions to a human/oracle and processing the oracle responses; both in free-form natural language. At the core of CAVEN is a multimodal hierarchical reinforcement learning (RL) setup that is equipped with a high-level policy that is trained to choose from one of three low-level policies (at every step), namely: (i) to navigate using audio-visual cues, or (ii) to frame a question to the oracle and receive a short or detailed response, or (iii) ask generic questions (when unsure of what to ask) and receive instructions. Key to generating the agent's questions is our novel TrajectoryNet that forecasts the most likely next steps to the goal and a QuestionNet that uses these steps to produce a question. All the policies are learned end-to-end via the RL setup, with penalties to enforce sparsity in receiving navigation instructions from the oracle. To evaluate the performance of CAVEN, we present extensive experiments on the SoundSpaces framework for the task of semantic audio-visual navigation. Our results show that CAVEN achieves upto 12% gain in performance over competing methods, especially in localizing new sound sources, even in the presence of auditory distractions.
Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health
Authors: Chandreen Liyanage, Muskan Garg, Vijay Mago, Sunghwan Sohn
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2306.04059
Pdf link: https://arxiv.org/pdf/2306.04059
Abstract Amid ongoing health crisis, there is a growing necessity to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD on social media data is intrinsically imbalanced, we experiment the generative NLP models for data augmentation to enable further improvement in the pre-screening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach through prompt-based Generative NLP models, and evaluate the ROUGE scores and syntactic/semantic similarity among existing interpretations and augmented data. Our approach with ChatGPT model surpasses all the other methods and achieves improvement over baselines such as Easy-Data Augmentation and Backtranslation. Introducing data augmentation to generate more training samples and balanced dataset, results in the improved F-score and the Matthew's Correlation Coefficient for upto 13.11% and 15.95%, respectively.
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
Authors: Zhongbin Xie, Thomas Lukasiewicz
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.04067
Pdf link: https://arxiv.org/pdf/2306.04067
Abstract The increasingly large size of modern pretrained language models not only makes them inherit more human-like biases from the training corpora, but also makes it computationally expensive to mitigate such biases. In this paper, we investigate recent parameter-efficient methods in combination with counterfactual data augmentation (CDA) for bias mitigation. We conduct extensive experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance and abilities to preserve the internal knowledge of a pre-trained model. We find that the parameter-efficient methods (i) are effective in mitigating gender bias, where adapter tuning is consistently the most effective one and prompt tuning is more suitable for GPT-2 than BERT, (ii) are less effective when it comes to racial and religious bias, which may be attributed to the limitations of CDA, and (iii) can perform similarly to or sometimes better than full fine-tuning with improved time and memory efficiency, as well as maintain the internal knowledge in BERT and GPT-2, evaluated via fact retrieval and downstream fine-tuning.
Blockchain Technology in Higher Education Ecosystem: Unraveling the Good, Bad, and Ugly
Authors: Sharaban Tahora, Bilash Saha, Nazmus Sakib, Hossain Shahriar, Hisham Haddad
Subjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2306.04071
Pdf link: https://arxiv.org/pdf/2306.04071
Abstract The higher education management systems first identified and realized the trap of pitting innovation against privacy while first addressing COVID-19 social isolation challenges in 2020. In the age of data sprawl, we observe the situation has been exacerbating since then. Integrating blockchain technology has the potential to address the recent and emerging challenges in the higher education management system. This paper unravels the Good (scopes and benefits), Bad (limitations), and Ugly (challenges and trade-offs) of blockchain technology integration in the higher education management paradigm in the existing landscape. Our study adopts both qualitative and quantitative approaches to explore the experiences of educators, researchers, students, and other stakeholders and fully understand the blockchain's potential and contextual challenges. Our findings will envision an efficient, secure, and transparent higher education management system and help shape the debate (and trade-offs) pertaining to the recent shift in relevant business and management climate and regulatory sentiment.
Professional Basketball Player Behavior Synthesis via Planning with Diffusion
Authors: Xiusi Chen, Wei-Yao Wang, Ziniu Hu, Curtis Chou, Lam Hoang, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2306.04090
Pdf link: https://arxiv.org/pdf/2306.04090
Abstract Dynamically planning in multi-agent systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing both concealed strategic policies and decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes makes it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we first formulate the sequential decision-making process as a conditional trajectory generation process. We further introduce PLAYBEST (PLAYer BEhavior SynThesis), a method for enhancing player decision-making. We extend the state-of-the-art generative model, diffusion probabilistic model, to learn challenging multi-agent environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained using the play-by-play data with corresponding rewards acting as the plan guidance. To accomplish reward-guided trajectory generation, conditional sampling is introduced to condition the diffusion model on the value function and conduct classifier-guided sampling. We validate the effectiveness of PLAYBEST via comprehensive simulation studies from real-world data, contrasting the generated trajectories and play strategies with those employed by professional basketball teams. Our results reveal that the model excels at generating high-quality basketball trajectories that yield efficient plays, surpassing conventional planning techniques in terms of adaptability, flexibility, and overall performance. Moreover, the synthesized play strategies exhibit a remarkable alignment with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.
A novel deeponet model for learning moving-solution operators with applications to earthquake hypocenter localization
Authors: Ehsan Haghighat, Umair bin Waheed, George Karniadakis
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2306.04096
Pdf link: https://arxiv.org/pdf/2306.04096
Abstract Seismicity induced by human activities poses a significant threat to public safety, emphasizing the need for accurate and timely earthquake hypocenter localization. In this study, we introduce X-DeepONet, a novel variant of deep operator networks (DeepONets), for learning moving-solution operators of parametric partial differential equations (PDEs), with application to real-time earthquake localization. Leveraging the power of neural operators, X-DeepONet learns to estimate traveltime fields associated with earthquake sources by incorporating information from seismic arrival times and velocity models. Similar to the DeepONet, X-DeepONet includes a trunk net and a branch net. Additionally, we introduce a root network that not only takes the standard DeepONet's multiplication operator as input, it also takes addition and subtraction operators. We show that for problems with moving fields, the standard multiplication operation of DeepONet is insufficient to capture field relocation, while addition and subtraction operators along with the eXtended root significantly improve its accuracy both under data-driven (supervised) and physics-informed (unsupervised) training. We demonstrate the effectiveness of X-DeepONet through various experiments, including scenarios with variable velocity models and arrival times. The results show remarkable accuracy in earthquake localization, even for heterogeneous and complex velocity models. The proposed framework also exhibits excellent generalization capabilities and robustness against noisy arrival times. The method provides a computationally efficient approach for quantifying uncertainty in hypocenter locations resulting from traveltime pick errors and velocity model variations. Our results underscore X-DeepONet's potential to improve seismic monitoring systems, aiding the development of early warning systems for seismic hazard mitigation.
Quasi-Newton Updating for Large-Scale Distributed Learning
Authors: Wu Shuyuan, Huang Danyang, Wang Hansheng
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2306.04111
Pdf link: https://arxiv.org/pdf/2306.04111
Abstract Distributed computing is critically important for modern statistical analysis. Herein, we develop a distributed quasi-Newton (DQN) framework with excellent statistical, computation, and communication efficiency. In the DQN method, no Hessian matrix inversion or communication is needed. This considerably reduces the computation and communication complexity of the proposed method. Notably, related existing methods only analyze numerical convergence and require a diverging number of iterations to converge. However, we investigate the statistical properties of the DQN method and theoretically demonstrate that the resulting estimator is statistically efficient over a small number of iterations under mild conditions. Extensive numerical analyses demonstrate the finite sample performance.
MESSY Estimation: Maximum-Entropy based Stochastic and Symbolic densitY Estimation
Authors: Tony Tohme, Mohsen Sadr, Kamal Youcef-Toumi, Nicolas G. Hadjiconstantinou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.04120
Pdf link: https://arxiv.org/pdf/2306.04120
Abstract We introduce MESSY estimation, a Maximum-Entropy based Stochastic and Symbolic densitY estimation method. The proposed approach recovers probability density functions symbolically from samples using moments of a Gradient flow in which the ansatz serves as the driving force. In particular, we construct a gradient-based drift-diffusion process that connects samples of the unknown distribution function to a guess symbolic expression. We then show that when the guess distribution has the maximum entropy form, the parameters of this distribution can be found efficiently by solving a linear system of equations constructed using the moments of the provided samples. Furthermore, we use Symbolic regression to explore the space of smooth functions and find optimal basis functions for the exponent of the maximum entropy functional leading to good conditioning. The cost of the proposed method in each iteration of the random search is linear with the number of samples and quadratic with the number of basis functions. We validate the proposed MESSY estimation method against other benchmark methods for the case of a bi-modal and a discontinuous density, as well as a density at the limit of physical realizability. We find that the addition of a symbolic search for basis functions improves the accuracy of the estimation at a reasonable additional computational cost. Our results suggest that the proposed method outperforms existing density recovery methods in the limit of a small to moderate number of samples by providing a low-bias and tractable symbolic description of the unknown density at a reasonable computational cost.
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification
Authors: Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2306.04125
Pdf link: https://arxiv.org/pdf/2306.04125
Abstract Multimodal fusion of multiple heterogeneous and interconnected signals is a fundamental challenge in almost all multimodal problems and applications. In order to perform multimodal fusion, we need to understand the types of interactions that modalities can exhibit: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how human annotators can be leveraged to annotate two categorizations of multimodal interactions: (1) partial labels, where different randomly assigned annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator is tasked to annotate the label given the first modality before giving them the second modality and asking them to explicitly reason about how their answer changes, before proposing an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy: the extent to which modalities individually and together give the same predictions on the task, uniqueness: the extent to which one modality enables a task prediction that the other does not, and synergy: the extent to which only both modalities enable one to make a prediction about the task that one would not otherwise make using either modality individually. Through extensive experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying interactions in multimodal datasets.
Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility
Authors: Chanyoung Park, Gyu Seon Kim, Soohyun Park, Soyi Jung, Joongheon Kim
Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.04137
Pdf link: https://arxiv.org/pdf/2306.04137
Abstract The development of urban-air-mobility (UAM) is rapidly progressing with spurs, and the demand for efficient transportation management systems is a rising need due to the multifaceted environmental uncertainties. Thus, this paper proposes a novel air transportation service management algorithm based on multi-agent deep reinforcement learning (MADRL) to address the challenges of multi-UAM cooperation. Specifically, the proposed algorithm in this paper is based on communication network (CommNet) method utilizing centralized training and distributed execution (CTDE) in multiple UAMs for providing efficient air transportation services to passengers collaboratively. Furthermore, this paper adopts actual vertiport maps and UAM specifications for constructing realistic air transportation networks. By evaluating the performance of the proposed algorithm in data-intensive simulations, the results show that the proposed algorithm outperforms existing approaches in terms of air transportation service quality. Furthermore, there are no inferior UAMs by utilizing parameter sharing in CommNet and a centralized critic network in CTDE. Therefore, it can be confirmed that the research results in this paper can provide a promising solution for autonomous air transportation management systems in city-wide urban areas.
CFDP: Common Frequency Domain Pruning
Authors: Samir Khaki, Weihan Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04147
Pdf link: https://arxiv.org/pdf/2306.04147
Abstract As the saying goes, sometimes less is more -- and when it comes to neural networks, that couldn't be more true. Enter pruning, the art of selectively trimming away unnecessary parts of a network to create a more streamlined, efficient architecture. In this paper, we introduce a novel end-to-end pipeline for model pruning via the frequency domain. This work aims to shed light on the interoperability of intermediate model outputs and their significance beyond the spatial domain. Our method, dubbed Common Frequency Domain Pruning (CFDP) aims to extrapolate common frequency characteristics defined over the feature maps to rank the individual channels of a layer based on their level of importance in learning the representation. By harnessing the power of CFDP, we have achieved state-of-the-art results on CIFAR-10 with GoogLeNet reaching an accuracy of 95.25%, that is, +0.2% from the original model. We also outperform all benchmarks and match the original model's performance on ImageNet, using only 55% of the trainable parameters and 60% of the FLOPs. In addition to notable performances, models produced via CFDP exhibit robustness to a variety of configurations including pruning from untrained neural architectures, and resistance to adversarial attacks. The implementation code can be found at https://github.com/Skhaki18/CFDP.
SANGEET: A XML based Open Dataset for Research in Hindustani Sangeet
Authors: Chandan Misra, Swarup Chattopadhyay
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.04148
Pdf link: https://arxiv.org/pdf/2306.04148
Abstract It is very important to access a rich music dataset that is useful in a wide variety of applications. Currently, available datasets are mostly focused on storing vocal or instrumental recording data and ignoring the requirement of its visual representation and retrieval. This paper attempts to build an XML-based public dataset, called SANGEET, that stores comprehensive information of Hindustani Sangeet (North Indian Classical Music) compositions written by famous musicologist Pt. Vishnu Narayan Bhatkhande. SANGEET preserves all the required information of any given composition including metadata, structural, notational, rhythmic, and melodic information in a standardized way for easy and efficient storage and extraction of musical information. The dataset is intended to provide the ground truth information for music information research tasks, thereby supporting several data-driven analysis from a machine learning perspective. We present the usefulness of the dataset by demonstrating its application on music information retrieval using XQuery, visualization through Omenad rendering system. Finally, we propose approaches to transform the dataset for performing statistical and machine learning tasks for a better understanding of Hindustani Sangeet. The dataset can be found at https://github.com/cmisra/Sangeet.
A Unified One-Step Solution for Aspect Sentiment Quad Prediction
Authors: Junxian Zhou, Haiqin Yang, Yuxuan He, Hao Mou, Junbo Yang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04152
Pdf link: https://arxiv.org/pdf/2306.04152
Abstract Aspect sentiment quad prediction (ASQP) is a challenging yet significant subtask in aspect-based sentiment analysis as it provides a complete aspect-level sentiment structure. However, existing ASQP datasets are usually small and low-density, hindering technical advancement. To expand the capacity, in this paper, we release two new datasets for ASQP, which contain the following characteristics: larger size, more words per sample, and higher density. With such datasets, we unveil the shortcomings of existing strong ASQP baselines and therefore propose a unified one-step solution for ASQP, namely One-ASQP, to detect the aspect categories and to identify the aspect-opinion-sentiment (AOS) triplets simultaneously. Our One-ASQP holds several unique advantages: (1) by separating ASQP into two subtasks and solving them independently and simultaneously, we can avoid error propagation in pipeline-based methods and overcome slow training and inference in generation-based methods; (2) by introducing sentiment-specific horns tagging schema in a token-pair-based two-dimensional matrix, we can exploit deeper interactions between sentiment elements and efficiently decode the AOS triplets; (3) we design ``[NULL]'' token can help us effectively identify the implicit aspects or opinions. Experiments on two benchmark datasets and our released two datasets demonstrate the advantages of our One-ASQP. The two new datasets are publicly released at \url{https://www.github.com/Datastory-CN/ASQP-Datasets}.
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Authors: Zhao Song, Mingquan Ye, Junze Yin, Lichen Zhang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.04169
Pdf link: https://arxiv.org/pdf/2306.04169
Abstract Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \in \mathbb{R}^{n \times n}$, a weight matrix $W \in \mathbb{R}_{\geq 0}^{n \times n}$, a parameter $k$, the goal is to output two matrices $U, V \in \mathbb{R}^{n \times k}$ such that $| W \circ (M - U V) |_F$ is minimized, where $\circ$ denotes the Hadamard product. Such a problem is known to be NP-hard and even hard to approximate [RSW16]. Meanwhile, alternating minimization is a good heuristic solution for approximating weighted low rank approximation. The work [LLR16] shows that, under mild assumptions, alternating minimization does provide provable guarantees. In this work, we develop an efficient and robust framework for alternating minimization. For weighted low rank approximation, this improves the runtime of [LLR16] from $n^2 k^2$ to $n^2k$. At the heart of our work framework is a high-accuracy multiple response regression solver together with a robust analysis of alternating minimization.
Synthesising Recursive Functions for First-Order Model Counting: Challenges, Progress, and Conjectures
Authors: Paulius Dilkas, Vaishak Belle
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04189
Pdf link: https://arxiv.org/pdf/2306.04189
Abstract First-order model counting (FOMC) is a computational problem that asks to count the models of a sentence in finite-domain first-order logic. In this paper, we argue that the capabilities of FOMC algorithms to date are limited by their inability to express many types of recursive computations. To enable such computations, we relax the restrictions that typically accompany domain recursion and generalise the circuits used to express a solution to an FOMC problem to directed graphs that may contain cycles. To this end, we adapt the most well-established (weighted) FOMC algorithm ForcLift to work with such graphs and introduce new compilation rules that can create cycle-inducing edges that encode recursive function calls. These improvements allow the algorithm to find efficient solutions to counting problems that were previously beyond its reach, including those that cannot be solved efficiently by any other exact FOMC algorithm. We end with a few conjectures on what classes of instances could be domain-liftable as a result.
An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders
Authors: Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.04190
Pdf link: https://arxiv.org/pdf/2306.04190
Abstract The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children's speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study. We analyze correct/incorrect classification of the ASR systems using human transcripts at word level, by means of evaluation measures such as Cohen's Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures. We observe improvements for the newly developed ASR systems regarding the agreement with human-based judgment and correct rejection (CR). The accuracy of the ASR systems varies for different reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.
Extracting Cloud-based Model with Prior Knowledge
Authors: Shiqian Zhao, Kangjie Chen, Meng Hao, Jian Zhang, Guowen Xu, Hongwei Li, Tianwei Zhang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.04192
Pdf link: https://arxiv.org/pdf/2306.04192
Abstract Machine Learning-as-a-Service, a pay-as-you-go business pattern, is widely accepted by third-party users and developers. However, the open inference APIs may be utilized by malicious customers to conduct model extraction attacks, i.e., attackers can replicate a cloud-based black-box model merely via querying malicious examples. Existing model extraction attacks mainly depend on the posterior knowledge (i.e., predictions of query samples) from Oracle. Thus, they either require high query overhead to simulate the decision boundary, or suffer from generalization errors and overfitting problems due to query budget limitations. To mitigate it, this work proposes an efficient model extraction attack based on prior knowledge for the first time. The insight is that prior knowledge of unlabeled proxy datasets is conducive to the search for the decision boundary (e.g., informative samples). Specifically, we leverage self-supervised learning including autoencoder and contrastive learning to pre-compile the prior knowledge of the proxy dataset into the feature extractor of the substitute model. Then we adopt entropy to measure and sample the most informative examples to query the target model. Our design leverages both prior and posterior knowledge to extract the model and thus eliminates generalizability errors and overfitting problems. We conduct extensive experiments on open APIs like Traffic Recognition, Flower Recognition, Moderation Recognition, and NSFW Recognition from real-world platforms, Azure and Clarifai. The experimental results demonstrate the effectiveness and efficiency of our attack. For example, our attack achieves 95.1% fidelity with merely 1.8K queries (cost 2.16$) on the NSFW Recognition API. Also, the adversarial examples generated with our substitute model have better transferability than others, which reveals that our scheme is more conducive to downstream attacks.
High-Performance Caching of Homomorphic Encryption for Cloud Databases
Authors: Dongfang Zhao
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2306.04227
Pdf link: https://arxiv.org/pdf/2306.04227
Abstract While homomorphic encryption (HE) has garnered significant research interest in cloud-based outsourced databases due to its algebraic properties over ciphertexts, the computational overhead associated with HE has hindered its widespread adoption in production database systems. Recently, a caching technique called Radix-based additive caching of homomorphic encryption (Rache) was proposed in SIGMOD'23. The primary objective of this paper is to address the performance overhead resulting from the expensive randomization process in Rache. To achieve this, we propose a novel encryption algorithm called $ASEnc$, which replaces the computationally intensive full scan of radixes with the caching of a polynomial number of radix-powers during an offline stage. This design significantly reduces the performance impact caused by randomization. Furthermore, this paper aims to extend Rache's capabilities to support floating-point numbers. To accomplish this, we introduce a new encryption algorithm named $FSEnc$, leveraging efficient constant multiplication available in state-of-the-art fully homomorphic encryption (FHE) schemes. Notably, $FSEnc$ offers the flexibility to cache the coefficients instead of the radixes themselves, which may result in a large number of cached ciphertexts. However, we manage this efficiently by streaming the dynamically cached ciphertexts through a vector of circular buffers. We demonstrate that both encryption algorithms guarantee semantic security (IND-CPA). To validate their performance, we implement both algorithms as loadable functions in MySQL 8.0 and deploy the system prototype on a 96-core server hosted in the Chameleon Cloud. Experimental results showcase that $ASEnc$ outperforms Rache by 2.3--3.3$\times$, while $FSEnc$ surpasses the state-of-the-art floating-point FHE CKKS by 1.8--5.6$\times$.
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization
Authors: Carolin Benjamins, Elena Raponi, Anja Jankovic, Carola Doerr, Marius Lindauer
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04262
Pdf link: https://arxiv.org/pdf/2306.04262
Abstract Bayesian Optimization (BO) is a class of surrogate-based, sample-efficient algorithms for optimizing black-box problems with small evaluation budgets. The BO pipeline itself is highly configurable with many different design choices regarding the initial design, surrogate model, and acquisition function (AF). Unfortunately, our understanding of how to select suitable components for a problem at hand is very limited. In this work, we focus on the definition of the AF, whose main purpose is to balance the trade-off between exploring regions with high uncertainty and those with high promise for good solutions. We propose Self-Adjusting Weighted Expected Improvement (SAWEI), where we let the exploration-exploitation trade-off self-adjust in a data-driven manner, based on a convergence criterion for BO. On the noise-free black-box BBOB functions of the COCO benchmarking platform, our method exhibits a favorable any-time performance compared to handcrafted baselines and serves as a robust default choice for any problem structure. The suitability of our method also transfers to HPOBench. With SAWEI, we are a step closer to on-the-fly, data-driven, and robust BO designs that automatically adjust their sampling behavior to the problem at hand.
On Isolating Roots in a Multiple Field Extension
Authors: Christina Katsamaki (SU, OURAGAN), Fabrice Rouillier (SU, OURAGAN)
Subjects: Symbolic Computation (cs.SC)
Arxiv link: https://arxiv.org/abs/2306.04271
Pdf link: https://arxiv.org/pdf/2306.04271
Abstract We address univariate root isolation when the polynomial's coefficients are in a multiple field extension. We consider a polynomial $F \in L[Y]$, where $L$ is a multiple algebraic extension of $\mathbb{Q}$. We provide aggregate bounds for $F$ and algorithmic and bit-complexity results for the problem of isolating its roots. For the latter problem we follow a common approach based on univariate root isolation algorithms. For the particular case where $F$ does not have multiple roots, we achieve a bit-complexity in $\tilde{\mathcal{O}}_B(n d^{2n+2}(d+n\tau))$, where $d$ is the total degree and $\tau$ is the bitsize of the involved polynomials.In the general case we need to enhance our algorithm with a preprocessing step that determines the number of distinct roots of $F$. We follow a numerical, yet certified, approach that has bit-complexity $\tilde{\mathcal{O}}_B(n^2d^{3n+3}\tau + n^3 d^{2n+4}\tau)$.
HornFuzz: Fuzzing CHC solvers
Authors: Anzhela Sukhanova, Valentyn Sobol
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.04281
Pdf link: https://arxiv.org/pdf/2306.04281
Abstract Many advanced program analysis and verification methods are based on solving systems of Constrained Horn Clauses (CHC). Testing CHC solvers is very important, as correctness of their work determines whether bugs in the analyzed programs are detected or missed. One of the well-established and efficient methods of automated software testing is fuzzing: analyzing the reactions of programs to random input data. Currently, there are no fuzzers for CHC solvers, and fuzzers for SMT solvers are not efficient in CHC solver testing, since they do not consider CHC specifics. In this paper, we present HornFuzz, a mutation-based gray-box fuzzing technique for detecting bugs in CHC solvers based on the idea of metamorphic testing. We evaluated our fuzzer on one of the highest performing CHC solvers, Spacer, and found a handful of bugs in Spacer. In particular, some discovered problems are so serious that they require fixes with significant changes to the solver.
Revising deep learning methods in parking lot occupancy detection
Authors: Anastasia Martynova, Mikhail Kuznetsov, Vadim Porvatov, Vladislav Tishin, Andrey Kuznetsov, Natalia Semenova, Ksenia Kuznetsova
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04288
Pdf link: https://arxiv.org/pdf/2306.04288
Abstract Parking guidance systems have recently become a popular trend as a part of the smart cities' paradigm of development. The crucial part of such systems is the algorithm allowing drivers to search for available parking lots across regions of interest. The classic approach to this task is based on the application of neural network classifiers to camera records. However, existing systems demonstrate a lack of generalization ability and appropriate testing regarding specific visual conditions. In this study, we extensively evaluate state-of-the-art parking lot occupancy detection algorithms, compare their prediction quality with the recently emerged vision transformers, and propose a new pipeline based on EfficientNet architecture. Performed computational experiments have demonstrated the performance increase in the case of our model, which was evaluated on 5 different datasets.
Introduction and Assessment of the Addition of Links and Containers to the Blackboard Architecture
Authors: Jordan Milbrath, Jeremy Straub
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04289
Pdf link: https://arxiv.org/pdf/2306.04289
Abstract The Blackboard Architecture provides a mechanism for storing data and logic and using it to make decisions that impact the application environment that the Blackboard Architecture network models. While rule-fact-action networks can represent numerous types of data, the relationships that can be easily modeled are limited by the propositional logic nature of the rule-fact network structure. This paper proposes and evaluates the inclusion of containers and links in the Blackboard Architecture. These objects are designed to allow them to model organizational, physical, spatial and other relationships that cannot be readily or efficiently implemented as Boolean logic rules. Containers group related facts together and can be nested to implement complex relationships. Links interconnect containers that have a relationship that is relevant to their organizational purpose. Both objects, together, facilitate new ways of using the Blackboard Architecture and enable or simply its use for complex tasks that have multiple types of relationships that need to be considered during operations.
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Authors: Boyuan Sun, Yuqi Yang, Le Zhang, Ming-Ming Cheng, Qibin Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04300
Pdf link: https://arxiv.org/pdf/2306.04300
Abstract In this paper, we present a simple but performant semi-supervised semantic segmentation approach, termed CorrMatch. Our goal is to mine more high-quality regions from the unlabeled images to leverage the unlabeled data more efficiently via consistency regularization. The key contributions of our CorrMatch are two novel and complementary strategies. First, we introduce an adaptive threshold updating strategy with a relaxed initialization to expand the high-quality regions. Furthermore, we propose to propagate high-confidence predictions through measuring the pairwise similarities between pixels. Despite its simplicity, we show that CorrMatch achieves great performance on popular semi-supervised semantic segmentation benchmarks. Taking the DeepLabV3+ framework with ResNet-101 backbone as our segmentation model, we receive a 76%+ mIoU score on the Pascal VOC 2012 segmentation benchmark with only 92 annotated images provided. We also achieve a consistent improvement over previous semi-supervised semantic segmentation models. Code will be made publicly available.
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Authors: Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.04301
Pdf link: https://arxiv.org/pdf/2306.04301
Abstract With the demand for autonomous control and personalized speech generation, the style control and transfer in Text-to-Speech (TTS) is becoming more and more important. In this paper, we propose a new TTS system that can perform style transfer with interpretability and high fidelity. Firstly, we design a TTS system that combines variational autoencoder (VAE) and diffusion refiner to get refined mel-spectrograms. Specifically, a two-stage and a one-stage system are designed respectively, to improve the audio quality and the performance of style transfer. Secondly, a diffusion bridge of quantized VAE is designed to efficiently learn complex discrete style representations and improve the performance of style transfer. To have a better ability of style transfer, we introduce ControlVAE to improve the reconstruction quality and have good interpretability simultaneously. Experiments on LibriTTS dataset demonstrate that our method is more effective than baseline models.
Self-Resolving Prediction Markets for Unverifiable Outcomes
Authors: Siddarth Srinivasan, Ezra Karger, Yiling Chen
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
Arxiv link: https://arxiv.org/abs/2306.04305
Pdf link: https://arxiv.org/pdf/2306.04305
Abstract Prediction markets elicit and aggregate beliefs by paying agents based on how close their predictions are to a verifiable future outcome. However, outcomes of many important questions are difficult to verify or unverifiable, in that the ground truth may be hard or impossible to access. Examples include questions about causal effects where it is infeasible or unethical to run randomized trials; crowdsourcing and content moderation tasks where it is prohibitively expensive to verify ground truth; and questions asked over long time horizons, where the delay until the realization of the outcome skews agents' incentives to report their true beliefs. We present a novel and unintuitive result showing that it is possible to run an $\varepsilon-$incentive compatible prediction market to elicit and efficiently aggregate information from a pool of agents without observing the outcome by paying agents the negative cross-entropy between their prediction and that of a carefully chosen reference agent. Our key insight is that a reference agent with access to more information can serve as a reasonable proxy for the ground truth. We use this insight to propose self-resolving prediction markets that terminate with some probability after every report and pay all but a few agents based on the final prediction. We show that it is an $\varepsilon-$Perfect Bayesian Equilibrium for all agents to report truthfully in our mechanism and to believe that all other agents report truthfully. Although primarily of interest for unverifiable outcomes, this design is also applicable for verifiable outcomes.
Compressed Sensing Based Channel Estimation for Movable Antenna Communications
Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.04333
Pdf link: https://arxiv.org/pdf/2306.04333
Abstract In this letter, we study the channel estimation for wireless communications with movable antenna (MA), which requires to reconstruct the channel response at any location in a given region where the transmitter/receiver is located based on the channel measurements taken at finite locations therein, so as to find the MA's location for optimizing the communication performance. To reduce the pilot overhead and computational complexity for channel estimation, we propose a new successive transmitter-receiver compressed sensing (STRCS) method by exploiting the efficient representation of the channel responses in the given transmitter/receiver region (field) in terms of multi-path components. Specifically, the field-response information (FRI) in the angular domain, including the angles of departure (AoDs)/angles of arrival (AoAs) and complex coefficients of all significant multi-path components are sequentially estimated based on a finite number of channel measurements taken at random/selected locations by the MA at the transmitter and/or receiver. Simulation results demonstrate that the proposed channel reconstruction method outperforms the benchmark schemes in terms of both pilot overhead and channel reconstruction accuracy.
Estimating nested expectations without inner conditional sampling and application to value of information analysis
Authors: Tomohiko Hironaka, Takashi Goda
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.04363
Pdf link: https://arxiv.org/pdf/2306.04363
Abstract Motivated by various computational applications, we investigate the problem of estimating nested expectations. Building upon recent work by the authors, we propose a novel Monte Carlo estimator for nested expectations, inspired by sparse grid quadrature, that does not require sampling from inner conditional distributions. Theoretical analysis establishes an upper bound on the mean squared error of our estimator under mild assumptions on the problem, demonstrating its efficiency for cases with low-dimensional outer variables. We illustrate the effectiveness of our estimator through its application to problems related to value of information analysis, with moderate dimensionality. Overall, our method presents a promising approach to efficiently estimate nested expectations in practical computational settings.
Efficient Recruitment Strategy for Collaborative Mobile Crowd Sensing Based on GCN Trustworthiness Prediction
Authors: Zhongwei Zhan, Yingjie Wang, Peiyong Duan, Akshita Maradapu Vera Venkata Sai, Zhaowei Liu, Chaocan Xiang, Xiangrong Tong, Weilong Wang, Zhipeng Cai
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04366
Pdf link: https://arxiv.org/pdf/2306.04366
Abstract Collaborative Mobile Crowd Sensing (CMCS) enhances data quality and coverage by promoting teamwork in task sensing, with worker recruitment representing a complex multi-objective optimization problem. Existing strategies mainly focus on the characteristics of workers themselves, neglecting the asymmetric trust relationships between them, which affects the rationality of task utility evaluation. To address this, this paper first employs the Mini-Batch K-Means clustering algorithm and deploys edge servers to enable efficient distributed worker recruitment. Historical data and task requirements are utilized to obtain workers' ability types and distances. A trust-directed graph in the worker's social network is input into the Graph Convolutional Network (GCN) framework for training, capturing asymmetric trustworthiness between worker pairs. Privacy leakage is prevented in CMCS scenarios through high trust values between workers. Ultimately, an undirected recruitment graph is constructed using workers' abilities, trust values, and distance weights, transforming the worker recruitment problem into a Maximum Weight Average Subgraph Problem (MWASP). A Tabu Search Recruitment (TSR) algorithm is proposed to rationally recruit a balanced multi-objective optimal task utility worker set for each task. Extensive simulation experiments on four real-world datasets demonstrate the effectiveness of the proposed strategy, outperforming other strategies.
Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning
Authors: Suyuan Zhao, Jiahuan Zhang, Zaiqing Nie
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2306.04371
Pdf link: https://arxiv.org/pdf/2306.04371
Abstract Single-cell RNA sequencing (scRNA-seq) data is a potent tool for comprehending the "language of life" and can provide insights into various downstream biomedical tasks. Large-scale language models (LLMs) are starting to be used for cell representation learning. However, current LLM-based cell representation learning methods depend solely on the BERT architecture, causing an anisotropic embedding space that leads to inefficient semantic representation. Contrastive learning alleviates this problem by distributing the embeddings uniformly. As a larger batch size in contrastive learning results in better representation, the practical application of contrastive learning in cell representation learning is hampered by the high dimensionality of scRNA-seq data and the large parameter volume of LLMs. To address the batch size limitation, we propose a novel divide-and-conquer contrastive learning approach to decouple the batch size from the GPU memory size for cell representation learning. Based on our divide-and-conquer contrastive learning approach, we introduce Single-Cell Language Model CellLM, a large-scale cell representation learning model to handle high-dimensional scRNA-seq data with tens of thousands of genes. CellLM has over 50 million parameters trained with 2 million scRNA-seq data and makes the first attempt to learn cell language models from both normal cells and cancer cells. CellLM achieves new state-of-the-art (SOTA) results in all evaluated downstream tasks: including a 71.8 F_1-score for cell type annotation (a 3.0% absolute improvement over scBERT), an average F_1-score of 88.9 for single-cell drug sensitivity prediction in a few-shot scenario (an 8.3% absolute improvement), and a 93.4 Pearson's correlation for single-omics cell line drug sensitivity prediction (a 6.2% absolute improvement).
Get More for Less in Decentralized Learning Systems
Authors: Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic, Jeffrey Wigger
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04377
Pdf link: https://arxiv.org/pdf/2306.04377
Abstract Decentralized learning (DL) systems have been gaining popularity because they avoid raw data sharing by communicating only model parameters, hence preserving data confidentiality. However, the large size of deep neural networks poses a significant challenge for decentralized training, since each node needs to exchange gigabytes of data, overloading the network. In this paper, we address this challenge with JWINS, a communication-efficient and fully decentralized learning system that shares only a subset of parameters through sparsification. JWINS uses wavelet transform to limit the information loss due to sparsification and a randomized communication cut-off that reduces communication usage without damaging the performance of trained models. We demonstrate empirically with 96 DL nodes on non-IID datasets that JWINS can achieve similar accuracies to full-sharing DL while sending up to 64% fewer bytes. Additionally, on low communication budgets, JWINS outperforms the state-of-the-art communication-efficient DL algorithm CHOCO-SGD by up to 4x in terms of network savings and time.
SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory
Authors: Han Sun, Rui Gong, Konrad Schindler, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04385
Pdf link: https://arxiv.org/pdf/2306.04385
Abstract Domain adaptive object detection aims to leverage the knowledge learned from a labeled source domain to improve the performance on an unlabeled target domain. Prior works typically require the access to the source domain data for adaptation, and the availability of sufficient data on the target domain. However, these assumptions may not hold due to data privacy and rare data collection. In this paper, we propose and investigate a more practical and challenging domain adaptive object detection problem under both source-free and few-shot conditions, named as SF-FSDA. To overcome this problem, we develop an efficient labeled data factory based approach. Without accessing the source domain, the data factory renders i) infinite amount of synthesized target-domain like images, under the guidance of the few-shot image samples and text description from the target domain; ii) corresponding bounding box and category annotations, only demanding minimum human effort, i.e., a few manually labeled examples. On the one hand, the synthesized images mitigate the knowledge insufficiency brought by the few-shot condition. On the other hand, compared to the popular pseudo-label technique, the generated annotations from data factory not only get rid of the reliance on the source pretrained object detection model, but also alleviate the unavoidably pseudo-label noise due to domain shift and source-free condition. The generated dataset is further utilized to adapt the source pretrained object detection model, realizing the robust object detection under SF-FSDA. The experiments on different settings showcase that our proposed approach outperforms other state-of-the-art methods on SF-FSDA problem. Our codes and models will be made publicly available.
Symplectic multirate generalized additive Runge-Kutta methods for Hamiltonian systems
Authors: Kevin Schäfers, Michael Günther, Adrian Sandu
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.04389
Pdf link: https://arxiv.org/pdf/2306.04389
Abstract Generalized additive Runge-Kutta (GARK) schemes have shown to be a suitable tool for solving ordinary differential equations with additively partitioned right-hand sides. This work combines the ideas of symplectic GARK schemes and multirate GARK schemes to solve additively partitioned Hamiltonian systems with multirate behavior more efficiently. In a general setting of non-separable Hamiltonian systems, we derive order conditions, as well as conditions for symplecticity and time-reversibility. Moreover, investigations of the special case of separable Hamiltonian systems are carried out. We show that particular partitions may introduce stability issues and point out partitions that enable an implicit-explicit integration that comes with improved stability properties. Higher-order methods based on advanced composition techniques are discussed. Numerical results for the Fermi-Pasta-Ulam problem underline the performance of the schemes.
The Noir Dataflow Platform: Efficient Data Processing without Complexity
Authors: Luca De Martini, Alessandro Margara, Gianpaolo Cugola, Marco Donadoni, Edoardo Morassutto
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.04421
Pdf link: https://arxiv.org/pdf/2306.04421
Abstract Today, data analysis drives the decision-making process in virtually every human activity. This demands for software platforms that offer simple programming abstractions to express data analysis tasks and that can execute them in an efficient and scalable way. State-of-the-art solutions range from low-level programming primitives, which give control to the developer about communication and resource usage, but require significant effort to develop and optimize new algorithms, to high-level platforms that hide most of the complexities of parallel and distributed processing, but often at the cost of reduced efficiency. To reconcile these requirements, we developed Noir, a novel distributed data processing platform written in Rust. Noir provides a high-level dataflow programming model as mainstream data processing systems. It supports static and streaming data, it enables data transformations, grouping, aggregation, iterative computations, and time-based analytics, incurring in a low overhead. This paper presents In this paper, we present the programming model and the implementation details of Noir. We evaluate it under heterogeneous workloads. We compare it with state-of-the-art solutions for data analysis and high-performance computing, as well as alternative research products, which offer different programming abstractions and implementation strategies. Noir programs are compact and easy to write: developers need not care about low-level concerns such as resource usage, data serialization, concurrency control, and communication. Noir consistently presents comparable or better performance than competing solutions, by a large margin in several scenarios. We conclude that Noir offers a good tradeoff between simplicity and performance, allowing developers to easily express complex data analysis tasks and achieve high performance and scalability.
Dual policy as self-model for planning
Authors: Jaesung Yoo, Fernanda de la Torre, Robert Guangyu Yang
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04440
Pdf link: https://arxiv.org/pdf/2306.04440
Abstract Planning is a data efficient decision-making strategy where an agent selects candidate actions by exploring possible future states. To simulate future states when there is a high-dimensional action space, the knowledge of one's decision making strategy must be used to limit the number of actions to be explored. We refer to the model used to simulate one's decisions as the agent's self-model. While self-models are implicitly used widely in conjunction with world models to plan actions, it remains unclear how self-models should be designed. Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model. In such dual-policy agents, a model-free policy and a distilled policy are used for model-free actions and planned actions, respectively. Our results on a ecologically relevant, parametric environment indicate that distilled policy network for self-model stabilizes training, has faster inference than using model-free policy, promotes better exploration, and could learn a comprehensive understanding of its own behaviors, at the cost of distilling a new network apart from the model-free policy.
Fast Optimal Locally Private Mean Estimation via Random Projections
Authors: Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy L. Nguyen, Kunal Talwar
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.04444
Pdf link: https://arxiv.org/pdf/2306.04444
Abstract We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a $1+o(1)$-factor. Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm such as PrivUnitG in the lower-dimensional space. In addition, we show that, by appropriately correlating the random projection matrices across devices, we can achieve fast server run-time. We mathematically analyze the error of the algorithm in terms of properties of the random projections, and study two instantiations. Lastly, our experiments for private mean estimation and private federated learning demonstrate that our algorithms empirically obtain nearly the same utility as optimal ones while having significantly lower communication and computational cost.
Referring Expression Comprehension Using Language Adaptive Inference
Authors: Wei Su, Peihan Miao, Huanzhang Dou, Yongjian Fu, Xi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04451
Pdf link: https://arxiv.org/pdf/2306.04451
Abstract Different from universal object detection, referring expression comprehension (REC) aims to locate specific objects referred to by natural language expressions. The expression provides high-level concepts of relevant visual and contextual patterns, which vary significantly with different expressions and account for only a few of those encoded in the REC model. This leads us to a question: do we really need the entire network with a fixed structure for various referring expressions? Ideally, given an expression, only expression-relevant components of the REC model are required. These components should be small in number as each expression only contains very few visual and contextual clues. This paper explores the adaptation between expressions and REC models for dynamic inference. Concretely, we propose a neat yet efficient framework named Language Adaptive Dynamic Subnets (LADS), which can extract language-adaptive subnets from the REC model conditioned on the referring expressions. By using the compact subnet, the inference can be more economical and efficient. Extensive experiments on RefCOCO, RefCOCO+, RefCOCOg, and Referit show that the proposed method achieves faster inference speed and higher accuracy against state-of-the-art approaches.
Training-Free Neural Active Learning with Initialization-Robustness Guarantees
Authors: Apivich Hemachandra, Zhongxiang Dai, Jasraj Singh, See-Kiong Ng, Bryan Kian Hsiang Low
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04454
Pdf link: https://arxiv.org/pdf/2306.04454
Abstract Existing neural active learning algorithms have aimed to optimize the predictive performance of neural networks (NNs) by selecting data for labelling. However, other than a good predictive performance, being robust against random parameter initializations is also a crucial requirement in safety-critical applications. To this end, we introduce our expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points which lead to trained NNs with both (a) good predictive performances and (b) initialization robustness. Importantly, our EV-GP criterion is training-free, i.e., it does not require any training of the NN during data selection, which makes it computationally efficient. We empirically demonstrate that our EV-GP criterion is highly correlated with both initialization robustness and generalization performance, and show that it consistently outperforms baseline methods in terms of both desiderata, especially in situations with limited initial data or large batch sizes.
Maintaining the cycle structure of dynamic permutations
Authors: Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2306.04470
Pdf link: https://arxiv.org/pdf/2306.04470
Abstract We present a new data structure for maintaining dynamic permutations, which we call a $\textit{forest of splay trees (FST)}$. The FST allows one to efficiently maintain the cycle structure of a permutation $\pi$ when the allowed updates are transpositions. The structure stores one conceptual splay tree for each cycle of $\pi$, using the position within the cycle as the key. Updating $\pi$ to $\tau\cdot\pi$, for a transposition $\tau$, takes $\mathcal{O}(\log n)$ amortized time, where $n$ is the size of $\pi$. The FST computes any $\pi(i)$, $\pi^{-1}(i)$, $\pi^k(i)$ and $\pi^{-k}(i)$, in $\mathcal{O}(\log n)$ amortized time. Further, it supports cycle-specific queries such as determining whether two elements belong to the same cycle, flip a segment of a cycle, and others, again within $\mathcal{O}(\log n)$ amortized time.
Fair Column Subset Selection
Authors: Antonis Matakos, Bruno Ordozgoiti, Suhas Thejaswi
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04489
Pdf link: https://arxiv.org/pdf/2306.04489
Abstract We consider the problem of fair column subset selection. In particular, we assume that two groups are present in the data, and the chosen column subset must provide a good approximation for both, relative to their respective best rank-k approximations. We show that this fair setting introduces significant challenges: in order to extend known results, one cannot do better than the trivial solution of simply picking twice as many columns as the original methods. We adopt a known approach based on deterministic leverage-score sampling, and show that merely sampling a subset of appropriate size becomes NP-hard in the presence of two groups. Whereas finding a subset of two times the desired size is trivial, we provide an efficient algorithm that achieves the same guarantees with essentially 1.5 times that size. We validate our methods through an extensive set of experiments on real-world data.
Active Reconfigurable Intelligent Surfaces for the Millimeter-Wave Frequency Band: System Design and Measurement
Authors: Hamed Radpour, Markus Hofer, Lukas Walter Mayer, Andreas Hofmann, Martin Schiefer, Thomas Zemen
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.04515
Pdf link: https://arxiv.org/pdf/2306.04515
Abstract Reconfigurable intelligent surfaces (RISs) will play a key role to establish millimeter wave (mmWave) ultra-reliable low-latency communication systems for sixth-generation (6G) applications. Currently, there are a few working prototypes of RISs operating in the mmWave frequency band and all of them are based on passive reflective elements. However, to fabricate an efficiently working RIS at mmWave frequencies, it is crucial to take care of the strong signal attenuation, reflective element losses and undesired radio frequency (RF) circuit effects. In this paper, we provide measurement campaign results for an active RIS in the mmWave frequency band as well as its analysis and system design. The obtained results demonstrate that an active RIS outperforms a RIS working in passive mode and provides a higher signal-to-noise-ratio (SNR). The active RIS consists of active reflective elements that amplify the impinging signal and reflect the signal to the desired beam direction. To obtain an efficient RIS in terms of power consumption and RIS state switch time, we design a hexagonal RIS with 37 elements working at 26 GHz. These elements are designed to work whether in passive state (binary phase shifting) or in active state (switch OFF or amplifying). We provide a comparison between the performance of a RIS working in passive and active mode using numerical simulations and empirical measurements. This comparison reveals that the active reflective intelligent surface (RIS) provides a received power that is at least 4 dB higher than that of the equivalent passive RIS. These results demonstrate the strong advantage of using active RISs for future ultra-reliable low-latency wireless communications.
Analysing the Robustness of NSGA-II under Noise
Authors: Duc-Cuong Dang, Andre Opris, Bahare Salehi, Dirk Sudholt
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2306.04525
Pdf link: https://arxiv.org/pdf/2306.04525
Abstract Runtime analysis has produced many results on the efficiency of simple evolutionary algorithms like the (1+1) EA, and its analogue called GSEMO in evolutionary multiobjective optimisation (EMO). Recently, the first runtime analyses of the famous and highly cited EMO algorithm NSGA-II have emerged, demonstrating that practical algorithms with thousands of applications can be rigorously analysed. However, these results only show that NSGA-II has the same performance guarantees as GSEMO and it is unclear how and when NSGA-II can outperform GSEMO. We study this question in noisy optimisation and consider a noise model that adds large amounts of posterior noise to all objectives with some constant probability $p$ per evaluation. We show that GSEMO fails badly on every noisy fitness function as it tends to remove large parts of the population indiscriminately. In contrast, NSGA-II is able to handle the noise efficiently on \textsc{LeadingOnesTrailingZeroes} when $p<1/2$, as the algorithm is able to preserve useful search points even in the presence of noise. We identify a phase transition at $p=1/2$ where the expected time to cover the Pareto front changes from polynomial to exponential. To our knowledge, this is the first proof that NSGA-II can outperform GSEMO and the first runtime analysis of NSGA-II in noisy optimisation.
Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
Authors: Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.04529
Pdf link: https://arxiv.org/pdf/2306.04529
Abstract Currently, most machine learning models are trained by centralized teams and are rarely updated. In contrast, open-source software development involves the iterative development of a shared artifact through distributed collaboration using a version control system. In the interest of enabling collaborative and continual improvement of machine learning models, we introduce Git-Theta, a version control system for machine learning models. Git-Theta is an extension to Git, the most widely used version control software, that allows fine-grained tracking of changes to model parameters alongside code and other artifacts. Unlike existing version control systems that treat a model checkpoint as a blob of data, Git-Theta leverages the structure of checkpoints to support communication-efficient updates, automatic model merges, and meaningful reporting about the difference between two versions of a model. In addition, Git-Theta includes a plug-in system that enables users to easily add support for new functionality. In this paper, we introduce Git-Theta's design and features and include an example use-case of Git-Theta where a pre-trained model is continually adapted and modified. We publicly release Git-Theta in hopes of kickstarting a new era of collaborative model development.
Top-Down Knowledge Compilation for Counting Modulo Theories
Authors: Vincent Derkinderen, Pedro Zuidberg Dos Martires, Samuel Kolb, Paolo Morettin
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04541
Pdf link: https://arxiv.org/pdf/2306.04541
Abstract Propositional model counting (#SAT) can be solved efficiently when the input formula is in deterministic decomposable negation normal form (d-DNNF). Translating an arbitrary formula into a representation that allows inference tasks, such as counting, to be performed efficiently, is called knowledge compilation. Top-down knowledge compilation is a state-of-the-art technique for solving #SAT problems that leverages the traces of exhaustive DPLL search to obtain d-DNNF representations. While knowledge compilation is well studied for propositional approaches, knowledge compilation for the (quantifier free) counting modulo theory setting (#SMT) has been studied to a much lesser degree. In this paper, we discuss compilation strategies for #SMT. We specifically advocate for a top-down compiler based on the traces of exhaustive DPLL(T) search.
Towards Decentralized Heterogeneous Multi-Robot SLAM and Target Tracking
Authors: Ofer Dagan, Tycho L. Cinquini, Luke Morrissey, Kristen Such, Nisar R. Ahmed, Christoffer Heckman
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2306.04570
Pdf link: https://arxiv.org/pdf/2306.04570
Abstract In many robotics problems, there is a significant gain in collaborative information sharing between multiple robots, for exploration, search and rescue, tracking multiple targets, or mapping large environments. One of the key implicit assumptions when solving cooperative multi-robot problems is that all robots use the same (homogeneous) underlying algorithm. However, in practice, we want to allow collaboration between robots possessing different capabilities and that therefore must rely on heterogeneous algorithms. We present a system architecture and the supporting theory, to enable collaboration in a decentralized network of robots, where each robot relies on different estimation algorithms. To develop our approach, we focus on multi-robot simultaneous localization and mapping (SLAM) with multi-target tracking. Our theoretical framework builds on our idea of exploiting the conditional independence structure inherent to many robotics applications to separate between each robot's local inference (estimation) tasks and fuse only relevant parts of their non-equal, but overlapping probability density function (pdfs). We present a new decentralized graph-based approach to the multi-robot SLAM and tracking problem. We leverage factor graphs to split between different parts of the problem for efficient data sharing between robots in the network while enabling robots to use different local sparse landmark/dense/metric-semantic SLAM algorithms.
MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding
Authors: Tan-Sang Ha, Hai Nguyen-Truong, Tuan-Anh Vu, Sai-Kit Yeung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.04593
Pdf link: https://arxiv.org/pdf/2306.04593
Abstract Building a video retrieval system that is robust and reliable, especially for the marine environment, is a challenging task due to several factors such as dealing with massive amounts of dense and repetitive data, occlusion, blurriness, low lighting conditions, and abstract queries. To address these challenges, we present MarineVRS, a novel and flexible video retrieval system designed explicitly for the marine domain. MarineVRS integrates state-of-the-art methods for visual and linguistic object representation to enable efficient and accurate search and analysis of vast volumes of underwater video data. In addition, unlike the conventional video retrieval system, which only permits users to index a collection of images or videos and search using a free-form natural language sentence, our retrieval system includes an additional Explainability module that outputs the segmentation masks of the objects that the input query referred to. This feature allows users to identify and isolate specific objects in the video footage, leading to more detailed analysis and understanding of their behavior and movements. Finally, with its adaptability, explainability, accuracy, and scalability, MarineVRS is a powerful tool for marine researchers and scientists to efficiently and accurately process vast amounts of data and gain deeper insights into the behavior and movements of marine species.
Empowering Business Transformation: The Positive Impact and Ethical Considerations of Generative AI in Software Product Management -- A Systematic Literature Review
Authors: Nishant A. Parikh
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04605
Pdf link: https://arxiv.org/pdf/2306.04605
Abstract Generative Artificial Intelligence (GAI) has made outstanding strides in recent years, with a good-sized impact on software product management. Drawing on pertinent articles from 2016 to 2023, this systematic literature evaluation reveals generative AI's potential applications, benefits, and constraints in this area. The study shows that technology can assist in idea generation, market research, customer insights, product requirements engineering, and product development. It can help reduce development time and costs through automatic code generation, customer feedback analysis, and more. However, the technology's accuracy, reliability, and ethical consideration persist. Ultimately, generative AI's practical application can significantly improve software product management activities, leading to more efficient use of resources, better product outcomes, and improved end-user experiences.
Acoustic singular surfaces in an exponential class of inhomogeneous gases: A new numerical approach based on Krylov subspace spectral methodologies
Authors: Bailey Rester, James V. Lambers, Pedro M. Jordan
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.04611
Pdf link: https://arxiv.org/pdf/2306.04611
Abstract We investigate the propagation of acoustic singular surfaces, specifically, linear shock waves and nonlinear acceleration waves, in a class of inhomogeneous gases whose ambient mass density varies exponentially. Employing the mathematical tools of singular surface theory, we first determine the evolution of both the jump amplitudes and the locations/velocities of their associated wave-fronts, along with a variety of related analytical results. We then turn to what have become known as Krylov subspace spectral (KSS) methods to numerically simulate the evolution of the full waveforms under consideration. These are not only performed quite efficiently, since KSS allows the use of `large' CFL numbers, but also quite accurately, in the sense of capturing theoretically-predicted features of the solution profiles more faithfully than other time-stepping methods, since KSS customizes the computation of the components of the solution corresponding to the different frequencies involved. The presentation concludes with a listing of possible, acoustics-related, follow-on studies.
JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency
Authors: Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.04615
Pdf link: https://arxiv.org/pdf/2306.04615
Abstract Energy-efficient execution of task-based parallel applications is crucial as tasking is a widely supported feature in many parallel programming libraries and runtimes. Currently, state-of-the-art proposals primarily rely on leveraging core asymmetry and CPU DVFS. Additionally, these proposals mostly use heuristics and lack the ability to explore the trade-offs between energy usage and performance. However, our findings demonstrate that focusing solely on CPU energy consumption for energy-efficient scheduling while neglecting memory energy consumption leaves room for further energy savings. We propose JOSS, a runtime scheduling framework that leverages both CPU DVFS and memory DVFS in conjunction with core asymmetry and task characteristics to enable energy-efficient execution of task-based applications. JOSS also enables the exploration of energy and performance trade-offs by supporting user-defined performance constraints. JOSS uses a set of models to predict task execution time, CPU and memory power consumption, and then selects the configuration for the tunable knobs to achieve the desired energy performance trade-off. Our evaluation shows that JOSS achieves 21.2% energy reduction, on average, compared to the state-of-the-art. Moreover, we demonstrate that even in the absence of a memory DVFS knob, taking energy consumption of both CPU and memory into account achieves better energy savings compared to only accounting for CPU energy. Furthermore, JOSS is able to adapt scheduling to reduce energy consumption while satisfying the desired performance constraints.
Improved Mesh Processing using Distorted Pole Spherical Coordinates
Authors: Grzegorz Borowik, Michał Balicki, Michał Kasprzak, Piotr Cukier
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.04625
Pdf link: https://arxiv.org/pdf/2306.04625
Abstract The Cartesian coordinate system is the most commonly used system in computer visualization. This is due to its ease of use and processing speed. However, it is not always suitable for a given problem. Angular measures often allow us to operate more efficiently on a three-dimensional model. When dealing with issues related to the processing of three-dimensional objects described using polygon meshes, it often happens that these standard systems do not satisfy specific properties that are crucial to us. The topic of the paper is to discuss a specific transformation to spherical coordinates with distorted poles, which allows us to eliminate singular points from the determined subset of the mesh and bypass inconvenient seam lines in its two-dimensional projection, which can hinder further calculations.
Generative Adversarial Shaders for Real-Time Realism Enhancement
Authors: Arturo Salmi, Szabolcs Cséfalvay, James Imber
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04629
Pdf link: https://arxiv.org/pdf/2306.04629
Abstract Application of realism enhancement methods, particularly in real-time and resource-constrained settings, has been frustrated by the expense of existing methods. These achieve high quality results only at the cost of long runtimes and high bandwidth, memory, and power requirements. We present an efficient alternative: a high-performance, generative shader-based approach that adapts machine learning techniques to real-time applications, even in resource-constrained settings such as embedded and mobile GPUs. The proposed learnable shader pipeline comprises differentiable functions that can be trained in an end-to-end manner using an adversarial objective, allowing for faithful reproduction of the appearance of a target image set without manual tuning. The shader pipeline is optimized for highly efficient execution on the target device, providing temporally stable, faster-than-real time results with quality competitive with many neural network-based methods.
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Authors: Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.04637
Pdf link: https://arxiv.org/pdf/2306.04637
Abstract Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life -- A \emph{single} transformer can adaptively select different base ICL algorithms -- or even perform qualitatively different tasks -- on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task -- noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
Keyword: faster

Learning Causal Mechanisms through Orthogonal Neural Networks
Authors: Peyman Sheikholharam Mashhadi, Slawomir Nowaczyk
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.03938
Pdf link: https://arxiv.org/pdf/2306.03938
Abstract A fundamental feature of human intelligence is the ability to infer high-level abstractions from low-level sensory data. An essential component of such inference is the ability to discover modularized generative mechanisms. Despite many efforts to use statistical learning and pattern recognition for finding disentangled factors, arguably human intelligence remains unmatched in this area. In this paper, we investigate a problem of learning, in a fully unsupervised manner, the inverse of a set of independent mechanisms from distorted data points. We postulate, and justify this claim with experimental results, that an important weakness of existing machine learning solutions lies in the insufficiency of cross-module diversification. Addressing this crucial discrepancy between human and machine intelligence is an important challenge for pattern recognition systems. To this end, our work proposes an unsupervised method that discovers and disentangles a set of independent mechanisms from unlabeled data, and learns how to invert them. A number of experts compete against each other for individual data points in an adversarial setting: one that best inverses the (unknown) generative mechanism is the winner. We demonstrate that introducing an orthogonalization layer into the expert architectures enforces additional diversity in the outputs, leading to significantly better separability. Moreover, we propose a procedure for relocating data points between experts to further prevent any one from claiming multiple mechanisms. We experimentally illustrate that these techniques allow discovery and modularization of much less pronounced transformations, in addition to considerably faster convergence.
Accelerating 128-bit Floating-Point Matrix Multiplication on FPGAs
Authors: Fumiya Kono, Naohito Nakasato, Maho Nakata
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2306.04087
Pdf link: https://arxiv.org/pdf/2306.04087
Abstract General Matrix Multiplication (GEMM) is a fundamental operation widely used in scientific computations. Its performance and accuracy significantly impact the performance and accuracy of applications that depend on it. One such application is semidefinite programming (SDP), and it often requires binary128 or higher precision arithmetic to solve problems involving SDP stably. However, only some processors support binary128 arithmetic, which makes SDP solvers generally slow. In this study, we focused on accelerating GEMM with binary128 arithmetic on field-programmable gate arrays (FPGAs) to enable the flexible design of accelerators for the desired computations. Our binary128 GEMM designs on a recent high-performance FPGA achieved approximately 90GFlops, 147x faster than the computation executed on a recent CPU with 20 threads for large matrices. Using our binary128 GEMM design on the FPGA, we successfully accelerated two numerical applications: LU decomposition and SDP problems, for the first time.
Balancing of competitive two-player Game Levels with Reinforcement Learning
Authors: Florian Rupp, Manuel Eberhardinger, Kai Eckert
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2306.04429
Pdf link: https://arxiv.org/pdf/2306.04429
Abstract The balancing process for game levels in a competitive two-player context involves a lot of manual work and testing, particularly in non-symmetrical game levels. In this paper, we propose an architecture for automated balancing of tile-based levels within the recently introduced PCGRL framework (procedural content generation via reinforcement learning). Our architecture is divided into three parts: (1) a level generator, (2) a balancing agent and, (3) a reward modeling simulation. By playing the level in a simulation repeatedly, the balancing agent is rewarded for modifying it towards the same win rates for all players. To this end, we introduce a novel family of swap-based representations to increase robustness towards playability. We show that this approach is capable to teach an agent how to alter a level for balancing better and faster than plain PCGRL. In addition, by analyzing the agent's swapping behavior, we can draw conclusions about which tile types influence the balancing most. We test and show our results using the Neural MMO (NMMO) environment in a competitive two-player setting.
Dual policy as self-model for planning
Authors: Jaesung Yoo, Fernanda de la Torre, Robert Guangyu Yang
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04440
Pdf link: https://arxiv.org/pdf/2306.04440
Abstract Planning is a data efficient decision-making strategy where an agent selects candidate actions by exploring possible future states. To simulate future states when there is a high-dimensional action space, the knowledge of one's decision making strategy must be used to limit the number of actions to be explored. We refer to the model used to simulate one's decisions as the agent's self-model. While self-models are implicitly used widely in conjunction with world models to plan actions, it remains unclear how self-models should be designed. Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model. In such dual-policy agents, a model-free policy and a distilled policy are used for model-free actions and planned actions, respectively. Our results on a ecologically relevant, parametric environment indicate that distilled policy network for self-model stabilizes training, has faster inference than using model-free policy, promotes better exploration, and could learn a comprehensive understanding of its own behaviors, at the cost of distilling a new network apart from the model-free policy.
Referring Expression Comprehension Using Language Adaptive Inference
Authors: Wei Su, Peihan Miao, Huanzhang Dou, Yongjian Fu, Xi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04451
Pdf link: https://arxiv.org/pdf/2306.04451
Abstract Different from universal object detection, referring expression comprehension (REC) aims to locate specific objects referred to by natural language expressions. The expression provides high-level concepts of relevant visual and contextual patterns, which vary significantly with different expressions and account for only a few of those encoded in the REC model. This leads us to a question: do we really need the entire network with a fixed structure for various referring expressions? Ideally, given an expression, only expression-relevant components of the REC model are required. These components should be small in number as each expression only contains very few visual and contextual clues. This paper explores the adaptation between expressions and REC models for dynamic inference. Concretely, we propose a neat yet efficient framework named Language Adaptive Dynamic Subnets (LADS), which can extract language-adaptive subnets from the REC model conditioned on the referring expressions. By using the compact subnet, the inference can be more economical and efficient. Extensive experiments on RefCOCO, RefCOCO+, RefCOCOg, and Referit show that the proposed method achieves faster inference speed and higher accuracy against state-of-the-art approaches.
Hardening and Speeding Up Zero-interaction Pairing and Authentication
Authors: Mikhail Fomichev, Timm Lippert, Matthias Hollick
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2306.04458
Pdf link: https://arxiv.org/pdf/2306.04458
Abstract Establishing and maintaining secure communications in the Internet of Things (IoT) is vital to protect smart devices. Zero-interaction pairing (ZIP) and zero-interaction authentication (ZIA) enable IoT devices to establish and maintain secure communications without user interaction by utilizing devices' ambient context, e.g., audio. For autonomous operation, ZIP and ZIA require the context to have enough entropy to resist attacks and complete in a timely manner. Despite the low-entropy context being the norm, like inside an unoccupied room, the research community has yet to come up with ZIP and ZIA schemes operating under such conditions. We propose HARDZIPA, a novel approach that turns commodity IoT actuators into injecting devices, generating high-entropy context. Here, we combine the capability of IoT actuators to impact the environment, e.g., emitting a sound, with a pseudorandom number generator (PRNG) featured by many actuators to craft hard-to-predict context stimuli. To demonstrate the feasibility of HARDZIPA, we implement it on off-the-shelf IoT actuators, i.e., smart speakers, lights, and humidifiers. We comprehensively evaluate HARDZIPA, collecting over 80 hours of various context data in real-world scenarios. Our results show that HARDZIPA is able to thwart advanced active attacks on ZIP and ZIA schemes, while doubling the amount of context entropy in many cases, which allows two times faster pairing and authentication.
Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt
Authors: Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04607
Pdf link: https://arxiv.org/pdf/2306.04607
Abstract Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only the image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower the pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate GeoDiffusion outperforms previous L2I methods while maintaining 4x training time faster. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.
Generative Adversarial Shaders for Real-Time Realism Enhancement
Authors: Arturo Salmi, Szabolcs Cséfalvay, James Imber
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04629
Pdf link: https://arxiv.org/pdf/2306.04629
Abstract Application of realism enhancement methods, particularly in real-time and resource-constrained settings, has been frustrated by the expense of existing methods. These achieve high quality results only at the cost of long runtimes and high bandwidth, memory, and power requirements. We present an efficient alternative: a high-performance, generative shader-based approach that adapts machine learning techniques to real-time applications, even in resource-constrained settings such as embedded and mobile GPUs. The proposed learnable shader pipeline comprises differentiable functions that can be trained in an end-to-end manner using an adversarial objective, allowing for faithful reproduction of the appearance of a target image set without manual tuning. The shader pipeline is optimized for highly efficient execution on the target device, providing temporally stable, faster-than-real time results with quality competitive with many neural network-based methods.
Keyword: mobile

Enhancing Virtual Assistant Intelligence: Precise Area Targeting for Instance-level User Intents beyond Metadata
Authors: Mengyu Chen, Zhenchang Xing, Jieshan Chen, Chunyang Chen, Qinghua Lu
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04163
Pdf link: https://arxiv.org/pdf/2306.04163
Abstract Virtual assistants have been widely used by mobile phone users in recent years. Although their capabilities of processing user intents have been developed rapidly, virtual assistants in most platforms are only capable of handling pre-defined high-level tasks supported by extra manual efforts of developers. However, instance-level user intents containing more detailed objectives with complex practical situations, are yet rarely studied so far. In this paper, we explore virtual assistants capable of processing instance-level user intents based on pixels of application screens, without the requirements of extra extensions on the application side. We propose a novel cross-modal deep learning pipeline, which understands the input vocal or textual instance-level user intents, predicts the targeting operational area, and detects the absolute button area on screens without any metadata of applications. We conducted a user study with 10 participants to collect a testing dataset with instance-level user intents. The testing dataset is then utilized to evaluate the performance of our model, which demonstrates that our model is promising with the achievement of 64.43% accuracy on our testing dataset.
MobileNMT: Enabling Translation in 15MB and 30ms
Authors: Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04235
Pdf link: https://arxiv.org/pdf/2306.04235
Abstract Deploying NMT models on mobile devices is essential for privacy, low latency, and offline scenarios. For high model capacity, NMT models are rather large. Running these models on devices is challenging with limited storage, memory, computation, and power consumption. Existing work either only focuses on a single metric such as FLOPs or general engine which is not good at auto-regressive decoding. In this paper, we present MobileNMT, a system that can translate in 15MB and 30ms on devices. We propose a series of principles for model compression when combined with quantization. Further, we implement an engine that is friendly to INT8 and decoding. With the co-design of model and engine, compared with the existing system, we speed up 47.0x and save 99.5% of memory with only 11.6% loss of BLEU. The code is publicly available at https://github.com/zjersey/Lightseq-ARM.
A Mask Free Neural Network for Monaural Speech Enhancement
Authors: Liang Liu, Haixin Guan, Jinlong Ma, Wei Dai, Guangyong Wang, Shaowei Ding
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.04286
Pdf link: https://arxiv.org/pdf/2306.04286
Abstract In speech enhancement, the lack of clear structural characteristics in the target speech phase requires the use of conservative and cumbersome network frameworks. It seems difficult to achieve competitive performance using direct methods and simple network architectures. However, we propose the MFNet, a direct and simple network that can not only map speech but also map reverse noise. This network is constructed by stacking global local former blocks (GLFBs), which combine the advantages of Mobileblock for global processing and Metaformer architecture for local interaction. Our experimental results demonstrate that our network using mapping method outperforms masking methods, and direct mapping of reverse noise is the optimal solution in strong noise environments. In a horizontal comparison on the 2020 Deep Noise Suppression (DNS) challenge test set without reverberation, to the best of our knowledge, MFNet is the current state-of-the-art (SOTA) mapping model.
Efficient Recruitment Strategy for Collaborative Mobile Crowd Sensing Based on GCN Trustworthiness Prediction
Authors: Zhongwei Zhan, Yingjie Wang, Peiyong Duan, Akshita Maradapu Vera Venkata Sai, Zhaowei Liu, Chaocan Xiang, Xiangrong Tong, Weilong Wang, Zhipeng Cai
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04366
Pdf link: https://arxiv.org/pdf/2306.04366
Abstract Collaborative Mobile Crowd Sensing (CMCS) enhances data quality and coverage by promoting teamwork in task sensing, with worker recruitment representing a complex multi-objective optimization problem. Existing strategies mainly focus on the characteristics of workers themselves, neglecting the asymmetric trust relationships between them, which affects the rationality of task utility evaluation. To address this, this paper first employs the Mini-Batch K-Means clustering algorithm and deploys edge servers to enable efficient distributed worker recruitment. Historical data and task requirements are utilized to obtain workers' ability types and distances. A trust-directed graph in the worker's social network is input into the Graph Convolutional Network (GCN) framework for training, capturing asymmetric trustworthiness between worker pairs. Privacy leakage is prevented in CMCS scenarios through high trust values between workers. Ultimately, an undirected recruitment graph is constructed using workers' abilities, trust values, and distance weights, transforming the worker recruitment problem into a Maximum Weight Average Subgraph Problem (MWASP). A Tabu Search Recruitment (TSR) algorithm is proposed to rationally recruit a balanced multi-objective optimal task utility worker set for each task. Extensive simulation experiments on four real-world datasets demonstrate the effectiveness of the proposed strategy, outperforming other strategies.
Recent applications of machine learning, remote sensing, and iot approaches in yield prediction: a critical review
Authors: Fatima Zahra Bassine, Terence Epule Epule, Ayoub Kechchour, Abdelghani Chehbouni
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.04566
Pdf link: https://arxiv.org/pdf/2306.04566
Abstract The integration of remote sensing and machine learning in agriculture is transforming the industry by providing insights and predictions through data analysis. This combination leads to improved yield prediction and water management, resulting in increased efficiency, better yields, and more sustainable agricultural practices. Achieving the United Nations' Sustainable Development Goals, especially "zero hunger," requires the investigation of crop yield and precipitation gaps, which can be accomplished through, the usage of artificial intelligence (AI), machine learning (ML), remote sensing (RS), and the internet of things (IoT). By integrating these technologies, a robust agricultural mobile or web application can be developed, providing farmers and decision-makers with valuable information and tools for improving crop management and increasing efficiency. Several studies have investigated these new technologies and their potential for diverse tasks such as crop monitoring, yield prediction, irrigation management, etc. Through a critical review, this paper reviews relevant articles that have used RS, ML, cloud computing, and IoT in crop yield prediction. It reviews the current state-of-the-art in this field by critically evaluating different machine-learning approaches proposed in the literature for crop yield prediction and water management. It provides insights into how these methods can improve decision-making in agricultural production systems. This work will serve as a compendium for those interested in yield prediction in terms of primary literature but, most importantly, what approaches can be used for real-time and robust prediction.
Generative Adversarial Shaders for Real-Time Realism Enhancement
Authors: Arturo Salmi, Szabolcs Cséfalvay, James Imber
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04629
Pdf link: https://arxiv.org/pdf/2306.04629
Abstract Application of realism enhancement methods, particularly in real-time and resource-constrained settings, has been frustrated by the expense of existing methods. These achieve high quality results only at the cost of long runtimes and high bandwidth, memory, and power requirements. We present an efficient alternative: a high-performance, generative shader-based approach that adapts machine learning techniques to real-time applications, even in resource-constrained settings such as embedded and mobile GPUs. The proposed learnable shader pipeline comprises differentiable functions that can be trained in an end-to-end manner using an adversarial objective, allowing for faithful reproduction of the appearance of a target image set without manual tuning. The shader pipeline is optimized for highly efficient execution on the target device, providing temporally stable, faster-than-real time results with quality competitive with many neural network-based methods.
Keyword: pruning

CFDP: Common Frequency Domain Pruning
Authors: Samir Khaki, Weihan Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04147
Pdf link: https://arxiv.org/pdf/2306.04147
Abstract As the saying goes, sometimes less is more -- and when it comes to neural networks, that couldn't be more true. Enter pruning, the art of selectively trimming away unnecessary parts of a network to create a more streamlined, efficient architecture. In this paper, we introduce a novel end-to-end pipeline for model pruning via the frequency domain. This work aims to shed light on the interoperability of intermediate model outputs and their significance beyond the spatial domain. Our method, dubbed Common Frequency Domain Pruning (CFDP) aims to extrapolate common frequency characteristics defined over the feature maps to rank the individual channels of a layer based on their level of importance in learning the representation. By harnessing the power of CFDP, we have achieved state-of-the-art results on CIFAR-10 with GoogLeNet reaching an accuracy of 95.25%, that is, +0.2% from the original model. We also outperform all benchmarks and match the original model's performance on ImageNet, using only 55% of the trainable parameters and 60% of the FLOPs. In addition to notable performances, models produced via CFDP exhibit robustness to a variety of configurations including pruning from untrained neural architectures, and resistance to adversarial attacks. The implementation code can be found at https://github.com/Skhaki18/CFDP.
Keyword: diffusion

Randomized Schur Complement Views for Graph Contrastive Learning
Authors: Vignesh Kothapalli
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2306.04004
Pdf link: https://arxiv.org/pdf/2306.04004
Abstract We introduce a randomized topological augmentor based on Schur complements for Graph Contrastive Learning (GCL). Given a graph laplacian matrix, the technique generates unbiased approximations of its Schur complements and treats the corresponding graphs as augmented views. We discuss the benefits of our approach, provide theoretical justifications and present connections with graph diffusion. Unlike previous efforts, we study the empirical effectiveness of the augmentor in a controlled fashion by varying the design choices for subsequent GCL phases, such as encoding and contrasting. Extensive experiments on node and graph classification benchmarks demonstrate that our technique consistently outperforms pre-defined and adaptive augmentation approaches to achieve state-of-the-art results.
Professional Basketball Player Behavior Synthesis via Planning with Diffusion
Authors: Xiusi Chen, Wei-Yao Wang, Ziniu Hu, Curtis Chou, Lam Hoang, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2306.04090
Pdf link: https://arxiv.org/pdf/2306.04090
Abstract Dynamically planning in multi-agent systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing both concealed strategic policies and decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes makes it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we first formulate the sequential decision-making process as a conditional trajectory generation process. We further introduce PLAYBEST (PLAYer BEhavior SynThesis), a method for enhancing player decision-making. We extend the state-of-the-art generative model, diffusion probabilistic model, to learn challenging multi-agent environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained using the play-by-play data with corresponding rewards acting as the plan guidance. To accomplish reward-guided trajectory generation, conditional sampling is introduced to condition the diffusion model on the value function and conduct classifier-guided sampling. We validate the effectiveness of PLAYBEST via comprehensive simulation studies from real-world data, contrasting the generated trajectories and play strategies with those employed by professional basketball teams. Our results reveal that the model excels at generating high-quality basketball trajectories that yield efficient plays, surpassing conventional planning techniques in terms of adaptability, flexibility, and overall performance. Moreover, the synthesized play strategies exhibit a remarkable alignment with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.
Phoenix: A Federated Generative Diffusion Model
Authors: Fiona Victoria Stanley Jothiraj, Afra Mashhadi
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04098
Pdf link: https://arxiv.org/pdf/2306.04098
Abstract Generative AI has made impressive strides in enabling users to create diverse and realistic visual content such as images, videos, and audio. However, training generative models on large centralized datasets can pose challenges in terms of data privacy, security, and accessibility. Federated learning (FL) is an approach that uses decentralized techniques to collaboratively train a shared deep learning model while retaining the training data on individual edge devices to preserve data privacy. This paper proposes a novel method for training a Denoising Diffusion Probabilistic Model (DDPM) across multiple data sources using FL techniques. Diffusion models, a newly emerging generative model, show promising results in achieving superior quality images than Generative Adversarial Networks (GANs). Our proposed method Phoenix is an unconditional diffusion model that leverages strategies to improve the data diversity of generated samples even when trained on data with statistical heterogeneity or Non-IID (Non-Independent and Identically Distributed) data. We demonstrate how our approach outperforms the default diffusion model in an FL setting. These results indicate that high-quality samples can be generated by maintaining data diversity, preserving privacy, and reducing communication between data sources, offering exciting new possibilities in the field of generative AI.
MESSY Estimation: Maximum-Entropy based Stochastic and Symbolic densitY Estimation
Authors: Tony Tohme, Mohsen Sadr, Kamal Youcef-Toumi, Nicolas G. Hadjiconstantinou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.04120
Pdf link: https://arxiv.org/pdf/2306.04120
Abstract We introduce MESSY estimation, a Maximum-Entropy based Stochastic and Symbolic densitY estimation method. The proposed approach recovers probability density functions symbolically from samples using moments of a Gradient flow in which the ansatz serves as the driving force. In particular, we construct a gradient-based drift-diffusion process that connects samples of the unknown distribution function to a guess symbolic expression. We then show that when the guess distribution has the maximum entropy form, the parameters of this distribution can be found efficiently by solving a linear system of equations constructed using the moments of the provided samples. Furthermore, we use Symbolic regression to explore the space of smooth functions and find optimal basis functions for the exponent of the maximum entropy functional leading to good conditioning. The cost of the proposed method in each iteration of the random search is linear with the number of samples and quadratic with the number of basis functions. We validate the proposed MESSY estimation method against other benchmark methods for the case of a bi-modal and a discontinuous density, as well as a density at the limit of physical realizability. We find that the addition of a symbolic search for basis functions improves the accuracy of the estimation at a reasonable additional computational cost. Our results suggest that the proposed method outperforms existing density recovery methods in the limit of a small to moderate number of samples by providing a low-bias and tractable symbolic description of the unknown density at a reasonable computational cost.
A Survey on Generative Diffusion Models for Structured Data
Authors: Heejoon Koo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04139
Pdf link: https://arxiv.org/pdf/2306.04139
Abstract In recent years, generative diffusion models have achieved a rapid paradigm shift in deep generative models by showing groundbreaking performance across various applications. Meanwhile, structured data, encompassing tabular and time series data, has been received comparatively limited attention from the deep learning research community, despite its omnipresence and extensive applications. Thus, there is still a lack of literature and its review on structured data modelling via diffusion models, compared to other data modalities such as computer vision and natural language processing. Hence, in this paper, we present a comprehensive review of recently proposed diffusion models in the field of structured data. First, this survey provides a concise overview of the score-based diffusion model theory, subsequently proceeding to the technical descriptions of the majority of pioneering works using structured data in both data-driven general tasks and domain-specific applications. Thereafter, we analyse and discuss the limitations and challenges shown in existing works and suggest potential research directions. We hope this review serves as a catalyst for the research community, promoting the developments in generative diffusion models for structured data.
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Authors: Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.04301
Pdf link: https://arxiv.org/pdf/2306.04301
Abstract With the demand for autonomous control and personalized speech generation, the style control and transfer in Text-to-Speech (TTS) is becoming more and more important. In this paper, we propose a new TTS system that can perform style transfer with interpretability and high fidelity. Firstly, we design a TTS system that combines variational autoencoder (VAE) and diffusion refiner to get refined mel-spectrograms. Specifically, a two-stage and a one-stage system are designed respectively, to improve the audio quality and the performance of style transfer. Secondly, a diffusion bridge of quantized VAE is designed to efficiently learn complex discrete style representations and improve the performance of style transfer. To have a better ability of style transfer, we introduce ControlVAE to improve the reconstruction quality and have good interpretability simultaneously. Experiments on LibriTTS dataset demonstrate that our method is more effective than baseline models.
Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
Authors: Eleonora Grassucci, Sergio Barbarossa, Danilo Comminiello
Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.04321
Pdf link: https://arxiv.org/pdf/2306.04321
Abstract Semantic communication is expected to be one of the cores of next-generation AI-based communications. One of the possibilities offered by semantic communication is the capability to regenerate, at the destination side, images or videos semantically equivalent to the transmitted ones, without necessarily recovering the transmitted sequence of bits. The current solutions still lack the ability to build complex scenes from the received partial information. Clearly, there is an unmet need to balance the effectiveness of generation methods and the complexity of the transmitted information, possibly taking into account the goal of communication. In this paper, we aim to bridge this gap by proposing a novel generative diffusion-guided framework for semantic communication that leverages the strong abilities of diffusion models in synthesizing multimedia content while preserving semantic features. We reduce bandwidth usage by sending highly-compressed semantic information only. Then, the diffusion model learns to synthesize semantic-consistent scenes through spatially-adaptive normalizations from such denoised semantic information. We prove, through an in-depth assessment of multiple scenarios, that our method outperforms existing solutions in generating high-quality images with preserved semantic information even in cases where the received content is significantly degraded. More specifically, our results show that objects, locations, and depths are still recognizable even in the presence of extremely noisy conditions of the communication channel. The code is available at https://github.com/ispamm/GESCO.
Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance
Authors: Gihyun Kwon, Jong Chul Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.04396
Pdf link: https://arxiv.org/pdf/2306.04396
Abstract Diffusion models have shown significant progress in image translation tasks recently. However, due to their stochastic nature, there's often a trade-off between style transformation and content preservation. Current strategies aim to disentangle style and content, preserving the source image's structure while successfully transitioning from a source to a target domain under text or one-shot image conditions. Yet, these methods often require computationally intense fine-tuning of diffusion models or additional neural networks. To address these challenges, here we present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance. This results in quicker and more stable image manipulation for both text-guided and image-guided image translation. Our model's adaptability allows it to be implemented with both image- and latent-diffusion models. Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
Synthesizing realistic sand assemblies with denoising diffusion in latent space
Authors: Nikolaos N. Vlassis, WaiChing Sun, Khalid A. Alshibli, Richard A. Regueiro
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04411
Pdf link: https://arxiv.org/pdf/2306.04411
Abstract The shapes and morphological features of grains in sand assemblies have far-reaching implications in many engineering applications, such as geotechnical engineering, computer animations, petroleum engineering, and concentrated solar power. Yet, our understanding of the influence of grain geometries on macroscopic response is often only qualitative, due to the limited availability of high-quality 3D grain geometry data. In this paper, we introduce a denoising diffusion algorithm that uses a set of point clouds collected from the surface of individual sand grains to generate grains in the latent space. By employing a point cloud autoencoder, the three-dimensional point cloud structures of sand grains are first encoded into a lower-dimensional latent space. A generative denoising diffusion probabilistic model is trained to produce synthetic sand that maximizes the log-likelihood of the generated samples belonging to the original data distribution measured by a Kullback-Leibler divergence. Numerical experiments suggest that the proposed method is capable of generating realistic grains with morphology, shapes and sizes consistent with the training data inferred from an F50 sand database. We then use a rigid contact dynamic simulator to pour the synthetic sand in a confined volume to form granular assemblies in a static equilibrium state with targeted distribution properties. To ensure third-party validation, 50,000 synthetic sand grains and the 1,542 real synchrotron microcomputed tomography (SMT) scans of the F50 sand, as well as the granular assemblies composed of synthetic sand grains are made available in an open-source repository.
Multi-modal Latent Diffusion
Authors: Mustapha Bounoua, Giulio Franzese, Pietro Michiardi
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04445
Pdf link: https://arxiv.org/pdf/2306.04445
Abstract Multi-modal data-sets are ubiquitous in modern applications, and multi-modal Variational Autoencoders are a popular family of models that aim to learn a joint representation of the different modalities. However, existing approaches suffer from a coherence-quality tradeoff, where models with good generation quality lack generative coherence across modalities, and vice versa. We discuss the limitations underlying the unsatisfactory performance of existing methods, to motivate the need for a different approach. We propose a novel method that uses a set of independently trained, uni-modal, deterministic autoencoders. Individual latent variables are concatenated into a common latent space, which is fed to a masked diffusion model to enable generative modeling. We also introduce a new multi-time training method to learn the conditional score network for multi-modal diffusion. Our methodology substantially outperforms competitors in both generation quality and coherence, as shown through an extensive experimental campaign.
On the Design Fundamentals of Diffusion Models: A Survey
Authors: Ziyi Chang, George A. Koulieris, Hubert P. H. Shum
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04542
Pdf link: https://arxiv.org/pdf/2306.04542
Abstract Diffusion models are generative models, which gradually add and remove noise to learn the underlying distribution of training data for data generation. The components of diffusion models have gained significant attention with many design choices proposed. Existing reviews have primarily focused on higher-level solutions, thereby covering less on the design fundamentals of components. This study seeks to address this gap by providing a comprehensive and coherent review on component-wise design choices in diffusion models. Specifically, we organize this review according to their three key components, namely the forward process, the reverse process, and the sampling procedure. This allows us to provide a fine-grained perspective of diffusion models, benefiting future studies in the analysis of individual components, the applicability of design choices, and the implementation of diffusion models.
Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt
Authors: Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04607
Pdf link: https://arxiv.org/pdf/2306.04607
Abstract Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only the image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower the pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate GeoDiffusion outperforms previous L2I methods while maintaining 4x training time faster. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.
ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
Authors: Chun-Han Yao, Amit Raj, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04619
Pdf link: https://arxiv.org/pdf/2306.04619
Abstract Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging due to the ambiguities of camera viewpoint, pose, texture, lighting, etc. We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild. Specifically, ARTIC3D is built upon a skeleton-based surface representation and is further guided by 2D diffusion priors from Stable Diffusion. First, we enhance the input images with occlusions/truncation via 2D diffusion to obtain cleaner mask estimates and semantic features. Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are of high-fidelity and faithful to input images. We also propose a novel technique to calculate more stable image-level gradients via diffusion models compared to existing alternatives. Finally, we produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations. Extensive evaluations on multiple existing datasets as well as newly introduced noisy web image collections with occlusions and truncation demonstrate that ARTIC3D outputs are more robust to noisy images, higher quality in terms of shape and texture details, and more realistic when animated. Project page: https://chhankyao.github.io/artic3d/
Designing a Better Asymmetric VQGAN for StableDiffusion
Authors: Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2306.04632
Pdf link: https://arxiv.org/pdf/2306.04632
Abstract StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at \url{https://github.com/buxiangzhiren/Asymmetric_VQGAN}.
Keyword: adaptive

A Quality Aware Sample-to-Sample Comparison for Face Recognition
Authors: Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Ali Zafari, Moktari Mostofa, Nasser M. Nasrabadi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04000
Pdf link: https://arxiv.org/pdf/2306.04000
Abstract Currently available face datasets mainly consist of a large number of high-quality and a small number of low-quality samples. As a result, a Face Recognition (FR) network fails to learn the distribution of low-quality samples since they are less frequent during training (underrepresented). Moreover, current state-of-the-art FR training paradigms are based on the sample-to-center comparison (i.e., Softmax-based classifier), which results in a lack of uniformity between train and test metrics. This work integrates a quality-aware learning process at the sample level into the classification training paradigm (QAFace). In this regard, Softmax centers are adaptively guided to pay more attention to low-quality samples by using a quality-aware function. Accordingly, QAFace adds a quality-based adjustment to the updating procedure of the Softmax-based classifier to improve the performance on the underrepresented low-quality samples. Our method adaptively finds and assigns more attention to the recognizable low-quality samples in the training datasets. In addition, QAFace ignores the unrecognizable low-quality samples using the feature magnitude as a proxy for quality. As a result, QAFace prevents class centers from getting distracted from the optimal direction. The proposed method is superior to the state-of-the-art algorithms in extensive experimental results on the CFP-FP, LFW, CPLFW, CALFW, AgeDB, IJB-B, and IJB-C datasets.
Randomized Schur Complement Views for Graph Contrastive Learning
Authors: Vignesh Kothapalli
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2306.04004
Pdf link: https://arxiv.org/pdf/2306.04004
Abstract We introduce a randomized topological augmentor based on Schur complements for Graph Contrastive Learning (GCL). Given a graph laplacian matrix, the technique generates unbiased approximations of its Schur complements and treats the corresponding graphs as augmented views. We discuss the benefits of our approach, provide theoretical justifications and present connections with graph diffusion. Unlike previous efforts, we study the empirical effectiveness of the augmentor in a controlled fashion by varying the design choices for subsequent GCL phases, such as encoding and contrasting. Extensive experiments on node and graph classification benchmarks demonstrate that our technique consistently outperforms pre-defined and adaptive augmentation approaches to achieve state-of-the-art results.
Revisiting Neural Retrieval on Accelerators
Authors: Jiaqi Zhai, Zhaojie Gong, Yueming Wang, Xiao Sun, Zheng Yan, Fu Li, Xing Liu
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.04039
Pdf link: https://arxiv.org/pdf/2306.04039
Abstract Retrieval finds a small number of relevant candidates from a large corpus for information retrieval and recommendation applications. A key component of retrieval is to model (user, item) similarity, which is commonly represented as the dot product of two learned embeddings. This formulation permits efficient inference, commonly known as Maximum Inner Product Search (MIPS). Despite its popularity, dot products cannot capture complex user-item interactions, which are multifaceted and likely high rank. We hence examine non-dot-product retrieval settings on accelerators, and propose \textit{mixture of logits} (MoL), which models (user, item) similarity as an adaptive composition of elementary similarity functions. This new formulation is expressive, capable of modeling high rank (user, item) interactions, and further generalizes to the long tail. When combined with a hierarchical retrieval strategy, \textit{h-indexer}, we are able to scale up MoL to 100M corpus on a single GPU with latency comparable to MIPS baselines. On public datasets, our approach leads to uplifts of up to 77.3\% in hit rate (HR). Experiments on a large recommendation surface at Meta showed strong metric gains and reduced popularity bias, validating the proposed approach's performance and improved generalization.
Membership inference attack with relative decision boundary distance
Authors: JiaCheng Xu, ChengXiang Tan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04109
Pdf link: https://arxiv.org/pdf/2306.04109
Abstract Membership inference attack is one of the most popular privacy attacks in machine learning, which aims to predict whether a given sample was contained in the target model's training set. Label-only membership inference attack is a variant that exploits sample robustness and attracts more attention since it assumes a practical scenario in which the adversary only has access to the predicted labels of the input samples. However, since the decision boundary distance, which measures robustness, is strongly affected by the random initial image, the adversary may get opposite results even for the same input samples. In this paper, we propose a new attack method, called muti-class adaptive membership inference attack in the label-only setting. All decision boundary distances for all target classes have been traversed in the early attack iterations, and the subsequent attack iterations continue with the shortest decision boundary distance to obtain a stable and optimal decision boundary distance. Instead of using a single boundary distance, the relative boundary distance between samples and neighboring points has also been employed as a new membership score to distinguish between member samples inside the training set and nonmember samples outside the training set. Experiments show that previous label-only membership inference attacks using the untargeted HopSkipJump algorithm fail to achieve optimal decision bounds in more than half of the samples, whereas our multi-targeted HopSkipJump algorithm succeeds in almost all samples. In addition, extensive experiments show that our multi-class adaptive MIA outperforms current label-only membership inference attacks in the CIFAR10, and CIFAR100 datasets, especially for the true positive rate at low false positive rates metric.
Towards Fast Personalized Semi-Supervised Federated Learning in Edge Networks: Algorithm Design and Theoretical Guarantee
Authors: Shuai Wang, Yanqing Xu, Yanyi Yuan, Tony Q. S. Quek
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.04155
Pdf link: https://arxiv.org/pdf/2306.04155
Abstract Recent years have witnessed a huge demand for artificial intelligence and machine learning applications in wireless edge networks to assist individuals with real-time services. Owing to the practical setting and privacy preservation of federated learning (FL), it is a suitable and appealing distributed learning paradigm to deploy these applications at the network edge. Despite the many successful efforts made to apply FL to wireless edge networks, the adopted algorithms mostly follow the same spirit as FedAvg, thereby heavily suffering from the practical challenges of label deficiency and device heterogeneity. These challenges not only decelerate the model training in FL but also downgrade the application performance. In this paper, we focus on the algorithm design and address these challenges by investigating the personalized semi-supervised FL problem and proposing an effective algorithm, namely FedCPSL. In particular, the techniques of pseudo-labeling, and interpolation-based model personalization are judiciously combined to provide a new problem formulation for personalized semi-supervised FL. The proposed FedCPSL algorithm adopts novel strategies, including adaptive client variance reduction, local momentum, and normalized global aggregation, to combat the challenge of device heterogeneity and boost algorithm convergence. Moreover, the convergence property of FedCPSL is thoroughly analyzed and shows that FedCPSL is resilient to both statistical and system heterogeneity, obtaining a sublinear convergence rate. Experimental results on image classification tasks are also presented to demonstrate that the proposed approach outperforms its counterparts in terms of both convergence speed and application performance.
ScoreCL: Augmentation-Adaptive Contrastive Learning via Score-Matching Function
Authors: JinYoung Kim, Soonwoo Kwon, Hyojun Go, Yunsung Lee, Seungtaek Choi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04175
Pdf link: https://arxiv.org/pdf/2306.04175
Abstract Self-supervised contrastive learning (CL) has achieved state-of-the-art performance in representation learning by minimizing the distance between positive pairs while maximizing that of negative ones. Recently, it has been verified that the model learns better representation with diversely augmented positive pairs because they enable the model to be more view-invariant. However, only a few studies on CL have considered the difference between augmented views, and have not gone beyond the hand-crafted findings. In this paper, we first observe that the score-matching function can measure how much data has changed from the original through augmentation. With the observed property, every pair in CL can be weighted adaptively by the difference of score values, resulting in boosting the performance of the existing CL method. We show the generality of our method, referred to as ScoreCL, by consistently improving various CL methods, SimCLR, SimSiam, W-MSE, and VICReg, up to 3%p in k-NN evaluation on CIFAR-10, CIFAR-100, and ImageNet-100. Moreover, we have conducted exhaustive experiments and ablations, including results on diverse downstream tasks, comparison with possible baselines, and improvement when used with other proposed augmentation methods. We hope our exploration will inspire more research in exploiting the score matching for CL.
Dynamic Probabilistic Reliable Broadcast
Authors: Veronika Anikina, João Paulo Bezerra, Petr Kuznetsov, Liron Schiff, Stefan Schmid
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2306.04221
Pdf link: https://arxiv.org/pdf/2306.04221
Abstract Byzantine reliable broadcast is a primitive that allows a set of processes to agree on a message broadcast by a dedicated source process, even when some of them are malicious (Byzantine). It guarantees that no two correct processes deliver different messages, and if a message is delivered by a correct process, every correct process eventually delivers one. The primitive is known not to scale, as it requires $\Omega(n^2)$ message exchanges, where $n$ is the number of system members. The quadratic cost can be explained by the inherent need for every process to relay a message to every other process. In this paper, we explore ways to overcome this limitation, by casting the problem to the probabilistic setting. We propose a solution in which every broadcast message is validated by a small set of witnesses, which allows us to maintain low latency and small communication complexity. In order to tolerate a slow adaptive adversary, we dynamically select witnesses through a novel use of locality-preserving hash functions. Our simulations demonstrate significant scalability gains of our solution with respect to existing protocols.
T-ADAF: Adaptive Data Augmentation Framework for Image Classification Network based on Tensor T-product Operator
Authors: Feiyang Han, Yun Miao, Zhaoyi Sun, Yimin Wei
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2306.04240
Pdf link: https://arxiv.org/pdf/2306.04240
Abstract Image classification is one of the most fundamental tasks in Computer Vision. In practical applications, the datasets are usually not as abundant as those in the laboratory and simulation, which is always called as Data Hungry. How to extract the information of data more completely and effectively is very important. Therefore, an Adaptive Data Augmentation Framework based on the tensor T-product Operator is proposed in this paper, to triple one image data to be trained and gain the result from all these three images together with only less than 0.1% increase in the number of parameters. At the same time, this framework serves the functions of column image embedding and global feature intersection, enabling the model to obtain information in not only spatial but frequency domain, and thus improving the prediction accuracy of the model. Numerical experiments have been designed for several models, and the results demonstrate the effectiveness of this adaptive framework. Numerical experiments show that our data augmentation framework can improve the performance of original neural network model by 2%, which provides competitive results to state-of-the-art methods.
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Authors: Boyuan Sun, Yuqi Yang, Le Zhang, Ming-Ming Cheng, Qibin Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04300
Pdf link: https://arxiv.org/pdf/2306.04300
Abstract In this paper, we present a simple but performant semi-supervised semantic segmentation approach, termed CorrMatch. Our goal is to mine more high-quality regions from the unlabeled images to leverage the unlabeled data more efficiently via consistency regularization. The key contributions of our CorrMatch are two novel and complementary strategies. First, we introduce an adaptive threshold updating strategy with a relaxed initialization to expand the high-quality regions. Furthermore, we propose to propagate high-confidence predictions through measuring the pairwise similarities between pixels. Despite its simplicity, we show that CorrMatch achieves great performance on popular semi-supervised semantic segmentation benchmarks. Taking the DeepLabV3+ framework with ResNet-101 backbone as our segmentation model, we receive a 76%+ mIoU score on the Pascal VOC 2012 segmentation benchmark with only 92 annotated images provided. We also achieve a consistent improvement over previous semi-supervised semantic segmentation models. Code will be made publicly available.
Generative Semantic Communication: Diffusion Models Beyond Bit Recovery
Authors: Eleonora Grassucci, Sergio Barbarossa, Danilo Comminiello
Subjects: Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2306.04321
Pdf link: https://arxiv.org/pdf/2306.04321
Abstract Semantic communication is expected to be one of the cores of next-generation AI-based communications. One of the possibilities offered by semantic communication is the capability to regenerate, at the destination side, images or videos semantically equivalent to the transmitted ones, without necessarily recovering the transmitted sequence of bits. The current solutions still lack the ability to build complex scenes from the received partial information. Clearly, there is an unmet need to balance the effectiveness of generation methods and the complexity of the transmitted information, possibly taking into account the goal of communication. In this paper, we aim to bridge this gap by proposing a novel generative diffusion-guided framework for semantic communication that leverages the strong abilities of diffusion models in synthesizing multimedia content while preserving semantic features. We reduce bandwidth usage by sending highly-compressed semantic information only. Then, the diffusion model learns to synthesize semantic-consistent scenes through spatially-adaptive normalizations from such denoised semantic information. We prove, through an in-depth assessment of multiple scenarios, that our method outperforms existing solutions in generating high-quality images with preserved semantic information even in cases where the received content is significantly degraded. More specifically, our results show that objects, locations, and depths are still recognizable even in the presence of extremely noisy conditions of the communication channel. The code is available at https://github.com/ispamm/GESCO.
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
Authors: Jiaming Liu, Senqiao Yang, Peidong Jia, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04344
Pdf link: https://arxiv.org/pdf/2306.04344
Abstract Since real-world machine systems are running in non-stationary and continually changing environments, Continual Test-Time Adaptation (CTTA) task is proposed to adapt the pre-trained model to continually changing target domains. Recently, existing methods mainly focus on model-based adaptation, which aims to leverage a self-training manner to extract the target domain knowledge. However, pseudo labels can be noisy and the updated model parameters are uncertain under dynamic data distributions, leading to error accumulation and catastrophic forgetting in the continual adaptation process. To tackle these challenges and maintain the model plasticity, we tactfully design a Visual Domain Adapter (ViDA) for CTTA, explicitly handling both domain-specific and domain-agnostic knowledge. Specifically, we first comprehensively explore the different domain representations of the adapters with trainable high and low-rank embedding space. Then we inject ViDAs into the pre-trained model, which leverages high-rank and low-rank prototypes to adapt the current domain distribution and maintain the continual domain-shared knowledge, respectively. To adapt to the various distribution shifts of each sample in target domains, we further propose a Homeostatic Knowledge Allotment (HKA) strategy, which adaptively merges knowledge from each ViDA with different rank prototypes. Extensive experiments conducted on four widely-used benchmarks demonstrate that our proposed method achieves state-of-the-art performance in both classification and segmentation CTTA tasks. In addition, our method can be regarded as a novel transfer paradigm and showcases promising results in zero-shot adaptation of foundation models to continual downstream tasks and distributions.
SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory
Authors: Han Sun, Rui Gong, Konrad Schindler, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04385
Pdf link: https://arxiv.org/pdf/2306.04385
Abstract Domain adaptive object detection aims to leverage the knowledge learned from a labeled source domain to improve the performance on an unlabeled target domain. Prior works typically require the access to the source domain data for adaptation, and the availability of sufficient data on the target domain. However, these assumptions may not hold due to data privacy and rare data collection. In this paper, we propose and investigate a more practical and challenging domain adaptive object detection problem under both source-free and few-shot conditions, named as SF-FSDA. To overcome this problem, we develop an efficient labeled data factory based approach. Without accessing the source domain, the data factory renders i) infinite amount of synthesized target-domain like images, under the guidance of the few-shot image samples and text description from the target domain; ii) corresponding bounding box and category annotations, only demanding minimum human effort, i.e., a few manually labeled examples. On the one hand, the synthesized images mitigate the knowledge insufficiency brought by the few-shot condition. On the other hand, compared to the popular pseudo-label technique, the generated annotations from data factory not only get rid of the reliance on the source pretrained object detection model, but also alleviate the unavoidably pseudo-label noise due to domain shift and source-free condition. The generated dataset is further utilized to adapt the source pretrained object detection model, realizing the robust object detection under SF-FSDA. The experiments on different settings showcase that our proposed approach outperforms other state-of-the-art methods on SF-FSDA problem. Our codes and models will be made publicly available.
Referring Expression Comprehension Using Language Adaptive Inference
Authors: Wei Su, Peihan Miao, Huanzhang Dou, Yongjian Fu, Xi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2306.04451
Pdf link: https://arxiv.org/pdf/2306.04451
Abstract Different from universal object detection, referring expression comprehension (REC) aims to locate specific objects referred to by natural language expressions. The expression provides high-level concepts of relevant visual and contextual patterns, which vary significantly with different expressions and account for only a few of those encoded in the REC model. This leads us to a question: do we really need the entire network with a fixed structure for various referring expressions? Ideally, given an expression, only expression-relevant components of the REC model are required. These components should be small in number as each expression only contains very few visual and contextual clues. This paper explores the adaptation between expressions and REC models for dynamic inference. Concretely, we propose a neat yet efficient framework named Language Adaptive Dynamic Subnets (LADS), which can extract language-adaptive subnets from the REC model conditioned on the referring expressions. By using the compact subnet, the inference can be more economical and efficient. Extensive experiments on RefCOCO, RefCOCO+, RefCOCOg, and Referit show that the proposed method achieves faster inference speed and higher accuracy against state-of-the-art approaches.
Energy-based Assessment and Driving Behavior of ACC Systems and Humans Inside Platoons
Authors: Theocharis Apostolakis, Michail A. Makridis, Anastasios Kouvelas, Konstantinos Ampountolas
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2306.04476
Pdf link: https://arxiv.org/pdf/2306.04476
Abstract Evidence in the literature shows that automated and human driving modes demonstrate different driving characteristics, i.e., headway policy, spacing policy, reaction time, comfortable acceleration, and others. These differences alter observed traffic dynamics and have an impact on energy consumption. This paper assesses the energy footprint of commercially implemented adaptive cruise control (ACC) systems and human drivers in car-following formation via different models using empirical observations on very similar driving cycles and/or routes. Most importantly, it initiates a critical discussion of the findings under the behavioral properties of each mode. Findings show that: ACC systems propagate an increasing energy consumption upstream, while human drivers do not; they succeed in maintaining a constant time-headway policy, operating very reliably; they develop a strong bond with their leader compared to their human counterparts; the two modes (humans and ACCs) are operating in different phase-space areas with room for improvement. Overall, findings show that ACC systems must be optimized to achieve a trade-off between functional requirements and eco-driving instructions.
Sustainable Adaptive Security
Authors: Liliana Pasquale, Kushal Ramkumar, Wanling Cai, John McCarthy, Gavin Doherty, Bashar Nuseibeh
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2306.04481
Pdf link: https://arxiv.org/pdf/2306.04481
Abstract With software systems permeating our lives, we are entitled to expect that such systems are secure by design, and that such security endures throughout the use of these systems and their subsequent evolution. Although adaptive security systems have been proposed to continuously protect assets from harm, they can only mitigate threats arising from changes foreseen at design time. In this paper, we propose the notion of Sustainable Adaptive Security (SAS) which reflects such enduring protection by augmenting adaptive security systems with the capability of mitigating newly discovered threats. To achieve this objective, a SAS system should be designed by combining automation (e.g., to discover and mitigate security threats) and human intervention (e.g., to resolve uncertainties during threat discovery and mitigation). In this paper, we use a smart home example to showcase how we can engineer the activities of the MAPE (Monitor, Analysis, Planning, and Execution) loop of systems satisfying sustainable adaptive security. We suggest that using anomaly detection together with abductive reasoning can help discover new threats and guide the evolution of security requirements and controls. We also exemplify situations when humans can be involved in the execution of the activities of the MAPE loop and discuss the requirements to engineer human interventions.
Embracing Uncertainty: Adaptive Vague Preference Policy Learning for Multi-round Conversational Recommendation
Authors: Gangyi Zhang, Chongming Gao, Wenqiang Lei, Xiaojie Guo, Shijun Li, Lingfei Wu, Hongshen Chen, Zhuozhi Ding, Sulong Xu, Xiangnan He
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2306.04487
Pdf link: https://arxiv.org/pdf/2306.04487
Abstract Conversational recommendation systems (CRS) effectively address information asymmetry by dynamically eliciting user preferences through multi-turn interactions. Existing CRS widely assumes that users have clear preferences. Under this assumption, the agent will completely trust the user feedback and treat the accepted or rejected signals as strong indicators to filter items and reduce the candidate space, which may lead to the problem of over-filtering. However, in reality, users' preferences are often vague and volatile, with uncertainty about their desires and changing decisions during interactions. To address this issue, we introduce a novel scenario called Vague Preference Multi-round Conversational Recommendation (VPMCR), which considers users' vague and volatile preferences in CRS.VPMCR employs a soft estimation mechanism to assign a non-zero confidence score for all candidate items to be displayed, naturally avoiding the over-filtering problem. In the VPMCR setting, we introduce an solution called Adaptive Vague Preference Policy Learning (AVPPL), which consists of two main components: Uncertainty-aware Soft Estimation (USE) and Uncertainty-aware Policy Learning (UPL). USE estimates the uncertainty of users' vague feedback and captures their dynamic preferences using a choice-based preferences extraction module and a time-aware decaying strategy. UPL leverages the preference distribution estimated by USE to guide the conversation and adapt to changes in users' preferences to make recommendations or ask for attributes. Our extensive experiments demonstrate the effectiveness of our method in the VPMCR scenario, highlighting its potential for practical applications and improving the overall performance and applicability of CRS in real-world settings, particularly for users with vague or dynamic preferences.
Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal
Authors: Anastasiia Sedova, Lena Zellinger, Benjamin Roth
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.04502
Pdf link: https://arxiv.org/pdf/2306.04502
Abstract An accurate and substantial dataset is necessary to train a reliable and well-performing model. However, even manually labeled datasets contain errors, not to mention automatically labeled ones. The problem of data denoising was addressed in different existing research, most of which focuses on the detection of outliers and their permanent removal - a process that is likely to over- or underfilter the dataset. In this work, we propose AGRA: a new method for Adaptive GRAdient-based outlier removal. Instead of cleaning the dataset prior to model training, the dataset is adjusted during the training process. By comparing the aggregated gradient of a batch of samples and an individual example gradient, our method dynamically decides whether a corresponding example is helpful for the model at this point or is counter-productive and should be left out for the current update. Extensive evaluation on several datasets demonstrates the AGRA effectiveness, while comprehensive results analysis supports our initial hypothesis: permanent hard outlier removal is not always what model benefits the most from.
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Authors: Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2306.04637
Pdf link: https://arxiv.org/pdf/2306.04637
Abstract Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life -- A \emph{single} transformer can adaptively select different base ICL algorithms -- or even perform qualitatively different tasks -- on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task -- noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
Keyword: quantization

MobileNMT: Enabling Translation in 15MB and 30ms
Authors: Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.04235
Pdf link: https://arxiv.org/pdf/2306.04235
Abstract Deploying NMT models on mobile devices is essential for privacy, low latency, and offline scenarios. For high model capacity, NMT models are rather large. Running these models on devices is challenging with limited storage, memory, computation, and power consumption. Existing work either only focuses on a single metric such as FLOPs or general engine which is not good at auto-regressive decoding. In this paper, we present MobileNMT, a system that can translate in 15MB and 30ms on devices. We propose a series of principles for model compression when combined with quantization. Further, we implement an engine that is friendly to INT8 and decoding. With the co-design of model and engine, compared with the existing system, we speed up 47.0x and save 99.5% of memory with only 11.6% loss of BLEU. The code is publicly available at https://github.com/zjersey/Lightseq-ARM.
Pseudo-Random Quantization Based Two-Stage Detection in One-Bit Massive MIMO Systems
Authors: Gökhan Yılmaz, Ali Özgür Yılmaz
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.04329
Pdf link: https://arxiv.org/pdf/2306.04329
Abstract Utilizing low-resolution analog-to-digital converters (ADCs) in uplink massive multiple-input multiple-output (MIMO) systems is a practical solution to decrease power consumption. The performance gap between the low and high-resolution systems is small at low signal-to-noise ratio (SNR) regimes. However, at high SNR and with high modulation orders, the achievable rate saturates after a finite SNR value due to the stochastic resonance (SR) phenomenon. This paper proposes a novel pseudo-random quantization (PRQ) scheme by modifying the quantization thresholds that can help compensate for the effects of SR and makes communication with high-order modulation schemes such as $1024$-QAM in one-bit quantized uplink massive MIMO systems possible. Moreover, modified linear detectors for non-zero threshold quantization are derived, and a two-stage uplink detector for single-carrier (SC) multi-user systems is proposed. The first stage is an iterative method called Boxed Newton Detector (BND) that utilizes Newton's Method to maximize the log-likelihood with box constraints. The second stage, Nearest Codeword Detector (NCD), exploits the first stage solution and creates a small set of most likely candidates based on sign constraints to increase detection performance. The proposed two-stage method with PRQ outperforms the state-of-the-art detectors from the literature with comparable complexity while supporting high-order modulation schemes.
Quasi-Newton Detection in One-Bit Pseudo-Randomly Quantized Wideband Massive MIMO Systems
Authors: Gökhan Yılmaz, Ali Özgür Yılmaz
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2306.04354
Pdf link: https://arxiv.org/pdf/2306.04354
Abstract Using low-resolution analog-to-digital converters (ADCs) is a valuable solution to decrease power consumption and cost in massive MIMO systems. Previous studies show that the performance gap between low and high-resolution systems gets more prominent as the signal-to-noise ratio (SNR) increases since the detection performance starts to saturate at some point due to the stochastic resonance (SR) phenomenon. In our previous work, we proposed new quantization and detection schemes for one-bit massive MIMO systems operating under frequency-flat fading. This paper offers a new frequency domain equalization (FDE) scheme that can work with the previously proposed pseudo-random quantization (PRQ) scheme to mitigate the effects of SR to support high-order modulation schemes such as $64$-QAM and $256$-QAM. Our equalizer is based on a projected quasi-Newton method for one-bit uplink massive MIMO systems applicable for orthogonal frequency division multiplexing (OFDM) and single carrier (SC) transmission under frequency-selective fading. The proposed low-complexity detector can outperform the benchmark detector from the literature with very similar complexity. We analyze the effects of PRQ under frequency-selective fading for different scenarios and show that the PRQ scheme can close the performance gap between SC and OFDM systems by simulations.

A-suozhang / GetArxivDaily

New submissions for Thu, 8 Jun 23 #76

Keyword: efficient

Finding Counterfactually Optimal Action Sequences in Continuous State Spaces

Guiding The Last Layer in Federated Learning with Pre-Trained Models

Rao-Blackwellized Particle Smoothing for Simultaneous Localization and Mapping

Leveraging Explicit Procedural Instructions for Data-Efficient Action Prediction

PILLAR: How to make semi-private learning more effective

ECQED: Emotion-Cause Quadruple Extraction in Dialogs

Explainable AI using expressive Boolean formulas

Elephant Flows Detection Using Deep Neural Network, Convolutional Neural Network, Long Short Term Memory and Autoencoder

Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)

A Novel Implementation Methodology for Error Correction Codes on a Neuromorphic Architecture

Revisiting Neural Retrieval on Accelerators

Active Sparse Conversations for Improved Audio-Visual Embodied Navigation

Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health

An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models

Blockchain Technology in Higher Education Ecosystem: Unraveling the Good, Bad, and Ugly

Professional Basketball Player Behavior Synthesis via Planning with Diffusion

A novel deeponet model for learning moving-solution operators with applications to earthquake hypocenter localization

Quasi-Newton Updating for Large-Scale Distributed Learning

MESSY Estimation: Maximum-Entropy based Stochastic and Symbolic densitY Estimation

Multimodal Fusion Interactions: A Study of Human and Automatic Quantification

Multi-Agent Reinforcement Learning for Cooperative Air Transportation Services in City-Wide Autonomous Urban Air Mobility

CFDP: Common Frequency Domain Pruning

SANGEET: A XML based Open Dataset for Research in Hindustani Sangeet

A Unified One-Step Solution for Aspect Sentiment Quad Prediction

Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation

Synthesising Recursive Functions for First-Order Model Counting: Challenges, Progress, and Conjectures

An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

Extracting Cloud-based Model with Prior Knowledge

High-Performance Caching of Homomorphic Encryption for Cloud Databases

Self-Adjusting Weighted Expected Improvement for Bayesian Optimization

On Isolating Roots in a Multiple Field Extension

HornFuzz: Fuzzing CHC solvers

Revising deep learning methods in parking lot occupancy detection

Introduction and Assessment of the Addition of Links and Containers to the Blackboard Architecture

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge

Self-Resolving Prediction Markets for Unverifiable Outcomes

Compressed Sensing Based Channel Estimation for Movable Antenna Communications

Estimating nested expectations without inner conditional sampling and application to value of information analysis

Efficient Recruitment Strategy for Collaborative Mobile Crowd Sensing Based on GCN Trustworthiness Prediction

Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning

Get More for Less in Decentralized Learning Systems

SF-FSDA: Source-Free Few-Shot Domain Adaptive Object Detection with Efficient Labeled Data Factory

Symplectic multirate generalized additive Runge-Kutta methods for Hamiltonian systems

The Noir Dataflow Platform: Efficient Data Processing without Complexity

Dual policy as self-model for planning

Fast Optimal Locally Private Mean Estimation via Random Projections

Referring Expression Comprehension Using Language Adaptive Inference

Training-Free Neural Active Learning with Initialization-Robustness Guarantees

Maintaining the cycle structure of dynamic permutations

Fair Column Subset Selection

Active Reconfigurable Intelligent Surfaces for the Millimeter-Wave Frequency Band: System Design and Measurement

Analysing the Robustness of NSGA-II under Noise

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models

Top-Down Knowledge Compilation for Counting Modulo Theories

Towards Decentralized Heterogeneous Multi-Robot SLAM and Target Tracking

MarineVRS: Marine Video Retrieval System with Explainability via Semantic Understanding

Empowering Business Transformation: The Positive Impact and Ethical Considerations of Generative AI in Software Product Management -- A Systematic Literature Review

Acoustic singular surfaces in an exponential class of inhomogeneous gases: A new numerical approach based on Krylov subspace spectral methodologies

JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency

Improved Mesh Processing using Distorted Pole Spherical Coordinates

Generative Adversarial Shaders for Real-Time Realism Enhancement

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

Keyword: faster

Learning Causal Mechanisms through Orthogonal Neural Networks

Accelerating 128-bit Floating-Point Matrix Multiplication on FPGAs

Balancing of competitive two-player Game Levels with Reinforcement Learning

Dual policy as self-model for planning

Referring Expression Comprehension Using Language Adaptive Inference

Hardening and Speeding Up Zero-interaction Pairing and Authentication

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt

Generative Adversarial Shaders for Real-Time Realism Enhancement

Keyword: mobile

Enhancing Virtual Assistant Intelligence: Precise Area Targeting for Instance-level User Intents beyond Metadata

MobileNMT: Enabling Translation in 15MB and 30ms

A Mask Free Neural Network for Monaural Speech Enhancement

Efficient Recruitment Strategy for Collaborative Mobile Crowd Sensing Based on GCN Trustworthiness Prediction