Abstract
Service robots need common-sense knowledge to help humans in everyday situations as it enables them to understand the context of their actions. However, approaches that use ontologies face a challenge because common-sense knowledge is often implicit, i.e., it is obvious to humans but not explicitly stated. This paper investigates if Large Language Models (LLMs) can fill this gap. Our experiments reveal limited effectiveness in the selective extraction of contextual action knowledge, suggesting that LLMs may not be sufficient on their own. However, the large-scale extraction of general, actionable knowledge shows potential, indicating that LLMs can be a suitable tool for efficiently creating ontologies for robots. This paper shows that the technique used for knowledge extraction can be applied to populate a minimalist ontology, showcasing the potential of LLMs in synergy with formal knowledge representation.
Reduction of large-scale RLCk models via low-rank balanced truncation
Abstract
Model order reduction (MOR) is an important step in the design process of integrated circuits. Specifically, the electromagnetic models extracted from modern complex designs result in a large number of passive elements that introduce limitations in the simulation process. MOR techniques based on balanced truncation (BT) can overcome these limitations by producing compact reduced-order models (ROMs) that approximate the behavior of the original models at the input/output ports. In this paper, we present a low-rank BT method that exploits the extended Krylov subspace and efficient implementation techniques for the reduction of large-scale models. Experimental evaluation on a diverse set of analog and mixed-signal circuits with millions of elements indicates that up to x5.5 smaller ROMs can be produced with similar accuracy to ANSYS RaptorX ROMs.
Exploration of Hyperledger Besu in Designing Private Blockchain-based Financial Distribution Systems
Authors: Md. Raisul Hasan Shahrukh, Md. Tabassinur Rahman, Nafees Mansoor
Subjects: Cryptography and Security (cs.CR); Computational Engineering, Finance, and Science (cs.CE); Software Engineering (cs.SE)
Abstract
Blockchain, a decentralized technology that provides unrivaled security, transparency, and process validation, is redefining the operational landscape across numerous industries. This article focuses on the development of an innovative consortium blockchain based financial distribution application. This paper illuminates the transformative role of blockchain technology in a variety of sectors by drawing on a plethora of academic literature and current industry practices. It demonstrates the diverse applications of blockchain, ranging from remittances to lending and investments in finance to data administration in healthcare and supply chain tracking. The paper reveals the design and potential of a consortium blockchain based application for financial distribution. Utilizing the capabilities of Hyperledger Besu, the application is tailored to improve security, scalability, and interoperability, thereby contributing to a more integrated financial ecosystem. The investigation sheds light on the combination of consortium blockchain controlled access and Hyprledger Besu comprehensive functionality, proposing a secure, transparent, and efficient financial transaction environment. The investigation serves as a resource for academics, industry professionals, and policymakers alike, highlighting the vast potential of blockchain technology, enabled by platforms such as Hyperledger Besu, in accelerating the evolution of traditional systems toward a more decentralized, secure, and efficient future.
Automated Identification of Sexual Orientation and Gender Identity Discriminatory Texts from Issue Comments
Abstract
In an industry dominated by straight men, many developers representing other gender identities and sexual orientations often encounter hateful or discriminatory messages. Such communications pose barriers to participation for women and LGBTQ+ persons. Due to sheer volume, manual inspection of all communications for discriminatory communication is infeasible for a large-scale Free Open-Source Software (FLOSS) community. To address this challenge, this study aims to develop an automated mechanism to identify Sexual orientation and Gender identity Discriminatory (SGID) texts from software developers' communications. On this goal, we trained and evaluated SGID4SE ( Sexual orientation and Gender Identity Discriminatory text identification for (4) Software Engineering texts) as a supervised learning-based SGID detection tool. SGID4SE incorporates six preprocessing steps and ten state-of-the-art algorithms. SGID4SE implements six different strategies to improve the performance of the minority class. We empirically evaluated each strategy and identified an optimum configuration for each algorithm. In our ten-fold cross-validation-based evaluations, a BERT-based model boosts the best performance with 85.9% precision, 80.0% recall, and 82.9% F1-Score for the SGID class. This model achieves 95.7% accuracy and 80.4% Matthews Correlation Coefficient. Our dataset and tool establish a foundation for further research in this direction.
MADG: Margin-based Adversarial Learning for Domain Generalization
Authors: Aveen Dayal, Vimal K. B., Linga Reddy Cenkeramaddi, C. Krishna Mohan, Abhinav Kumar, Vineeth N Balasubramanian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Domain Generalization (DG) techniques have emerged as a popular approach to address the challenges of domain shift in Deep Learning (DL), with the goal of generalizing well to the target domain unseen during the training. In recent years, numerous methods have been proposed to address the DG setting, among which one popular approach is the adversarial learning-based methodology. The main idea behind adversarial DG methods is to learn domain-invariant features by minimizing a discrepancy metric. However, most adversarial DG methods use 0-1 loss based $\mathcal{H}\Delta\mathcal{H}$ divergence metric. In contrast, the margin loss-based discrepancy metric has the following advantages: more informative, tighter, practical, and efficiently optimizable. To mitigate this gap, this work proposes a novel adversarial learning DG algorithm, MADG, motivated by a margin loss-based discrepancy metric. The proposed MADG model learns domain-invariant features across all source domains and uses adversarial training to generalize well to the unseen target domain. We also provide a theoretical analysis of the proposed MADG model based on the unseen target error bound. Specifically, we construct the link between the source and unseen domains in the real-valued hypothesis space and derive the generalization bound using margin loss and Rademacher complexity. We extensively experiment with the MADG model on popular real-world DG datasets, VLCS, PACS, OfficeHome, DomainNet, and TerraIncognita. We evaluate the proposed algorithm on DomainBed's benchmark and observe consistent performance across all the datasets.
An algorithm for Tambara-Yamagami quantum invariants of 3-manifolds, parameterized by the first Betti number
Authors: Colleen Delaney (UC Berkeley), Clément Maria (INRIA & FGV/EMAp), Eric Samperton (Purdue)
Abstract
Quantum topology provides various frameworks for defining and computing invariants of manifolds. One such framework of substantial interest in both mathematics and physics is the Turaev-Viro-Barrett-Westbury state sum construction, which uses the data of a spherical fusion category to define topological invariants of triangulated 3-manifolds via tensor network contractions. In this work we consider a restricted class of state sum invariants of 3-manifolds derived from Tambara-Yamagami categories. These categories are particularly simple, being entirely specified by three pieces of data: a finite abelian group, a bicharacter of that group, and a sign $\pm 1$. Despite being one of the simplest sources of state sum invariants, the computational complexities of Tambara-Yamagami invariants are yet to be fully understood. We make substantial progress on this problem. Our main result is the existence of a general fixed parameter tractable algorithm for all such topological invariants, where the parameter is the first Betti number of the 3-manifold with $\mathbb{Z}/2\mathbb{Z}$ coefficients. We also explain that these invariants are sometimes #P-hard to compute (and we expect that this is almost always the case). Contrary to other domains of computational topology, such as graphs on surfaces, very few hard problems in 3-manifold topology are known to admit FPT algorithms with a topological parameter. However, such algorithms are of particular interest as their complexity depends only polynomially on the combinatorial representation of the input, regardless of size or combinatorial width. Additionally, in the case of Betti numbers, the parameter itself is easily computable in polynomial time.
Low-Frequency Load Identification using CNN-BiLSTM Attention Mechanism
Authors: Amanie Azzam, Saba Sanami, Amir G. Aghdam
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
Non-intrusive Load Monitoring (NILM) is an established technique for effective and cost-efficient electricity consumption management. The method is used to estimate appliance-level power consumption from aggregated power measurements. This paper presents a hybrid learning approach, consisting of a convolutional neural network (CNN) and a bidirectional long short-term memory (BILSTM), featuring an integrated attention mechanism, all within the context of disaggregating low-frequency power data. While prior research has been mainly focused on high-frequency data disaggregation, our study takes a distinct direction by concentrating on low-frequency data. The proposed hybrid CNN-BILSTM model is adept at extracting both temporal (time-related) and spatial (location-related) features, allowing it to precisely identify energy consumption patterns at the appliance level. This accuracy is further enhanced by the attention mechanism, which aids the model in pinpointing crucial parts of the data for more precise event detection and load disaggregation. We conduct simulations using the existing low-frequency REDD dataset to assess our model performance. The results demonstrate that our proposed approach outperforms existing methods in terms of accuracy and computation time.
Parameter-Efficient Multilingual Summarisation: An Empirical Study
Abstract
With the increasing prevalence of Large Language Models, traditional full fine-tuning approaches face growing challenges, especially in memory-intensive tasks. This paper investigates the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), for complex and under-explored multilingual summarisation tasks. We conduct an extensive study across different data availability scenarios, including full-data, low-data, and cross-lingual transfer, leveraging models of different sizes. Our findings reveal that LoRA lags behind full fine-tuning when trained with full data, however, it excels in low-data scenarios and cross-lingual transfer. Interestingly, as models scale up, the performance gap between LoRA and full fine-tuning diminishes. Additionally, we investigate effective strategies for few-shot cross-lingual transfer, finding that continued LoRA tuning achieves the best performance compared to both full fine-tuning and dynamic composition of language-specific LoRA modules.
A solver for linear scalar ordinary differential equations whose running time is bounded independent of frequency
Abstract
When the eigenvalues of the coefficient matrix for a linear scalar ordinary differential equation are of large magnitude, its solutions exhibit complicated behaviour, such as high-frequency oscillations, rapid growth or rapid decay. The cost of representing such solutions using standard techniques grows with the magnitudes of the eigenvalues. As a consequence, the running times of most solvers for ordinary differential equations also grow with these eigenvalues. However, a large class of scalar ordinary differential equations with slowly-varying coefficients admit slowly-varying phase functions that can be represented at a cost which is bounded independent of the magnitudes of the eigenvalues of the corresponding coefficient matrix. Here, we introduce a numerical algorithm for constructing slowly-varying phase functions which represent the solutions of a linear scalar ordinary differential equation. Our method's running time depends on the complexity of the equation's coefficients, but is bounded independent of the magnitudes of the equation's eigenvalues. Once the phase functions have been constructed, essentially any reasonable initial or boundary value problem for the scalar equation can be easily solved. We present the results of numerical experiments showing that, despite its greater generality, our algorithm is competitive with state-of-the-art methods for solving highly-oscillatory second order differential equations. We also compare our method with Magnus-type exponential integrators and find that our approach is orders of magnitude faster in the high-frequency regime.
DREAMPlaceFPGA-MP: An Open-Source GPU-Accelerated Macro Placer for Modern FPGAs with Cascade Shapes and Region Constraints
Authors: Zhili Xiong, Rachel Selina Rajarathnam, Zhixing Jiang, Hanqing Zhu, David Z. Pan
Abstract
FPGA macro placement plays a pivotal role in routability and timing closer to the modern FPGA physical design flow. In modern FPGAs, macros could be subject to complex cascade shape constraints requiring instances to be placed in consecutive sites. In addition, in real-world FPGA macro placement scenarios, designs could have various region constraints that specify boundaries within which certain design instances and macros should be placed. In this work, we present DREAMPlaceFPGA-MP, an open-source GPU-accelerated FPGA macro-placer that efficiently generates legal placements for macros while honoring cascade shape requirements and region constraints. Treating multiple macros in a cascade shape as a large single instance and restricting instances to their respective regions, DREAMPlaceFPGA-MP obtains roughly legal placements. The macros are legalized in multiple steps to efficiently handle cascade shapes and region constraints. Our experimental results demonstrate that DREAMPlaceFPGA-MP is among the top contestants of the MLCAD 2023 FPGA Macro-Placement Contest.
Asking More Informative Questions for Grounded Retrieval
Authors: Sedrick Keh, Justin T. Chiu, Daniel Fried
Abstract
When a model is trying to gather information in an interactive setting, it benefits from asking informative questions. However, in the case of a grounded multi-turn image identification task, previous studies have been constrained to polar yes/no questions, limiting how much information the model can gain in a single turn. We present an approach that formulates more informative, open-ended questions. In doing so, we discover that off-the-shelf visual question answering (VQA) models often make presupposition errors, which standard information gain question selection methods fail to account for. To address this issue, we propose a method that can incorporate presupposition handling into both question selection and belief updates. Specifically, we use a two-stage process, where the model first filters out images which are irrelevant to a given question, then updates its beliefs about which image the user intends. Through self-play and human evaluations, we show that our method is successful in asking informative open-ended questions, increasing accuracy over the past state-of-the-art by 14%, while resulting in 48% more efficient games in human evaluations.
Carbon Responder: Coordinating Demand Response for the Datacenter Fleet
Authors: Jiali Xing, Bilge Acun, Aditya Sundarrajan, David Brooks, Manoj Chakkaravarthy, Nikky Avila, Carole-Jean Wu, Benjamin C. Lee
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
Abstract
The increasing integration of renewable energy sources results in fluctuations in carbon intensity throughout the day. To mitigate their carbon footprint, datacenters can implement demand response (DR) by adjusting their load based on grid signals. However, this presents challenges for private datacenters with diverse workloads and services. One of the key challenges is efficiently and fairly allocating power curtailment across different workloads. In response to these challenges, we propose the Carbon Responder framework. The Carbon Responder framework aims to reduce the carbon footprint of heterogeneous workloads in datacenters by modulating their power usage. Unlike previous studies, Carbon Responder considers both online and batch workloads with different service level objectives and develops accurate performance models to achieve performance-aware power allocation. The framework supports three alternative policies: Efficient DR, Fair and Centralized DR, and Fair and Decentralized DR. We evaluate Carbon Responder polices using production workload traces from a private hyperscale datacenter. Our experimental results demonstrate that the efficient Carbon Responder policy reduces the carbon footprint by around 2x as much compared to baseline approaches adapted from existing methods. The fair Carbon Responder policies distribute the performance penalties and carbon reduction responsibility fairly among workloads.
PEMA: Plug-in External Memory Adaptation for Language Models
Abstract
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstream NLP tasks. Nevertheless, the resource requirements of pre-training large language models in terms of memory and training compute pose significant challenges. Furthermore, due to the substantial resources required, many PLM weights are confidential. Consequently, users are compelled to share their data with model owners for fine-tuning on specific tasks. To overcome the limitations, we introduce Plug-in External Memory Adaptation (PEMA), a Parameter-Efficient Fine-Tuning (PEFT) approach designed for fine-tuning PLMs without the need for all weights. PEMA can be integrated into the context representation of test data during inference to execute downstream tasks. It leverages an external memory to store context representations generated by a PLM, mapped with the desired target word. Our method entails training LoRA-based weight matrices within the final layer of the PLM for enhanced efficiency. The probability is then interpolated with the next-word distribution from the PLM to perform downstream tasks. To improve the generation quality, we propose a novel interpolation strategy named Gradual Unrolling. To demonstrate the effectiveness of our proposed method, we conduct experiments to demonstrate the efficacy of PEMA with a syntactic dataset and assess its performance on machine translation and style transfer tasks using real datasets. PEMA outperforms other PEFT methods in terms of memory and latency efficiency for training and inference. Furthermore, it outperforms other baselines in preserving the meaning of sentences while generating appropriate language and styles.
Derivation of sixth-order exponential Runge--Kutta methods for stiff systems
Abstract
This work constructs the first-ever sixth-order exponential Runge--Kutta (ExpRK) methods for the time integration of stiff parabolic PDEs. First, we leverage the exponential B-series theory to restate the stiff order conditions for ExpRK methods of arbitrary order based on an essential set of trees only. Then, we explicitly provide the 36 order conditions required for sixth-order methods and present convergence results. In addition, we are able to solve the 36 stiff order conditions in both their weak and strong forms, resulting in two families of sixth-order parallel stages ExpRK schemes. Interestingly, while these new schemes require a high number of stages, they can be implemented efficiently similar to the cost of a 6-stage method. Numerical experiments are given to confirm the accuracy and efficiency of the new schemes.
Computing the k-th Eigenvalue of Symmetric $H^2$-Matrices
Abstract
The numerical solution of eigenvalue problems is essential in various application areas of scientific and engineering domains. In many problem classes, the practical interest is only a small subset of eigenvalues so it is unnecessary to compute all of the eigenvalues. Notable examples are the electronic structure problems where the $k$-th smallest eigenvalue is closely related to the electronic properties of materials. In this paper, we consider the $k$-th eigenvalue problems of symmetric dense matrices with low-rank off-diagonal blocks. We present a linear time generalized LDL decomposition of $\mathcal{H}^2$ matrices and combine it with the bisection eigenvalue algorithm to compute the $k$-th eigenvalue with controllable accuracy. In addition, if more than one eigenvalue is required, some of the previous computations can be reused to compute the other eigenvalues in parallel. Numerical experiments show that our method is more efficient than the state-of-the-art dense eigenvalue solver in LAPACK/ScaLAPACK and ELPA. Furthermore, tests on electronic state calculations of carbon nanomaterials demonstrate that our method outperforms the existing HSS-based bisection eigenvalue algorithm on 3D problems.
Toucan: Token-Aware Character Level Language Modeling
Authors: William Fleshman, Benjamin Van Durme
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Character-level language models obviate the need for separately trained tokenizers, but efficiency suffers from longer sequence lengths. Learning to combine character representations into tokens has made training these models more efficient, but they still require decoding characters individually. We propose Toucan, an augmentation to character-level models to make them "token-aware". Comparing our method to prior work, we demonstrate significant speed-ups in character generation without a loss in language modeling performance. We then explore differences between our learned dynamic tokenization of character sequences with popular fixed vocabulary solutions such as Byte-Pair Encoding and WordPiece, finding our approach leads to a greater amount of longer sequences tokenized as single items. Our project and code are available at https://nlp.jhu.edu/nuggets/.
A Statistical Verification Method of Random Permutations for Hiding Countermeasure Against Side-Channel Attacks
Abstract
As NIST is putting the final touches on the standardization of PQC (Post Quantum Cryptography) public key algorithms, it is a racing certainty that peskier cryptographic attacks undeterred by those new PQC algorithms will surface. Such a trend in turn will prompt more follow-up studies of attacks and countermeasures. As things stand, from the attackers' perspective, one viable form of attack that can be implemented thereupon is the so-called "side-channel attack". Two best-known countermeasures heralded to be durable against side-channel attacks are: "masking" and "hiding". In that dichotomous picture, of particular note are successful single-trace attacks on some of the NIST's PQC then-candidates, which worked to the detriment of the former: "masking". In this paper, we cast an eye over the latter: "hiding". Hiding proves to be durable against both side-channel attacks and another equally robust type of attacks called "fault injection attacks", and hence is deemed an auspicious countermeasure to be implemented. Mathematically, the hiding method is fundamentally based on random permutations. There has been a cornucopia of studies on generating random permutations. However, those are not tied to implementation of the hiding method. In this paper, we propose a reliable and efficient verification of permutation implementation, through employing Fisher-Yates' shuffling method. We introduce the concept of an n-th order permutation and explain how it can be used to verify that our implementation is more efficient than its previous-gen counterparts for hiding countermeasures.
ColorTrace: Fungible token coloring and attribution
Authors: Ryan Zarick, Bryan Pellegrino, Isaac Zhang, Thomas Kim, Caleb Banister
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
We formally define the fungible token coloring problem of attributing (coloring) fungible tokens to originating entities (minters), and present, to our knowledge, the first practical onchain algorithm to solve it. Tracking attribution of colored tokens losslessly using existing approaches such as the Colored Coins protocol is computationally intractable due to the per-wallet storage requirements growing in proportion to the number of minters. Our first contribution is an elegant solution to the single-chain token coloring problem, where colored tokens are atomically burned and minted to ensure each wallet only contains tokens of a single color. Our second contribution is an extension to this single-chain token coloring algorithm to allow safe and efficient crosschain token transfers. We present ColorTrace, an onchain algorithm to achieve globally consistent, economically feasible, fungible token coloring.
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Authors: Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate on the latent domain with cascaded phase recovery modules to reconstruct waveform. This poses challenges when generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in spectrogram domain under the framework of elucidated diffusion models (EDM). Combining with efficient deterministic sampler, we achieved similar Fr\'echet audio distance (FAD) score as top-ranked baseline with only 10 steps and reached state-of-the-art performance with 50 steps on the DCASE2023 foley sound generation benchmark. We also revealed a potential concern regarding diffusion based audio generation models that they tend to generate samples with high perceptual similarity to the data from training data. Project page: https://agentcooper2002.github.io/EDMSound/
Coreset Selection with Prioritized Multiple Objectives
Abstract
Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically performs on par with full data. When coreset selection is applied in realistic scenes, under the premise that the identified coreset has achieved comparable model performance, practitioners regularly desire the identified coreset can have a size as small as possible for lower costs and greater acceleration. Motivated by this desideratum, for the first time, we pose the problem of "coreset selection with prioritized multiple objectives", in which the smallest coreset size under model performance constraints is explored. Moreover, to address this problem, an innovative method is proposed, which maintains optimization priority order over the model performance and coreset size, and efficiently optimizes them in the coreset selection procedure. Theoretically, we provide the convergence guarantee of the proposed method. Empirically, extensive experiments confirm its superiority compared with previous strategies, often yielding better model performance with smaller coreset sizes.
Safer-Instruct: Aligning Language Models with Automated Preference Data
Authors: Taiwei Shi, Kai Chen, Jieyu Zhao
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Reinforcement Learning from Human Feedback (RLHF) is a vital strategy for enhancing model safety in language models. However, annotating preference data for RLHF is a resource-intensive and creativity-demanding process, while automatic generation methods face limitations in data diversity and quality. In response, we present Safer-Instruct, a novel pipeline for semi-automatically constructing large-scale preference datasets. Our approach leverages reversed instruction tuning, instruction induction, and expert model evaluation to efficiently generate high-quality preference data without human annotators. We evaluate Safer-Instruct using LLaMA for instruction induction and GPT-4 as an expert model, generating approximately 10K preference samples. Finetuning an Alpaca model on this dataset demonstrates improved harmlessness while maintaining competitive performance on conversation and downstream tasks. Safer-Instruct addresses the challenges in preference data acquisition, advancing the development of safer and more responsible AI systems. Our code and data are available at https://github.com/uscnlp-lime/safer-instruct
Debate Helps Supervise Unreliable Experts
Authors: Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can't tell the difference between the two on their own? In this work, we show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth. We collect a dataset of human-written debates on hard reading comprehension questions where the judge has not read the source passage, only ever seeing expert arguments and short quotes selectively revealed by 'expert' debaters who have access to the passage. In our debates, one expert argues for the correct answer, and the other for an incorrect answer. Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%. Debates are also more efficient, being 68% of the length of consultancies. By comparing human to AI debaters, we find evidence that with more skilled (in this case, human) debaters, the performance of debate goes up but the performance of consultancy goes down. Our error analysis also supports this trend, with 46% of errors in human debate attributable to mistakes by the honest debater (which should go away with increased skill); whereas 52% of errors in human consultancy are due to debaters obfuscating the relevant evidence from the judge (which should become worse with increased skill). Overall, these results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.
Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
Abstract
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable performance in long-term human-machine interactions, which basically relies on iterative recalling and reasoning of history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, \textit{i.e.}, inconsistent reasoning results when recalling the same history for different questions. On the contrary, humans can keep thoughts in the memory and recall them without repeated reasoning. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolved memory for storing historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, a LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. Thus, TiM can eliminate the issue of repeated reasoning by saving the post-thinking thoughts as the history. Besides, we formulate the basic principles to organize the thoughts in memory based on the well-established operations, (\textit{i.e.}, insert, forget, and merge operations), allowing for dynamic updates and evolution of the thoughts. Furthermore, we introduce Locality-Sensitive Hashing into TiM to achieve efficient retrieval for the long-term conversations. We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics, demonstrating that equipping existing LLMs with TiM significantly enhances their performance in generating responses for long-term interactions.
AdVENTR: Autonomous Robot Navigation in Complex Outdoor Environments
Abstract
We present a novel system, AdVENTR for autonomous robot navigation in unstructured outdoor environments that consist of uneven and vegetated terrains. Our approach is general and can enable both wheeled and legged robots to handle outdoor terrain complexity including unevenness, surface properties like poor traction, granularity, obstacle stiffness, etc. We use data from sensors including RGB cameras, 3D Lidar, IMU, robot odometry, and pose information with efficient learning-based perception and planning algorithms that can execute on edge computing hardware. Our system uses a scene-aware switching method to perceive the environment for navigation at any time instant and dynamically switches between multiple perception algorithms. We test our system in a variety of sloped, rocky, muddy, and densely vegetated terrains and demonstrate its performance on Husky and Spot robots.
Accelerating Toeplitz Neural Network with Constant-time Inference Complexity
Abstract
Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks. They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexities. On the other hand, State Space Models (SSMs) achieve lower performance than TNNs in language modeling but offer the advantage of constant inference complexity. In this paper, we aim to combine the strengths of TNNs and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to achieve the same constant inference complexities as SSMs. To accomplish this, we formulate the conversion process as an optimization problem and provide a closed-form solution. We demonstrate how to transform the target equation into a Vandermonde linear system problem, which can be efficiently solved using the Discrete Fourier Transform (DFT). Notably, our method requires no training and maintains numerical stability. It can be also applied to any LongConv-based model. To assess its effectiveness, we conduct extensive experiments on language modeling tasks across various settings. Additionally, we compare our method to other gradient-descent solutions, highlighting the superior numerical stability of our approach. The source code is available at https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.
4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters
Authors: Yijie Zhou, Chao Li, Jin Liang, Tianyi Xu, Xin Liu, Jun Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The illumination of improperly exposed photographs has been widely corrected using deep convolutional neural networks or Transformers. Despite with promising performance, these methods usually suffer from large parameter amounts and heavy computational FLOPs on high-resolution photographs. In this paper, we propose extremely light-weight (with only ~8K parameters) Multi-Scale Linear Transformation (MSLT) networks under the multi-layer perception architecture, which can process 4K-resolution sRGB images at 125 Frame-Per-Second (FPS) by a Titan RTX GPU. Specifically, the proposed MSLT networks first decompose an input image into high and low frequency layers by Laplacian pyramid techniques, and then sequentially correct different layers by pixel-adaptive linear transformation, which is implemented by efficient bilateral grid learning or 1x1 convolutions. Experiments on two benchmark datasets demonstrate the efficiency of our MSLTs against the state-of-the-arts on photo exposure correction. Extensive ablation studies validate the effectiveness of our contributions. The code is available at https://github.com/Zhou-Yijie/MSLTNet.
A unified framework for multiscale spectral generalized FEMs and low-rank approximations to multiscale PDEs
Abstract
This work presents an abstract framework for the design, implementation, and analysis of the multiscale spectral generalized finite element method (MS-GFEM), a particular numerical multiscale method originally proposed in [I. Babuska and R. Lipton, Multiscale Model.\;\,Simul., 9 (2011), pp.~373--406]. MS-GFEM is a partition of unity method employing optimal local approximation spaces constructed from local spectral problems. We establish a general local approximation theory demonstrating exponential convergence with respect to local degrees of freedom under certain assumptions, with explicit dependence on key problem parameters. Our framework applies to a broad class of multiscale PDEs with $L^{\infty}$-coefficients in both continuous and discrete, finite element settings, including highly indefinite problems (convection-dominated diffusion, as well as the high-frequency Helmholtz, Maxwell and elastic wave equations with impedance boundary conditions), and higher-order problems. Notably, we prove a local convergence rate of $O(e^{-cn^{1/d}})$ for MS-GFEM for all these problems, improving upon the $O(e^{-cn^{1/(d+1)}})$ rate shown by Babuska and Lipton. Moreover, based on the abstract local approximation theory for MS-GFEM, we establish a unified framework for showing low-rank approximations to multiscale PDEs. This framework applies to the aforementioned problems, proving that the associated Green's functions admit an $O(|\log\epsilon|^{d})$-term separable approximation on well-separated domains with error $\epsilon>0$. Our analysis improves and generalizes the result in [M. Bebendorf and W. Hackbusch, Numerische Mathematik, 95 (2003), pp.~1-28] where an $O(|\log\epsilon|^{d+1})$-term separable approximation was proved for Poisson-type problems.
Context Adaptive Cooperation
Authors: Timothé Albouy, Davide Frey, Mathieu Gestin, Michel Raynal, François Taïani
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Reliable broadcast and consensus are the two pillars that support a lot of non-trivial fault-tolerant distributed middleware and fault-tolerant distributed systems. While they have close definitions, they strongly differ in the underlying assumptions needed to implement each of them. Reliable broadcast can be implemented in asynchronous systems in the presence of crash or Byzantine failures while Consensus cannot. This key difference stems from the fact that consensus involves synchronization between multiple processes that concurrently propose values, while reliable broadcast simply involves delivering a message from a predefined sender. This paper strikes a balance between these two agreement abstractions in the presence of Byzantine failures. It proposes CAC, a novel agreement abstraction that enables multiple processes to broadcast messages simultaneously, while guaranteeing that (despite potential conflicts, asynchrony, and Byzantine behaviors) the non-faulty processes will agree on messages deliveries. We show that this novel abstraction can enable more efficient algorithms for a variety of applications (such as money transfer where several people can share a same account). This is obtained by focusing the need for synchronization only on the processes that actually need to synchronize.
Language Semantic Graph Guided Data-Efficient Learning
Authors: Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
Developing generalizable models that can effectively learn from limited data and with minimal reliance on human supervision is a significant objective within the machine learning community, particularly in the era of deep neural networks. Therefore, to achieve data-efficient learning, researchers typically explore approaches that can leverage more related or unlabeled data without necessitating additional manual labeling efforts, such as Semi-Supervised Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA). SSL leverages unlabeled data in the training process, while TL enables the transfer of expertise from related data distributions. DA broadens the dataset by synthesizing new data from existing examples. However, the significance of additional knowledge contained within labels has been largely overlooked in research. In this paper, we propose a novel perspective on data efficiency that involves exploiting the semantic information contained in the labels of the available data. Specifically, we introduce a Language Semantic Graph (LSG) which is constructed from labels manifest as natural language descriptions. Upon this graph, an auxiliary graph neural network is trained to extract high-level semantic relations and then used to guide the training of the primary model, enabling more adequate utilization of label knowledge. Across image, video, and audio modalities, we utilize the LSG method in both TL and SSL scenarios and illustrate its versatility in significantly enhancing performance compared to other data-efficient learning approaches. Additionally, our in-depth analysis shows that the LSG method also expedites the training process.
X-GRL: An Empirical Assessment of Explainable GNN-DRL in B5G/6G Networks
Abstract
The rapid development of artificial intelligence (AI) techniques has triggered a revolution in beyond fifth-generation (B5G) and upcoming sixth-generation (6G) mobile networks. Despite these advances, efficient resource allocation in dynamic and complex networks remains a major challenge. This paper presents an experimental implementation of deep reinforcement learning (DRL) enhanced with graph neural networks (GNNs) on a real 5G testbed. The method addresses the explainability of GNNs by evaluating the importance of each edge in determining the model's output. The custom sampling functions feed the data into the proposed GNN-driven Monte Carlo policy gradient (REINFORCE) agent to optimize the gNodeB (gNB) radio resources according to the specific traffic demands. The demo demonstrates real-time visualization of network parameters and superior performance compared to benchmarks.
Comments on "Dynamic Consensus Committee-Based for Secure Data Sharing With Authorized Multi-Receiver Searchable Encryption"
Abstract
Recently, Yang et al. introduced an efficient searchable encryption scheme titled "Dynamic Consensus Committee-Based for Secure Data Sharing With Authorized Multi-Receiver Searchable Encryption (DCC-SE)," published in IEEE Transactions on Information Forensics and Security (DOI: 10.1109/TIFS.2023.3305183). According to the authors, DCC-SE meets various security requirements, especially the keyword trapdoor indistinguishability against chosen keyword attacks (KT-IND-CKA). In this letter, however, we reveal a significant vulnerability of DCC-SE: any users involved in the system can execute attacks against KT-IND-CKA security. This flaw potentially results in the unintended disclosure of sensitive keyword information related to the documents. We present a detailed cryptanalysis on DCC-SE. In addition, to address this vulnerability, we discuss the root cause and identify a flaw in the security proof of DCC-SE. Subsequently, we provide a solution that effectively addresses this concern without significantly increasing computational overhead.
Abstract
This paper presents FreD, a novel parameterization method for dataset distillation, which utilizes the frequency domain to distill a small-sized synthetic dataset from a large-sized original dataset. Unlike conventional approaches that focus on the spatial domain, FreD employs frequency-based transforms to optimize the frequency representations of each data instance. By leveraging the concentration of spatial domain information on specific frequency components, FreD intelligently selects a subset of frequency dimensions for optimization, leading to a significant reduction in the required budget for synthesizing an instance. Through the selection of frequency dimensions based on the explained variance, FreD demonstrates both theoretical and empirical evidence of its ability to operate efficiently within a limited budget, while better preserving the information of the original dataset compared to conventional parameterization methods. Furthermore, based on the orthogonal compatibility of FreD with existing methods, we confirm that FreD consistently improves the performances of existing distillation methods over the evaluation scenarios with different benchmark datasets. We release the code at https://github.com/sdh0818/FreD.
Reinforcement Learning with Model Predictive Control for Highway Ramp Metering
Authors: Filippo Airaldi, Bart De Schutter, Azita Dabiri
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
Abstract
In the backdrop of an increasingly pressing need for effective urban and highway transportation systems, this work explores the synergy between model-based and learning-based strategies to enhance traffic flow management by use of an innovative approach to the problem of highway ramp metering control that embeds Reinforcement Learning techniques within the Model Predictive Control framework. The control problem is formulated as an RL task by crafting a suitable stage cost function that is representative of the traffic conditions, variability in the control action, and violations of a safety-critical constraint on the maximum number of vehicles in queue. An MPC-based RL approach, which merges the advantages of the two paradigms in order to overcome the shortcomings of each framework, is proposed to learn to efficiently control an on-ramp and to satisfy its constraints despite uncertainties in the system model and variable demands. Finally, simulations are performed on a benchmark from the literature consisting of a small-scale highway network. Results show that, starting from an MPC controller that has an imprecise model and is poorly tuned, the proposed methodology is able to effectively learn to improve the control policy such that congestion in the network is reduced and constraints are satisfied, yielding an improved performance compared to the initial controller.
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Abstract
Pretraining multilingual language models from scratch requires considerable computational resources and substantial training data. Therefore, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining. However, this method usually randomly initializes the embeddings of new subwords and introduces substantially more embedding parameters to the language model, thus weakening the efficiency. To address these issues, we propose a novel framework: \textbf{O}ne \textbf{F}or \textbf{A}ll (\textbf{\textsc{Ofa}}), which wisely initializes the embeddings of unseen subwords from target languages and thus can adapt a PLM to multiple languages efficiently and effectively. \textsc{Ofa} takes advantage of external well-aligned multilingual word embeddings and injects the alignment knowledge into the new embeddings. In addition, \textsc{Ofa} applies matrix factorization and replaces the cumbersome embeddings with two lower-dimensional matrices, which significantly reduces the number of parameters while not sacrificing the performance. Through extensive experiments, we show models initialized by \textsc{Ofa} are efficient and outperform several baselines. \textsc{Ofa} not only accelerates the convergence of continued pretraining, which is friendly to a limited computation budget, but also improves the zero-shot crosslingual transfer on a wide range of downstream tasks. We make our code and models publicly available.
Verification of GossipSub in ACL2s
Authors: Ankit Kumar (Northeastern University), Max von Hippel (Northeastern University), Panagiotis Manolios (Northeastern University), Cristina Nita-Rotaru (Northeastern University)
Subjects: Logic in Computer Science (cs.LO); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Abstract
GossipSub is a popular new peer-to-peer network protocol designed to disseminate messages quickly and efficiently by allowing peers to forward the full content of messages only to a dynamically selected subset of their neighboring peers (mesh neighbors) while gossiping about messages they have seen with the rest. Peers decide which of their neighbors to graft or prune from their mesh locally and periodically using a score for each neighbor. Scores are calculated using a score function that depends on mesh-specific parameters, weights and counters relating to a peer's performance in the network. Since a GossipSub network's performance ultimately depends on the performance of its peers, an important question arises: Is the score calculation mechanism effective in weeding out non-performing or even intentionally misbehaving peers from meshes? We answered this question in the negative in our companion paper by reasoning about GossipSub using our formal, official and executable ACL2s model. Based on our findings, we synthesized and simulated attacks against GossipSub which were confirmed by the developers of GossipSub, FileCoin, and Eth2.0, and publicly disclosed in MITRE CVE-2022-47547. In this paper, we present a detailed description of our model. We discuss design decisions, security properties of GossipSub, reasoning about the security properties in context of our model, attack generation and lessons we learnt when writing it.
Abstract
We initiate the study of spanners in arbitrarily vertex- or edge-colored graphs (with no "legality" restrictions), that are resilient to failures of entire color classes. When a color fails, all vertices/edges of that color crash. An $f$-color fault-tolerant ($f$-CFT) $t$-spanner of an $n$-vertex colored graph $G$ is a subgraph $H$ that preserves distances up to factor $t$, even in the presence of at most $f$ color faults. This notion generalizes the well-studied $f$-vertex/edge fault-tolerant ($f$-V/EFT) spanners. The size of an $f$-V/EFT spanner crucially depends on the number $f$ of vertex/edge faults to be tolerated. In the colored variants, even a single color fault can correspond to an unbounded number of vertex/edge faults. The key conceptual contribution of this work is in showing that the size (number of edges) required by an $f$-CFT spanner is in fact comparable to its uncolored counterpart, with no dependency on the size of color classes. We provide optimal bounds on the size required by $f$-CFT spanners, revealing an interesting phenomenon: while (individual) edge faults are "easier" than vertex faults in terms of spanner size, edge-color faults are "harder" than vertex-color faults. Our upper bounds are based on a generalization of the blocking set technique of [Bodwin and Patel, PODC 2019] for analyzing the (exponential-time) greedy algorithm for FT spanners. We complement them by providing efficient constructions of CFT spanners with similar size guarantees, based on the algorithm of [Dinitz and Robelle, PODC 2020].
Enabling Large Language Models to Learn from Rules
Authors: Wenkai Yang, Yankai Lin, Jie Zhou, Jirong Wen
Abstract
Large language models (LLMs) have shown incredible performance in completing various real-world tasks. The current knowledge learning paradigm of LLMs is mainly based on learning from examples, in which LLMs learn the internal rule implicitly from a certain number of supervised examples. However, the learning paradigm may not well learn those complicated rules, especially when the training examples are limited. We are inspired that humans can learn the new tasks or knowledge in another way by learning from rules. That is, humans can grasp the new tasks or knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we aim to explore the feasibility of this new learning paradigm, which encodes the rule-based knowledge into LLMs. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules and then explicitly encode the knowledge into LLMs' parameters by learning from the above in-context signals produced inside the model. Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both the sample size and generalization ability.
HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence
Abstract
Large models have demonstrated significant progress across various domains, particularly in tasks related to text generation. In the domain of Table to Text, many Large Language Model (LLM)-based methods currently resort to modifying prompts to invoke public APIs, incurring potential costs and information leaks. With the advent of open-source large models, fine-tuning LLMs has become feasible. In this study, we conducted parameter-efficient fine-tuning on the LLaMA2 model. Distinguishing itself from previous fine-tuning-based table-to-text methods, our approach involves injecting reasoning information into the input by emphasizing table-specific row data. Our model consists of two modules: 1) a table reasoner that identifies relevant row evidence, and 2) a table summarizer that generates sentences based on the highlighted table. To facilitate this, we propose a search strategy to construct reasoning labels for training the table reasoner. On both the FetaQA and QTSumm datasets, our approach achieved state-of-the-art results. Additionally, we observed that highlighting input tables significantly enhances the model's performance and provides valuable interpretability.
Energy-Efficient Design of Satellite-Terrestrial Computing in 6G Wireless Networks
Authors: Qi Wang, Xiaoming Chen, Qiao Qi
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, we investigate the issue of satellite-terrestrial computing in the sixth generation (6G) wireless networks, where multiple terrestrial base stations (BSs) and low earth orbit (LEO) satellites collaboratively provide edge computing services to ground user equipments (GUEs) and space user equipments (SUEs) over the world. In particular, we design a complete process of satellite-terrestrial computing in terms of communication and computing according to the characteristics of 6G wireless networks. In order to minimize the weighted total energy consumption while ensuring delay requirements of computing tasks, an energy-efficient satellite-terrestrial computing algorithm is put forward by jointly optimizing offloading selection, beamforming design and resource allocation. Finally, both theoretical analysis and simulation results confirm fast convergence and superior performance of the proposed algorithm for satellite-terrestrial computing in 6G wireless networks.
Combining Shamir & Additive Secret Sharing to Improve Efficiency of SMC Primitives Against Malicious Adversaries
Abstract
Secure multi-party computation provides a wide array of protocols for mutually distrustful parties be able to securely evaluate functions of private inputs. Within recent years, many such protocols have been proposed representing a plethora of strategies to securely and efficiently handle such computation. These protocols have become increasingly efficient, but their performance still is impractical in many settings. We propose new approaches to some of these problems which are either more efficient than previous works within the same security models or offer better security guarantees with comparable efficiency. The goals of this research are to improve efficiency and security of secure multi-party protocols and explore the application of such approaches to novel threat scenarios. Some of the novel optimizations employed are dynamically switching domains of shared secrets, asymmetric computations, and advantageous functional transformations, among others. Specifically, this work presents a novel combination of Shamir and Additive secret sharing to be used in parallel which allows for the transformation of efficient protocols secure against passive adversaries to be secure against active adversaries. From this set of primitives we propose the construction of a comparison protocol which can be implemented under that approach with a complexity which is more efficient than other recent works for common domains of interest. Finally, we present a system which addresses a critical security threat for the protection and obfuscation of information which may be of high consequence.
An explicit substructuring method for overlapping domain decomposition based on stochastic calculus
Authors: Jorge Morón-Vidal, Francisco Bernal, Atsushi Suzuki
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In a recent paper [{\em F. Bernal, J. Mor\'on-Vidal and J.A. Acebr\'on, Comp.$\&$ Math. App. 146:294-308 (2023)}] an hybrid supercomputing algorithm for elliptic equations has been put forward. The idea is that the interfacial nodal solutions solve a linear system, whose coefficients are expectations of functionals of stochastic differential equations confined within patches of about subdomain size. Compared to standard substructuring techniques such as the Schur complement method for the skeleton, the hybrid approach renders an explicit and sparse shrunken matrix -- hence suitable for being substructured again. The ultimate goal is to push strong scalability beyond the state of the art, by leveraging the scope for parallelisation of stochastic calculus. Here, we present a major revamping of that framework, based on the insight of embedding the domain in a cover of overlapping circles (in two dimensions). This allows for efficient Fourier interpolation along the interfaces (now circumferences) and -- crucially -- for the evaluation of most of the interfacial system entries as the solution of small boundary value problems on a circle. This is both extremely efficient (as they can be solved in parallel and by the pseudospectral method) and free of Monte Carlo error. Stochastic numerics are only needed on the relatively few circles intersecting the domain boundary. In sum, the new formulation is significantly faster, simpler and more accurate, while retaining all of the advantageous properties of PDDSparse. Numerical experiments are included for the purpose of illustration.
A novel concept for Titan robotic exploration based on soft morphing aerial robots
Authors: Fernando Ruiz, Begona Arrue, Anibal Ollero
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
This work introduces a novel approach for Titan exploration based on soft morphing aerial robots leveraging the use of flexible adaptive materials. The controlled deformation of the multirotor arms, actuated by a combination of a pneumatic system and a tendon mechanism, provides the explorer robot with the ability to perform full-body perching and land on rocky, irregular, or uneven terrains, thus unlocking new exploration horizons. In addition, after landing, they can be used for efficient sampling as tendon-driven continuum manipulators, with the pneumatic system drawing in the samples. The proposed arms enable the drone to cover long distances in Titan's atmosphere efficiently, by directing rotor thrust without rotating the body, reducing the aerodynamic drag. Given that the exploration concept is envisioned as a rotorcraft planetary lander, the robot's folding features enable over a 30$\%$ reduction in the hypersonic aeroshell's diameter. Building on this folding capability, the arms can morph partially in flight to navigate tight spaces. As for propulsion, the rotor design, justified through CFD simulations, utilizes a ducted fan configuration tailored for Titan's high Reynolds numbers. The rotors are integrated within the robot's deformable materials, facilitating smooth interactions with the environment. The research spotlights exploration simulations in the Gazebo environment, focusing on the Sotra-Patera cryovolcano region, a location with potential to clarify Titan's unique methane cycle and its Earth-like features. This work addresses one of the primary challenges of the concept by testing the behavior of small-scale deformable arms under conditions mimicking those of Titan. Groundbreaking experiments with liquid nitrogen at cryogenic temperatures were conducted on various materials, with Teflon (PTFE) at low infill rates (15-30%) emerging as a promising option.
Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature
Authors: Randy Kuang, Maria Perepechaenko, Mahmoud Sayed, Dafu Lou
Abstract
In their 2022 study, Kuang et al. introduced Multivariable Polynomial Public Key (MPPK) cryptography, leveraging the inversion relationship between multiplication and division for quantum-safe public key systems. They extended MPPK into Homomorphic Polynomial Public Key (HPPK), employing homomorphic encryption for large hidden ring operations. Originally designed for key encapsulation (KEM), HPPK's security relies on homomorphic encryption of public polynomials. This paper expands HPPK KEM to a digital signature scheme, facing challenges due to the distinct nature of verification compared to decryption. To adapt HPPK KEM to digital signatures, the authors introduce an extension of the Barrett reduction algorithm, transforming modular multiplications into divisions in the verification equation over a prime field. The extended algorithm non-linearly embeds the signature into public polynomial coefficients, addressing vulnerabilities in earlier MPPK DS schemes. Security analysis demonstrates exponential complexity for private key recovery and forged signature attacks, considering ring bit length twice that of the prime field size.
Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images
Authors: Zhaocong liu, Fa Zhang, Lin Cheng, Huanxi Deng, Xiaoyan Yang, Zhenyu Zhang, Chichun Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
High-quality labeled datasets are essential for deep learning. Traditional manual annotation methods are not only costly and inefficient but also pose challenges in specialized domains where expert knowledge is needed. Self-supervised methods, despite leveraging unlabeled data for feature extraction, still require hundreds or thousands of labeled instances to guide the model for effective specialized image classification. Current unsupervised learning methods offer automatic classification without prior annotation but often compromise on accuracy. As a result, efficiently procuring high-quality labeled datasets remains a pressing challenge for specialized domain images devoid of annotated data. Addressing this, an unsupervised classification method with three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism from multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation. This approach outperforms supervised methods in classification accuracy, as demonstrated with fungal image data, achieving 94.1% and 96.7% on public and private datasets respectively. The proposed unsupervised classification method reduces dependency on pre-annotated datasets, enabling a closed-loop for data classification. The simplicity and ease of use of this method will also bring convenience to researchers in various fields in building datasets, promoting AI applications for images in specialized domains.
Algorithmic Cheap Talk
Authors: Yakov Babichenko, Inbal Talgam-Cohen, Haifeng Xu, Konstantin Zabarnyi
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
The literature on strategic communication originated with the influential cheap talk model, which precedes the Bayesian persuasion model by three decades. This model describes an interaction between two agents: sender and receiver. The sender knows some state of the world which the receiver does not know, and tries to influence the receiver's action by communicating a cheap talk message to the receiver. This paper initiates the algorithmic study of cheap talk in a finite environment (i.e., a finite number of states and receiver's possible actions). We first prove that approximating the sender-optimal or the welfare-maximizing cheap talk equilibrium up to a certain additive constant or multiplicative factor is NP-hard. Fortunately, we identify three naturally-restricted cases that admit efficient algorithms for finding a sender-optimal equilibrium. These include a state-independent sender's utility structure, a constant number of states or a receiver having only two actions.
The Chromatic Number of Kneser Hypergraphs via Consensus Division
Abstract
We show that the Consensus Division theorem implies lower bounds on the chromatic number of Kneser hypergraphs, offering a novel proof for a result of Alon, Frankl, and Lov\'{a}sz (Trans. Amer. Math. Soc., 1986) and for its generalization by Kriz (Trans. Amer. Math. Soc., 1992). Our approach is applied to study the computational complexity of the total search problem Kneser$^p$, which given a succinct representation of a coloring of a $p$-uniform Kneser hypergraph with fewer colors than its chromatic number, asks to find a monochromatic hyperedge. We prove that for every prime $p$, the Kneser$^p$ problem with an extended access to the input coloring is efficiently reducible to a quite weak approximation of the Consensus Division problem with $p$ shares. In particular, for $p=2$, the problem is efficiently reducible to any non-trivial approximation of the Consensus Halving problem on normalized monotone functions. We further show that for every prime $p$, the Kneser$^p$ problem lies in the complexity class $\mathsf{PPA}$-$p$. As an application, we establish limitations on the complexity of the Kneser$^p$ problem, restricted to colorings with a bounded number of colors.
New Horizons in Parameter Regularization: A Constraint Approach
Authors: Jörg K.H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter
Abstract
This work presents constrained parameter regularization (CPR), an alternative to traditional weight decay. Instead of applying a constant penalty uniformly to all parameters, we enforce an upper bound on a statistical measure (e.g., the L$_2$-norm) of individual parameter groups. This reformulates learning as a constrained optimization problem. To solve this, we utilize an adaptation of the augmented Lagrangian method. Our approach allows for varying regularization strengths across different parameter groups, removing the need for explicit penalty coefficients in the regularization terms. CPR only requires two hyperparameters and introduces no measurable runtime overhead. We offer empirical evidence of CPR's effectiveness through experiments in the "grokking" phenomenon, image classification, and language modeling. Our findings show that CPR can counteract the effects of grokking, and it consistently matches or surpasses the performance of traditional weight decay.
Guided Scale Space Radon Transform for linear structures detection
Abstract
Using integral transforms to the end of lines detection in images with complex background, makes the detection a hard task needing additional processing to manage the detection. As an integral transform, the Scale Space Radon Transform (SSRT) suffers from such drawbacks, even with its great abilities for thick lines detection. In this work, we propose a method to address this issue for automatic detection of thick linear structures in gray scale and binary images using the SSRT, whatever the image background content. This method involves the calculated Hessian orientations of the investigated image while computing its SSRT, in such a way that linear structures are emphasized in the SSRT space. As a consequence, the subsequent maxima detection in the SSRT space is done on a modified transform space freed from unwanted parts and, consequently, from irrelevant peaks that usually drown the peaks representing lines. Besides, highlighting the linear structure in the SSRT space permitting, thus, to efficiently detect lines of different thickness in synthetic and real images, the experiments show also the method robustness against noise and complex background.
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Authors: Heng-Jui Chang, James Glass
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
This paper introduces Robust Spin (R-Spin), a data-efficient self-supervised fine-tuning framework for speaker and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin). R-Spin resolves Spin's issues and enhances content representations by learning to predict acoustic pieces. R-Spin offers a 12X reduction in computational resources compared to previous state-of-the-art methods while outperforming them in severely distorted speech scenarios. This paper provides detailed analyses to show how discrete units contribute to speech encoder training and improving robustness in diverse acoustic environments.
Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets
Authors: Christian Mahoney, Peter Gronvall, Nathaniel Huber-Fliflet, Jianping Zhang
Abstract
US corporations regularly spend millions of dollars reviewing electronically-stored documents in legal matters. Recently, attorneys apply text classification to efficiently cull massive volumes of data to identify responsive documents for use in these matters. While text classification is regularly used to reduce the discovery costs of legal matters, it also faces a perception challenge: amongst lawyers, this technology is sometimes looked upon as a "black box". Put simply, no extra information is provided for attorneys to understand why documents are classified as responsive. In recent years, explainable machine learning has emerged as an active research area. In an explainable machine learning system, predictions or decisions made by a machine learning model are human understandable. In legal 'document review' scenarios, a document is responsive, because one or more of its small text snippets are deemed responsive. In these scenarios, if these responsive snippets can be located, then attorneys could easily evaluate the model's document classification decisions - this is especially important in the field of responsible AI. Our prior research identified that predictive models created using annotated training text snippets improved the precision of a model when compared to a model created using all of a set of documents' text as training. While interesting, manually annotating training text snippets is not generally practical during a legal document review. However, small increases in precision can drastically decrease the cost of large document reviews. Automating the identification of training text snippets without human review could then make the application of training text snippet-based models a practical approach.
SiRA: Sparse Mixture of Low Rank Adaptation
Authors: Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng
Abstract
Parameter Efficient Tuning has been an prominent approach to adapt the Large Language Model to downstream tasks. Most previous works considers adding the dense trainable parameters, where all parameters are used to adapt certain task. We found this less effective empirically using the example of LoRA that introducing more trainable parameters does not help. Motivated by this we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low rank adaption. SiRA leverages the Sparse Mixture of Expert(SMoE) to boost the performance of LoRA. Specifically it enforces the top $k$ experts routing with a capacity limit restricting the maximum number of tokens each expert can process. We propose a novel and simple expert dropout on top of gating network to reduce the over-fitting issue. Through extensive experiments, we verify SiRA performs better than LoRA and other mixture of expert approaches across different single tasks and multitask settings.
Domain Aligned CLIP for Few-shot Classification
Authors: Muhammad Waleed Gondal, Jochen Gast, Inigo Alonso Ruiz, Richard Droste, Tommaso Macri, Suren Kumar, Luitpold Staudigl
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Large vision-language representation learning models like CLIP have demonstrated impressive performance for zero-shot transfer to downstream tasks while largely benefiting from inter-modal (image-text) alignment via contrastive objectives. This downstream performance can further be enhanced by full-scale fine-tuning which is often compute intensive, requires large labelled data, and can reduce out-of-distribution (OOD) robustness. Furthermore, sole reliance on inter-modal alignment might overlook the rich information embedded within each individual modality. In this work, we introduce a sample-efficient domain adaptation strategy for CLIP, termed Domain Aligned CLIP (DAC), which improves both intra-modal (image-image) and inter-modal alignment on target distributions without fine-tuning the main model. For intra-modal alignment, we introduce a lightweight adapter that is specifically trained with an intra-modal contrastive objective. To improve inter-modal alignment, we introduce a simple framework to modulate the precomputed class text embeddings. The proposed few-shot fine-tuning framework is computationally efficient, robust to distribution shifts, and does not alter CLIP's parameters. We study the effectiveness of DAC by benchmarking on 11 widely used image classification tasks with consistent improvements in 16-shot classification upon strong baselines by about 2.3% and demonstrate competitive performance on 4 OOD robustness benchmarks.
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge
Abstract
A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.
A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width
Authors: Jason Gaitonde, Elchanan Mossel
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Abstract
We revisit the problem of efficiently learning the underlying parameters of Ising models from data. Current algorithmic approaches achieve essentially optimal sample complexity when given i.i.d. samples from the stationary measure and the underlying model satisfies "width" bounds on the total $\ell_1$ interaction involving each node. We show that a simple existing approach based on node-wise logistic regression provably succeeds at recovering the underlying model in several new settings where these assumptions are violated: (1) Given dynamically generated data from a wide variety of local Markov chains, like block or round-robin dynamics, logistic regression recovers the parameters with optimal sample complexity up to $\log\log n$ factors. This generalizes the specialized algorithm of Bresler, Gamarnik, and Shah [IEEE Trans. Inf. Theory'18] for structure recovery in bounded degree graphs from Glauber dynamics. (2) For the Sherrington-Kirkpatrick model of spin glasses, given $\mathsf{poly}(n)$ independent samples, logistic regression recovers the parameters in most of the known high-temperature regime via a simple reduction to weaker structural properties of the measure. This improves on recent work of Anari, Jain, Koehler, Pham, and Vuong [ArXiv'23] which gives distribution learning at higher temperature. (3) As a simple byproduct of our techniques, logistic regression achieves an exponential improvement in learning from samples in the M-regime of data considered by Dutt, Lokhov, Vuffray, and Misra [ICML'21] as well as novel guarantees for learning from the adversarial Glauber dynamics of Chin, Moitra, Mossel, and Sandon [ArXiv'23]. Our approach thus significantly generalizes the elegant analysis of Wu, Sanghavi, and Dimakis [Neurips'19] without any algorithmic modification.
Assessing Translation capabilities of Large Language Models involving English and Indian Languages
Abstract
Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. In this work, our aim is to explore the multilingual capabilities of large language models by using machine translation as a task involving English and 22 Indian languages. We first investigate the translation capabilities of raw large language models, followed by exploring the in-context learning capabilities of the same raw models. We fine-tune these large language models using parameter efficient fine-tuning methods such as LoRA and additionally with full fine-tuning. Through our study, we have identified the best performing large language model for the translation task involving LLMs, which is based on LLaMA. Our results demonstrate significant progress, with average BLEU scores of 13.42, 15.93, 12.13, 12.30, and 12.07, as well as CHRF scores of 43.98, 46.99, 42.55, 42.42, and 45.39, respectively, using 2-stage fine-tuned LLaMA-13b for English to Indian languages on IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019 testsets. Similarly, for Indian languages to English, we achieved average BLEU scores of 14.03, 16.65, 16.17, 15.35 and 12.55 along with chrF scores of 36.71, 40.44, 40.26, 39.51, and 36.20, respectively, using fine-tuned LLaMA-13b on IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019 testsets. Overall, our findings highlight the potential and strength of large language models for machine translation capabilities, including for languages that are currently underrepresented in LLMs.
Keyword: faster
A solver for linear scalar ordinary differential equations whose running time is bounded independent of frequency
Abstract
When the eigenvalues of the coefficient matrix for a linear scalar ordinary differential equation are of large magnitude, its solutions exhibit complicated behaviour, such as high-frequency oscillations, rapid growth or rapid decay. The cost of representing such solutions using standard techniques grows with the magnitudes of the eigenvalues. As a consequence, the running times of most solvers for ordinary differential equations also grow with these eigenvalues. However, a large class of scalar ordinary differential equations with slowly-varying coefficients admit slowly-varying phase functions that can be represented at a cost which is bounded independent of the magnitudes of the eigenvalues of the corresponding coefficient matrix. Here, we introduce a numerical algorithm for constructing slowly-varying phase functions which represent the solutions of a linear scalar ordinary differential equation. Our method's running time depends on the complexity of the equation's coefficients, but is bounded independent of the magnitudes of the equation's eigenvalues. Once the phase functions have been constructed, essentially any reasonable initial or boundary value problem for the scalar equation can be easily solved. We present the results of numerical experiments showing that, despite its greater generality, our algorithm is competitive with state-of-the-art methods for solving highly-oscillatory second order differential equations. We also compare our method with Magnus-type exponential integrators and find that our approach is orders of magnitude faster in the high-frequency regime.
An explicit substructuring method for overlapping domain decomposition based on stochastic calculus
Authors: Jorge Morón-Vidal, Francisco Bernal, Atsushi Suzuki
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In a recent paper [{\em F. Bernal, J. Mor\'on-Vidal and J.A. Acebr\'on, Comp.$\&$ Math. App. 146:294-308 (2023)}] an hybrid supercomputing algorithm for elliptic equations has been put forward. The idea is that the interfacial nodal solutions solve a linear system, whose coefficients are expectations of functionals of stochastic differential equations confined within patches of about subdomain size. Compared to standard substructuring techniques such as the Schur complement method for the skeleton, the hybrid approach renders an explicit and sparse shrunken matrix -- hence suitable for being substructured again. The ultimate goal is to push strong scalability beyond the state of the art, by leveraging the scope for parallelisation of stochastic calculus. Here, we present a major revamping of that framework, based on the insight of embedding the domain in a cover of overlapping circles (in two dimensions). This allows for efficient Fourier interpolation along the interfaces (now circumferences) and -- crucially -- for the evaluation of most of the interfacial system entries as the solution of small boundary value problems on a circle. This is both extremely efficient (as they can be solved in parallel and by the pseudospectral method) and free of Monte Carlo error. Stochastic numerics are only needed on the relatively few circles intersecting the domain boundary. In sum, the new formulation is significantly faster, simpler and more accurate, while retaining all of the advantageous properties of PDDSparse. Numerical experiments are included for the purpose of illustration.
Keyword: mobile
adF: A Novel System for Measuring Web Fingerprinting through Ads
Authors: Miguel A. Bermejo-Agueda (1), Patricia Callejo (1 and 2), Rubén Cuevas (1 and 2), Ángel Cuevas (1 and 2) ((1) Universidad Carlos III de Madrid, (2) uc3m-Santander Big Data Institute)
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Abstract
This paper introduces adF, a novel system for analyzing the vulnerability of different devices, Operating Systems (OSes), and browsers to web fingerprinting. adF performs its measurements from code inserted in ads. We have used our system in several ad campaigns that delivered 5,40 million ad impressions. The collected data enable us to assess the vulnerability of current desktop and mobile devices to web fingerprinting. Based on our results, we estimate that 64% of desktop devices and 40% of mobile devices can be uniquely fingerprinted with our web fingerprinting system. However, the resilience to web fingerprinting varies significantly across browsers and device types, with Chrome on desktops being the most vulnerable configuration.
X-GRL: An Empirical Assessment of Explainable GNN-DRL in B5G/6G Networks
Abstract
The rapid development of artificial intelligence (AI) techniques has triggered a revolution in beyond fifth-generation (B5G) and upcoming sixth-generation (6G) mobile networks. Despite these advances, efficient resource allocation in dynamic and complex networks remains a major challenge. This paper presents an experimental implementation of deep reinforcement learning (DRL) enhanced with graph neural networks (GNNs) on a real 5G testbed. The method addresses the explainability of GNNs by evaluating the importance of each edge in determining the model's output. The custom sampling functions feed the data into the proposed GNN-driven Monte Carlo policy gradient (REINFORCE) agent to optimize the gNodeB (gNB) radio resources according to the specific traffic demands. The demo demonstrates real-time visualization of network parameters and superior performance compared to benchmarks.
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
Authors: Yifan Zhou, Dongxing Xu, Haoran Wei, Yanhua Long
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
In industry, machine anomalous sound detection (ASD) is in great demand. However, collecting enough abnormal samples is difficult due to the high cost, which boosts the rapid development of unsupervised ASD algorithms. Autoencoder (AE) based methods have been widely used for unsupervised ASD, but suffer from problems including 'shortcut', poor anti-noise ability and sub-optimal quality of features. To address these challenges, we propose a new AE-based framework termed AEGM. Specifically, we first insert an auxiliary classifier into AE to enhance ASD in a multi-task learning manner. Then, we design a group-based decoder structure, accompanied by an adaptive loss function, to endow the model with domain-specific knowledge. Results on the DCASE 2021 Task 2 development set show that our methods achieve a relative improvement of 13.11% and 15.20% respectively in average AUC over the official AE and MobileNetV2 across test sets of seven machines.
Motion Control of Two Mobile Robots under Allowable Collisions
Authors: Li Tan, Wei Ren, Xi-Ming Sun, Junlin Xiong
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
This letter investigates the motion control problem of two mobile robots under allowable collisions. Here, the allowable collisions mean that the collisions do not damage the mobile robots. The occurrence of the collisions is discussed and the effects of the collisions on the mobile robots are analyzed to develop a hybrid model of each mobile robot under allowable collisions. Based on the effects of the collisions, we show the necessity of redesigning the motion control strategy for mobile robots. Furthermore, impulsive control techniques are applied to redesign the motion control strategy to guarantee the task accomplishment for each mobile robot. Finally, an example is used to illustrate the redesigned motion control strategy.
PDU-set Scheduling Algorithm for XR Traffic in Multi-Service 5G-Advanced Networks
Authors: Pouria Paymard, Stefano Paris, Abolfazl Amiri, Troels E. Kolding, Fernando Sanchez Moya, Klaus I. Pedersen
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
In this paper, we investigate a dynamic packet scheduling algorithm designed to enhance the eXtended Reality (XR) capacity of fifth-generation (5G)-Advanced networks with multiple cells, multiple users, and multiple services. The scheduler exploits the newly defined protocol data unit (PDU)-set information for XR traffic flows to enhance its quality-of-service awareness. To evaluate the performance of the proposed solution, advanced dynamic system-level simulations are conducted. The findings reveal that the proposed scheduler offers a notable improvement in increasing XR capacity up to 45%, while keeping the same enhanced mobile broadband (eMBB) cell throughput as compared to the well-known baseline schedulers.
Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices
Authors: Tiffany Tseng, Matt J. Davidson, Luis Morales-Navarro, Jennifer King Chen, Victoria Delaney, Mark Leibowitz, Jazbo Beason, R. Benjamin Shapiro
Abstract
Machine learning (ML) models are fundamentally shaped by data, and building inclusive ML systems requires significant considerations around how to design representative datasets. Yet, few novice-oriented ML modeling tools are designed to foster hands-on learning of dataset design practices, including how to design for data diversity and inspect for data quality. To this end, we outline a set of four data design practices (DDPs) for designing inclusive ML models and share how we designed a tablet-based application called Co-ML to foster learning of DDPs through a collaborative ML model building experience. With Co-ML, beginners can build image classifiers through a distributed experience where data is synchronized across multiple devices, enabling multiple users to iteratively refine ML datasets in discussion and coordination with their peers. We deployed Co-ML in a 2-week-long educational AIML Summer Camp, where youth ages 13-18 worked in groups to build custom ML-powered mobile applications. Our analysis reveals how multi-user model building with Co-ML, in the context of student-driven projects created during the summer camp, supported development of DDPs involving incorporating data diversity, evaluating model performance, and inspecting for data quality. Additionally, we found that students' attempts to improve model performance often prioritized learnability over class balance. Through this work, we highlight how the combination of collaboration, model testing interfaces, and student-driven projects can empower learners to actively engage in exploring the role of data in ML systems.
Keyword: pruning
CoRE-CoG: Conversational Recommendation of Entities using Constrained Generation
Abstract
End-to-end conversational recommendation systems (CRS) generate responses by leveraging both dialog history and a knowledge base (KB). A CRS mainly faces three key challenges: (1) at each turn, it must decide if recommending a KB entity is appropriate; if so, it must identify the most relevant KB entity to recommend; and finally, it must recommend the entity in a fluent utterance that is consistent with the conversation history. Recent CRSs do not pay sufficient attention to these desiderata, often generating unfluent responses or not recommending (relevant) entities at the right turn. We introduce a new CRS we call CoRE-CoG. CoRE-CoG addresses the limitations in prior systems by implementing (1) a recommendation trigger that decides if the system utterance should include an entity, (2) a type pruning module that improves the relevance of recommended entities, and (3) a novel constrained response generator to make recommendations while maintaining fluency. Together, these modules ensure simultaneous accurate recommendation decisions and fluent system utterances. Experiments with recent benchmarks show the superiority particularly on conditional generation sub-tasks with close to 10 F1 and 4 Recall@1 percent points gain over baselines.
SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer
Authors: Yue Liu, Shanlin Xiao, Bo Li, Zhiyi Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
As the third-generation neural network, the Spiking Neural Network (SNN) has the advantages of low power consumption and high energy efficiency, making it suitable for implementation on edge devices. More recently, the most advanced SNN, Spikformer, combines the self-attention module from Transformer with SNN to achieve remarkable performance. However, it adopts larger channel dimensions in MLP layers, leading to an increased number of redundant model parameters. To effectively decrease the computational complexity and weight parameters of the model, we explore the Lottery Ticket Hypothesis (LTH) and discover a very sparse ($\ge$90%) subnetwork that achieves comparable performance to the original network. Furthermore, we also design a lightweight token selector module, which can remove unimportant background information from images based on the average spike firing rate of neurons, selecting only essential foreground image tokens to participate in attention calculation. Based on that, we present SparseSpikformer, a co-design framework aimed at achieving sparsity in Spikformer through token and weight pruning techniques. Experimental results demonstrate that our framework can significantly reduce 90% model parameters and cut down Giga Floating-Point Operations (GFLOPs) by 20% while maintaining the accuracy of the original model.
Data Augmentations in Deep Weight Spaces
Authors: Aviv Shamsian, David W. Zhang, Aviv Navon, Yan Zhang, Miltiadis Kofinas, Idan Achituve, Riccardo Valperga, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, Ethan Fetaya, Gal Chechik, Haggai Maron
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations, to network pruning and quantization. Recent works designed architectures for effective learning in that space, which takes into account its unique, permutation-equivariant, structure. Unfortunately, so far these architectures suffer from severe overfitting and were shown to benefit from large datasets. This poses a significant challenge because generating data for this learning setup is laborious and time-consuming since each data sample is a full set of network weights that has to be trained. In this paper, we address this difficulty by investigating data augmentations for weight spaces, a set of techniques that enable generating new data examples on the fly without having to train additional input weight space elements. We first review several recently proposed data augmentation schemes %that were proposed recently and divide them into categories. We then introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate, which can be valuable for future studies.
Do Localization Methods Actually Localize Memorized Data in LLMs?
Authors: Ting-Yun Chang, Jesse Thomason, Robin Jia
Abstract
Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies if we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Benchmark, we actively inject a piece of new information into a small subset of LLM weights and measure whether localization methods can identify these "ground truth" weights. In the DEL Benchmark, we study localization of pretrained data that LLMs have already memorized; while this setting lacks ground truth, we can still evaluate localization by measuring whether dropping out located neurons erases a memorized sequence from the model. We evaluate five localization methods on our two benchmarks, and both show similar rankings. All methods exhibit promising localization ability, especially for pruning-based methods, though the neurons they identify are not necessarily specific to a single memorized sequence.
Keyword: diffusion
Finding AI-Generated Faces in the Wild
Authors: Gonzalo J. Aniano Porcile, Jack Gindi, Shivansh Mundra, James R. Verbus, Hany Farid
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
AI-based image generation has continued to rapidly improve, producing increasingly more realistic images with fewer obvious visual flaws. AI-generated images are being used to create fake online profiles which in turn are being used for spam, fraud, and disinformation campaigns. As the general problem of detecting any type of manipulated or synthesized content is receiving increasing attention, here we focus on a more narrow task of distinguishing a real face from an AI-generated face. This is particularly applicable when tackling inauthentic online accounts with a fake user profile photo. We show that by focusing on only faces, a more resilient and general-purpose artifact can be detected that allows for the detection of AI-generated faces from a variety of GAN- and diffusion-based synthesis engines, and across image resolutions (as low as 128 x 128 pixels) and qualities.
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Authors: Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate on the latent domain with cascaded phase recovery modules to reconstruct waveform. This poses challenges when generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in spectrogram domain under the framework of elucidated diffusion models (EDM). Combining with efficient deterministic sampler, we achieved similar Fr\'echet audio distance (FAD) score as top-ranked baseline with only 10 steps and reached state-of-the-art performance with 50 steps on the DCASE2023 foley sound generation benchmark. We also revealed a potential concern regarding diffusion based audio generation models that they tend to generate samples with high perceptual similarity to the data from training data. Project page: https://agentcooper2002.github.io/EDMSound/
Towards Graph-Aware Diffusion Modeling for Collaborative Filtering
Authors: Yunqin Zhu, Chao Wang, Hui Xiong
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Recovering masked feedback with neural models is a popular paradigm in recommender systems. Seeing the success of diffusion models in solving ill-posed inverse problems, we introduce a conditional diffusion framework for collaborative filtering that iteratively reconstructs a user's hidden preferences guided by its historical interactions. To better align with the intrinsic characteristics of implicit feedback data, we implement forward diffusion by applying synthetic smoothing filters to interaction signals on an item-item graph. The resulting reverse diffusion can be interpreted as a personalized process that gradually refines preference scores. Through graph Fourier transform, we equivalently characterize this model as an anisotropic Gaussian diffusion in the graph spectral domain, establishing both forward and reverse formulations. Our model outperforms state-of-the-art methods by a large margin on one dataset and yields competitive results on the others.
A unified framework for multiscale spectral generalized FEMs and low-rank approximations to multiscale PDEs
Abstract
This work presents an abstract framework for the design, implementation, and analysis of the multiscale spectral generalized finite element method (MS-GFEM), a particular numerical multiscale method originally proposed in [I. Babuska and R. Lipton, Multiscale Model.\;\,Simul., 9 (2011), pp.~373--406]. MS-GFEM is a partition of unity method employing optimal local approximation spaces constructed from local spectral problems. We establish a general local approximation theory demonstrating exponential convergence with respect to local degrees of freedom under certain assumptions, with explicit dependence on key problem parameters. Our framework applies to a broad class of multiscale PDEs with $L^{\infty}$-coefficients in both continuous and discrete, finite element settings, including highly indefinite problems (convection-dominated diffusion, as well as the high-frequency Helmholtz, Maxwell and elastic wave equations with impedance boundary conditions), and higher-order problems. Notably, we prove a local convergence rate of $O(e^{-cn^{1/d}})$ for MS-GFEM for all these problems, improving upon the $O(e^{-cn^{1/(d+1)}})$ rate shown by Babuska and Lipton. Moreover, based on the abstract local approximation theory for MS-GFEM, we establish a unified framework for showing low-rank approximations to multiscale PDEs. This framework applies to the aforementioned problems, proving that the associated Green's functions admit an $O(|\log\epsilon|^{d})$-term separable approximation on well-separated domains with error $\epsilon>0$. Our analysis improves and generalizes the result in [M. Bebendorf and W. Hackbusch, Numerische Mathematik, 95 (2003), pp.~1-28] where an $O(|\log\epsilon|^{d+1})$-term separable approximation was proved for Poisson-type problems.
One-Shot Federated Learning with Classifier-Guided Diffusion Models
Authors: Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
One-shot federated learning (OSFL) has gained attention in recent years due to its low communication cost. However, most of the existing methods require auxiliary datasets or training generators, which hinders their practicality in real-world scenarios. In this paper, we explore the novel opportunities that diffusion models bring to OSFL and propose FedCADO, utilizing guidance from client classifiers to generate data that complies with clients' distributions and subsequently training the aggregated model on the server. Specifically, our method involves targeted optimizations in two aspects. On one hand, we conditionally edit the randomly sampled initial noises, embedding them with specified semantics and distributions, resulting in a significant improvement in both the quality and stability of generation. On the other hand, we employ the BN statistics from the classifiers to provide detailed guidance during generation. These tailored optimizations enable us to limitlessly generate datasets, which closely resemble the distribution and quality of the original client dataset. Our method effectively handles the heterogeneous client models and the problems of non-IID features or labels. In terms of privacy protection, our method avoids training any generator or transferring any auxiliary information on clients, eliminating any additional privacy leakage risks. Leveraging the extensive knowledge stored in the pre-trained diffusion model, the synthetic datasets can assist us in surpassing the knowledge limitations of the client samples, resulting in aggregation models that even outperform the performance ceiling of centralized training in some cases, which is convincingly demonstrated in the sufficient quantification and visualization experiments conducted on three large-scale multi-domain image datasets.
A Spectral Diffusion Prior for Hyperspectral Image Super-Resolution
Authors: Jianjun Liu, Zebin Wu, Liang Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Fusion-based hyperspectral image (HSI) super-resolution aims to produce a high-spatial-resolution HSI by fusing a low-spatial-resolution HSI and a high-spatial-resolution multispectral image. Such a HSI super-resolution process can be modeled as an inverse problem, where the prior knowledge is essential for obtaining the desired solution. Motivated by the success of diffusion models, we propose a novel spectral diffusion prior for fusion-based HSI super-resolution. Specifically, we first investigate the spectrum generation problem and design a spectral diffusion model to model the spectral data distribution. Then, in the framework of maximum a posteriori, we keep the transition information between every two neighboring states during the reverse generative process, and thereby embed the knowledge of trained spectral diffusion model into the fusion problem in the form of a regularization term. At last, we treat each generation step of the final optimization problem as its subproblem, and employ the Adam to solve these subproblems in a reverse sequence. Experimental results conducted on both synthetic and real datasets demonstrate the effectiveness of the proposed approach. The code of the proposed approach will be available on https://github.com/liuofficial/SDP.
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Abstract
Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data. To better align the two modalities, most existing works focus on introducing sophisticated network structures and auxiliary tasks, which are complex and hard to implement. In this paper, we propose a simple yet effective dual Transformer model for text-based person search. By exploiting a hardness-aware contrastive learning strategy, our model achieves state-of-the-art performance without any special design for local feature alignment or side information. Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training. The PDG module first introduces an automatic generation algorithm based on a text-to-image diffusion model, which generates new text-image pair samples in the proximity space of original ones. Then it combines approximate text generation and feature-level mixup during training to further strengthen the data diversity. The PDG module can largely guarantee the reasonability of the generated samples that are directly used for training without any human inspection for noise rejection. It improves the performance of our model significantly, providing a feasible solution to the data insufficiency problem faced by such fine-grained visual-linguistic tasks. Extensive experiments on two popular datasets of the TBPS task (i.e., CUHK-PEDES and ICFG-PEDES) show that the proposed approach outperforms state-of-the-art approaches evidently, e.g., improving by 3.88%, 4.02%, 2.92% in terms of Top1, Top5, Top10 on CUHK-PEDES. The codes will be available at https://github.com/HCPLab-SYSU/PersonSearch-CTLG
Fast Detection of Phase Transitions with Multi-Task Learning-by-Confusion
Authors: Julian Arnold, Frank Schäfer, Niels Lörch
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech)
Abstract
Machine learning has been successfully used to study phase transitions. One of the most popular approaches to identifying critical points from data without prior knowledge of the underlying phases is the learning-by-confusion scheme. As input, it requires system samples drawn from a grid of the parameter whose change is associated with potential phase transitions. Up to now, the scheme required training a distinct binary classifier for each possible splitting of the grid into two sides, resulting in a computational cost that scales linearly with the number of grid points. In this work, we propose and showcase an alternative implementation that only requires the training of a single multi-class classifier. Ideally, such multi-task learning eliminates the scaling with respect to the number of grid points. In applications to the Ising model and an image dataset generated with Stable Diffusion, we find significant speedups that closely correspond to the ideal case, with only minor deviations.
Finding polarised communities and tracking information diffusion on Twitter: The Irish Abortion Referendum
Authors: Caroline Pena, Pádraig MacCarron, David J.P. O'Sullivan
Abstract
The analysis of social networks enables the understanding of social interactions, polarisation of ideas, and the spread of information and therefore plays an important role in society. We use Twitter data - as it is a popular venue for the expression of opinion and dissemination of information - to identify opposing sides of a debate and, importantly, to observe how information spreads between these groups in our current polarised climate. To achieve this, we collected over 688,000 Tweets from the Irish Abortion Referendum of 2018 to build a conversation network from users mentions with sentiment-based homophily. From this network, community detection methods allow us to isolate yes- or no-aligned supporters with high accuracy (90.9%). We supplement this by tracking how information cascades spread via over 31,000 retweet-cascades. We found that very little information spread between polarised communities. This provides a valuable methodology for extracting and studying information diffusion on large networks by isolating ideologically polarised groups and exploring the propagation of information within and between these groups.
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We propose \textbf{DMV3D}, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in $\sim$30s on single A100 GPU. We train \textbf{DMV3D} on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results for the single-image reconstruction problem where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results outperforming previous 3D diffusion models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/ .
Single-Image 3D Human Digitization with Shape-Guided Diffusion
Authors: Badour AlBahar, Shunsuke Saito, Hung-Yu Tseng, Changil Kim, Johannes Kopf, Jia-Bin Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image. NeRF and its variants typically require videos or images from different viewpoints. Most existing approaches taking monocular input either rely on ground-truth 3D scans for supervision or lack 3D consistency. While recent 3D generative models show promise of 3D consistent human digitization, these approaches do not generalize well to diverse clothing appearances, and the results lack photorealism. Unlike existing work, we utilize high-capacity 2D diffusion models pretrained for general image synthesis tasks as an appearance prior of clothed humans. To achieve better 3D consistency while retaining the input identity, we progressively synthesize multiple views of the human in the input image by inpainting missing regions with shape-guided diffusion conditioned on silhouette and surface normal. We then fuse these synthesized multi-view images via inverse rendering to obtain a fully textured high-resolution 3D mesh of the given person. Experiments show that our approach outperforms prior methods and achieves photorealistic 360-degree synthesis of a wide range of clothed humans with complex textures from a single image.
Keyword: adaptive
Efficient Continual Pre-training for Building Domain Specific Large Language Models
Abstract
Large language models (LLMs) have demonstrated remarkable open-domain capabilities. Traditionally, LLMs tailored for a domain are trained from scratch to excel at handling domain-specific tasks. In this work, we explore an alternative strategy of continual pre-training as a means to develop domain-specific LLMs. We introduce FinPythia-6.9B, developed through domain-adaptive continual pre-training on the financial domain. Continual pre-trained FinPythia showcases consistent improvements on financial tasks over the original foundational model. We further explore simple but effective data selection strategies for continual pre-training. Our data selection strategies outperforms vanilla continual pre-training's performance with just 10% of corpus size and cost, without any degradation on open-domain standard tasks. Our work proposes an alternative solution to building domain-specific LLMs from scratch in a cost-effective manner.
Adversarial Imitation Learning On Aggregated Data
Authors: Pierre Le Pelletier de Woillemont, Rémi Labory, Vincent Corruble
Abstract
Inverse Reinforcement Learning (IRL) learns an optimal policy, given some expert demonstrations, thus avoiding the need for the tedious process of specifying a suitable reward function. However, current methods are constrained by at least one of the following requirements. The first one is the need to fully solve a forward Reinforcement Learning (RL) problem in the inner loop of the algorithm, which might be prohibitively expensive in many complex environments. The second one is the need for full trajectories from the experts, which might not be easily available. The third one is the assumption that the expert data is homogeneous rather than a collection from various experts or possibly alternative solutions to the same task. Such constraints make IRL approaches either not scalable or not usable on certain existing systems. In this work we propose an approach which removes these requirements through a dynamic, adaptive method called Adversarial Imitation Learning on Aggregated Data (AILAD). It learns conjointly both a non linear reward function and the associated optimal policy using an adversarial framework. The reward learner only uses aggregated data. Moreover, it generates diverse behaviors producing a distribution over the aggregated data matching that of the experts.
MOSAIC: A Multi-Objective Optimization Framework for Sustainable Datacenter Management
Abstract
In recent years, cloud service providers have been building and hosting datacenters across multiple geographical locations to provide robust services. However, the geographical distribution of datacenters introduces growing pressure to both local and global environments, particularly when it comes to water usage and carbon emissions. Unfortunately, efforts to reduce the environmental impact of such datacenters often lead to an increase in the cost of datacenter operations. To co-optimize the energy cost, carbon emissions, and water footprint of datacenter operation from a global perspective, we propose a novel framework for multi-objective sustainable datacenter management (MOSAIC) that integrates adaptive local search with a collaborative decomposition-based evolutionary algorithm to intelligently manage geographical workload distribution and datacenter operations. Our framework sustainably allocates workloads to datacenters while taking into account multiple geography- and time-based factors including renewable energy sources, variable energy costs, power usage efficiency, carbon factors, and water intensity in energy. Our experimental results show that, compared to the best-known prior work frameworks, MOSAIC can achieve 27.45x speedup and 1.53x improvement in Pareto Hypervolume while reducing the carbon footprint by up to 1.33x, water footprint by up to 3.09x, and energy costs by up to 1.40x. In the simultaneous three-objective co-optimization scenario, MOSAIC achieves a cumulative improvement across all objectives (carbon, water, cost) of up to 4.61x compared to the state-of-the-arts.
4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters
Authors: Yijie Zhou, Chao Li, Jin Liang, Tianyi Xu, Xin Liu, Jun Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The illumination of improperly exposed photographs has been widely corrected using deep convolutional neural networks or Transformers. Despite with promising performance, these methods usually suffer from large parameter amounts and heavy computational FLOPs on high-resolution photographs. In this paper, we propose extremely light-weight (with only ~8K parameters) Multi-Scale Linear Transformation (MSLT) networks under the multi-layer perception architecture, which can process 4K-resolution sRGB images at 125 Frame-Per-Second (FPS) by a Titan RTX GPU. Specifically, the proposed MSLT networks first decompose an input image into high and low frequency layers by Laplacian pyramid techniques, and then sequentially correct different layers by pixel-adaptive linear transformation, which is implemented by efficient bilateral grid learning or 1x1 convolutions. Experiments on two benchmark datasets demonstrate the efficiency of our MSLTs against the state-of-the-arts on photo exposure correction. Extensive ablation studies validate the effectiveness of our contributions. The code is available at https://github.com/Zhou-Yijie/MSLTNet.
HFORD: High-Fidelity and Occlusion-Robust De-identification for Face Privacy Protection
Abstract
With the popularity of smart devices and the development of computer vision technology, concerns about face privacy protection are growing. The face de-identification technique is a practical way to solve the identity protection problem. The existing facial de-identification methods have revealed several problems, including the impact on the realism of anonymized results when faced with occlusions and the inability to maintain identity-irrelevant details in anonymized results. We present a High-Fidelity and Occlusion-Robust De-identification (HFORD) method to deal with these issues. This approach can disentangle identities and attributes while preserving image-specific details such as background, facial features (e.g., wrinkles), and lighting, even in occluded scenes. To disentangle the latent codes in the GAN inversion space, we introduce an Identity Disentanglement Module (IDM). This module selects the latent codes that are closely related to the identity. It further separates the latent codes into identity-related codes and attribute-related codes, enabling the network to preserve attributes while only modifying the identity. To ensure the preservation of image details and enhance the network's robustness to occlusions, we propose an Attribute Retention Module (ARM). This module adaptively preserves identity-irrelevant details and facial occlusions and blends them into the generated results in a modulated manner. Extensive experiments show that our method has higher quality, better detail fidelity, and stronger occlusion robustness than other face de-identification methods.
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
Authors: Yifan Zhou, Dongxing Xu, Haoran Wei, Yanhua Long
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
In industry, machine anomalous sound detection (ASD) is in great demand. However, collecting enough abnormal samples is difficult due to the high cost, which boosts the rapid development of unsupervised ASD algorithms. Autoencoder (AE) based methods have been widely used for unsupervised ASD, but suffer from problems including 'shortcut', poor anti-noise ability and sub-optimal quality of features. To address these challenges, we propose a new AE-based framework termed AEGM. Specifically, we first insert an auxiliary classifier into AE to enhance ASD in a multi-task learning manner. Then, we design a group-based decoder structure, accompanied by an adaptive loss function, to endow the model with domain-specific knowledge. Results on the DCASE 2021 Task 2 development set show that our methods achieve a relative improvement of 13.11% and 15.20% respectively in average AUC over the official AE and MobileNetV2 across test sets of seven machines.
Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding
Abstract
Recent endeavors in video temporal grounding enforce strong cross-modal interactions through attention mechanisms to overcome the modality gap between video and text query. However, previous works treat all video clips equally regardless of their semantic relevance with the text query in attention modules. In this paper, our goal is to provide clues for query-associated video clips within the crossmodal encoding process. With our Correlation-Guided Detection Transformer~(CG-DETR), we explore the appropriate clip-wise degree of cross-modal interactions and how to exploit such degrees for prediction. First, we design an adaptive cross-attention layer with dummy tokens. Dummy tokens conditioned by text query take a portion of the attention weights, preventing irrelevant video clips from being represented by the text query. Yet, not all word tokens equally inherit the text query's correlation to video clips. Thus, we further guide the cross-attention map by inferring the fine-grained correlation between video clips and words. We enable this by learning a joint embedding space for high-level concepts, i.e., moment and sentence level, and inferring the clip-word correlation. Lastly, we use a moment-adaptive saliency detector to exploit each video clip's degrees of text engagement. We validate the superiority of CG-DETR with the state-of-the-art results on various benchmarks for both moment retrieval and highlight detection. Codes are available at https://github.com/wjun0830/CGDETR.
Adaptive space-time model order reduction with dual-weighted residual (MORe DWR) error control for poroelasticity
Authors: Hendrik Fischer, Julian Roth, Ludovic Chamoin, Amelie Fau, Mary F. Wheeler, Thomas Wick
Abstract
In this work, the space-time MORe DWR (Model Order Reduction with Dual-Weighted Residual error estimates) framework is extended and further developed for single-phase flow problems in porous media. Specifically, our problem statement is the Biot system which consists of vector-valued displacements (geomechanics) coupled to a Darcy flow pressure equation. The MORe DWR method introduces a goal-oriented adaptive incremental proper orthogonal decomposition (POD) based-reduced-order model (ROM). The error in the reduced goal functional is estimated during the simulation, and the POD basis is enriched on-the-fly if the estimate exceeds a given threshold. This results in a reduction of the total number of full-order-model solves for the simulation of the porous medium, a robust estimation of the quantity of interest and well-suited reduced bases for the problem at hand. We apply a space-time Galerkin discretization with Taylor-Hood elements in space and a discontinuous Galerkin method with piecewise constant functions in time. The latter is well-known to be similar to the backward Euler scheme. We demonstrate the efficiency of our method on the well-known two-dimensional Mandel benchmark and a three-dimensional footing problem.
Progressive Feedback-Enhanced Transformer for Image Forgery Localization
Authors: Haochen Zhu, Gang Cao, Xianglin Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Abstract
Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers for enhancing the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discriminative forensic features for improving the forgery localization accuracy and reliability. Furthermore, we present an effective strategy to automatically generate large-scale forged image samples close to real-world forensic scenarios, especially in realistic and coherent processing. Leveraging on such samples, a progressive and cost-effective two-stage training protocol is applied to the ProFact network. The extensive experimental results on nine public forensic datasets show that our proposed localizer greatly outperforms the state-of-the-art on the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.
Reasoning over Description Logic-based Contexts with Transformers
Abstract
One way that the current state of the art measures the reasoning ability of transformer-based models is by evaluating accuracy in downstream tasks like logical question answering or proof generation over synthetic contexts expressed in natural language. However, most of the contexts used are in practice very simple; in most cases, they are generated from short first-order logic sentences with only a few logical operators and quantifiers. In this work, we seek to answer the question how well a transformer-based model will perform reasoning over expressive contexts. For this purpose, we construct a synthetic natural language question-answering dataset, generated by description logic knowledge bases. For the generation of the knowledge bases, we use the expressive language $\mathcal{ALCQ}$. The resulting dataset contains 384K examples, and increases in two dimensions: i) reasoning depth, and ii) length of sentences. We show that the performance of our DeBERTa-based model, DELTA$_M$, is marginally affected when the reasoning depth is increased and it is not affected at all when the length of the sentences is increasing. We also evaluate the generalization ability of the model on reasoning depths unseen at training, both increasing and decreasing, revealing interesting insights into the model's adaptive generalization abilities.
A novel concept for Titan robotic exploration based on soft morphing aerial robots
Authors: Fernando Ruiz, Begona Arrue, Anibal Ollero
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
This work introduces a novel approach for Titan exploration based on soft morphing aerial robots leveraging the use of flexible adaptive materials. The controlled deformation of the multirotor arms, actuated by a combination of a pneumatic system and a tendon mechanism, provides the explorer robot with the ability to perform full-body perching and land on rocky, irregular, or uneven terrains, thus unlocking new exploration horizons. In addition, after landing, they can be used for efficient sampling as tendon-driven continuum manipulators, with the pneumatic system drawing in the samples. The proposed arms enable the drone to cover long distances in Titan's atmosphere efficiently, by directing rotor thrust without rotating the body, reducing the aerodynamic drag. Given that the exploration concept is envisioned as a rotorcraft planetary lander, the robot's folding features enable over a 30$\%$ reduction in the hypersonic aeroshell's diameter. Building on this folding capability, the arms can morph partially in flight to navigate tight spaces. As for propulsion, the rotor design, justified through CFD simulations, utilizes a ducted fan configuration tailored for Titan's high Reynolds numbers. The rotors are integrated within the robot's deformable materials, facilitating smooth interactions with the environment. The research spotlights exploration simulations in the Gazebo environment, focusing on the Sotra-Patera cryovolcano region, a location with potential to clarify Titan's unique methane cycle and its Earth-like features. This work addresses one of the primary challenges of the concept by testing the behavior of small-scale deformable arms under conditions mimicking those of Titan. Groundbreaking experiments with liquid nitrogen at cryogenic temperatures were conducted on various materials, with Teflon (PTFE) at low infill rates (15-30%) emerging as a promising option.
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge
Abstract
A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.
Keyword: quantization
Data Augmentations in Deep Weight Spaces
Authors: Aviv Shamsian, David W. Zhang, Aviv Navon, Yan Zhang, Miltiadis Kofinas, Idan Achituve, Riccardo Valperga, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, Ethan Fetaya, Gal Chechik, Haggai Maron
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations, to network pruning and quantization. Recent works designed architectures for effective learning in that space, which takes into account its unique, permutation-equivariant, structure. Unfortunately, so far these architectures suffer from severe overfitting and were shown to benefit from large datasets. This poses a significant challenge because generating data for this learning setup is laborious and time-consuming since each data sample is a full set of network weights that has to be trained. In this paper, we address this difficulty by investigating data augmentations for weight spaces, a set of techniques that enable generating new data examples on the fly without having to train additional input weight space elements. We first review several recently proposed data augmentation schemes %that were proposed recently and divide them into categories. We then introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate, which can be valuable for future studies.
Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation
Abstract
Recently, large language models (LLMs) have shown great potential in recommender systems, either improving existing recommendation models or serving as the backbone. However, there exists a large semantic gap between LLMs and recommender systems, since items to be recommended are often indexed by discrete identifiers (item ID) out of the LLM's vocabulary. In essence, LLMs capture language semantics while recommender systems imply collaborative semantics, making it difficult to sufficiently leverage the model capacity of LLMs for recommendation. To address this challenge, in this paper, we propose a new LLM-based recommendation model called LC-Rec, which can better integrate language and collaborative semantics for recommender systems. Our approach can directly generate items from the entire item set for recommendation, without relying on candidate items. Specifically, we make two major contributions in our approach. For item indexing, we design a learning-based vector quantization method with uniform semantic mapping, which can assign meaningful and non-conflicting IDs (called item indices) for items. For alignment tuning, we propose a series of specially designed tuning tasks to enhance the integration of collaborative semantics in LLMs. Our fine-tuning tasks enforce LLMs to deeply integrate language and collaborative semantics (characterized by the learned item indices), so as to achieve an effective adaptation to recommender systems. Extensive experiments demonstrate the effectiveness of our method, showing that our approach can outperform a number of competitive baselines including traditional recommenders and existing LLM-based recommenders. Our code is available at https://github.com/RUCAIBox/LC-Rec/.
Keyword: efficient
Exploring Large Language Models as a Source of Common-Sense Knowledge for Robots
Reduction of large-scale RLCk models via low-rank balanced truncation
Exploration of Hyperledger Besu in Designing Private Blockchain-based Financial Distribution Systems
Automated Identification of Sexual Orientation and Gender Identity Discriminatory Texts from Issue Comments
MADG: Margin-based Adversarial Learning for Domain Generalization
An algorithm for Tambara-Yamagami quantum invariants of 3-manifolds, parameterized by the first Betti number
Low-Frequency Load Identification using CNN-BiLSTM Attention Mechanism
Parameter-Efficient Multilingual Summarisation: An Empirical Study
A solver for linear scalar ordinary differential equations whose running time is bounded independent of frequency
DREAMPlaceFPGA-MP: An Open-Source GPU-Accelerated Macro Placer for Modern FPGAs with Cascade Shapes and Region Constraints
Asking More Informative Questions for Grounded Retrieval
Carbon Responder: Coordinating Demand Response for the Datacenter Fleet
PEMA: Plug-in External Memory Adaptation for Language Models
Derivation of sixth-order exponential Runge--Kutta methods for stiff systems
Computing the k-th Eigenvalue of Symmetric $H^2$-Matrices
Toucan: Token-Aware Character Level Language Modeling
A Statistical Verification Method of Random Permutations for Hiding Countermeasure Against Side-Channel Attacks
ColorTrace: Fungible token coloring and attribution
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Coreset Selection with Prioritized Multiple Objectives
Safer-Instruct: Aligning Language Models with Automated Preference Data
Debate Helps Supervise Unreliable Experts
Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
AdVENTR: Autonomous Robot Navigation in Complex Outdoor Environments
Accelerating Toeplitz Neural Network with Constant-time Inference Complexity
4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters
A unified framework for multiscale spectral generalized FEMs and low-rank approximations to multiscale PDEs
Context Adaptive Cooperation
Language Semantic Graph Guided Data-Efficient Learning
X-GRL: An Empirical Assessment of Explainable GNN-DRL in B5G/6G Networks
Comments on "Dynamic Consensus Committee-Based for Secure Data Sharing With Authorized Multi-Receiver Searchable Encryption"
Frequency Domain-based Dataset Distillation
Reinforcement Learning with Model Predictive Control for Highway Ramp Metering
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Verification of GossipSub in ACL2s
Color Fault-Tolerant Spanners
Enabling Large Language Models to Learn from Rules
HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence
Energy-Efficient Design of Satellite-Terrestrial Computing in 6G Wireless Networks
Combining Shamir & Additive Secret Sharing to Improve Efficiency of SMC Primitives Against Malicious Adversaries
An explicit substructuring method for overlapping domain decomposition based on stochastic calculus
A novel concept for Titan robotic exploration based on soft morphing aerial robots
Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature
Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images
Algorithmic Cheap Talk
The Chromatic Number of Kneser Hypergraphs via Consensus Division
New Horizons in Parameter Regularization: A Constraint Approach
Guided Scale Space Radon Transform for linear structures detection
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Explainable Text Classification Techniques in Legal Document Review: Locating Rationales without Using Human Annotated Training Text Snippets
SiRA: Sparse Mixture of Low Rank Adaptation
Domain Aligned CLIP for Few-shot Classification
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge
A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width
Assessing Translation capabilities of Large Language Models involving English and Indian Languages
Keyword: faster
A solver for linear scalar ordinary differential equations whose running time is bounded independent of frequency
An explicit substructuring method for overlapping domain decomposition based on stochastic calculus
Keyword: mobile
adF: A Novel System for Measuring Web Fingerprinting through Ads
X-GRL: An Empirical Assessment of Explainable GNN-DRL in B5G/6G Networks
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
Motion Control of Two Mobile Robots under Allowable Collisions
PDU-set Scheduling Algorithm for XR Traffic in Multi-Service 5G-Advanced Networks
Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices
Keyword: pruning
CoRE-CoG: Conversational Recommendation of Entities using Constrained Generation
SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer
Data Augmentations in Deep Weight Spaces
Do Localization Methods Actually Localize Memorized Data in LLMs?
Keyword: diffusion
Finding AI-Generated Faces in the Wild
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Towards Graph-Aware Diffusion Modeling for Collaborative Filtering
A unified framework for multiscale spectral generalized FEMs and low-rank approximations to multiscale PDEs
One-Shot Federated Learning with Classifier-Guided Diffusion Models
A Spectral Diffusion Prior for Hyperspectral Image Super-Resolution
Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search
Fast Detection of Phase Transitions with Multi-Task Learning-by-Confusion
Finding polarised communities and tracking information diffusion on Twitter: The Irish Abortion Referendum
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Single-Image 3D Human Digitization with Shape-Guided Diffusion
Keyword: adaptive
Efficient Continual Pre-training for Building Domain Specific Large Language Models
Adversarial Imitation Learning On Aggregated Data
MOSAIC: A Multi-Objective Optimization Framework for Sustainable Datacenter Management
4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters
HFORD: High-Fidelity and Occlusion-Robust De-identification for Face Privacy Protection
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding
Adaptive space-time model order reduction with dual-weighted residual (MORe DWR) error control for poroelasticity
Progressive Feedback-Enhanced Transformer for Image Forgery Localization
Reasoning over Description Logic-based Contexts with Transformers
A novel concept for Titan robotic exploration based on soft morphing aerial robots
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge
Keyword: quantization
Data Augmentations in Deep Weight Spaces
Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation