Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model
Authors: Miguel AP Oliveira, Stephane Manara, Bruno Molé, Thomas Muller, Aurélien Guillouche, Lysann Hesske, Bruce Jordan, Gilles Hubert, Chinmay Kulkarni, Pralipta Jagdev, Cedric R. Berger
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract
Individuals and organizations cope with an ever-growing amount of data, heterogeneous in content and format. Getting value out of this data and minimising the risks inherent in its multiple usages require adequate data management processes that yield data quality and control over its lifecycle. Common data governance frameworks relying on people and policies fall short of the overwhelming data complexity. Yet, harnessing this complexity is necessary to achieve high quality standards. The latter will condition the outcome of any downstream data usage, including generative artificial intelligence trained on this data. In this paper, we report our concrete experience establishing a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance (i.e. Data Governance 4.0). We explain how we implement and use this framework to integrate 25 years of clinical study data at enterprise scale, in a fully productive environment. The framework encompasses both methodologies and technologies leveraging semantic web principles. We built a knowledge graph describing avatars of data assets in their business context, including governance principles. Multiple ontologies articulated by an enterprise upper ontology enable key governance actions such as FAIRification, lifecycle management, definition of roles and responsibilities, lineage across transformations and provenance from source systems. This metadata model is a prerequisite to automate data governance, make it fit for purpose for each use case and dynamically adapt it to business changes.
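As a hedged illustration of the kind of metadata model described here, the sketch below records a data asset avatar's governance context (steward, provenance, lifecycle state) as RDF triples and queries it with SPARQL. The namespace, class, and property names are our own illustrative assumptions, not the paper's enterprise ontology.

```python
# Hypothetical sketch: a "data asset avatar" and its governance context as
# RDF triples with rdflib. All namespaces, classes, and properties below are
# illustrative assumptions, not the paper's ontology.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/governance#")

g = Graph()
g.bind("ex", EX)

asset = EX["clinical_study_001"]
g.add((asset, RDF.type, EX.DataAsset))
g.add((asset, EX.hasSteward, EX["jane_doe"]))          # roles & responsibilities
g.add((asset, EX.derivedFrom, EX["source_system_A"]))  # provenance
g.add((asset, EX.lifecycleState, Literal("curated")))  # lifecycle management

# A SPARQL query can then drive (semi-)automated governance actions,
# e.g. listing every asset that is missing an assigned steward.
results = g.query("""
    SELECT ?asset WHERE {
        ?asset a ex:DataAsset .
        FILTER NOT EXISTS { ?asset ex:hasSteward ?who }
    }""", initNs={"ex": EX})
print(list(results))  # empty here: the only asset has a steward
```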
MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language
Authors: Conghao Tom Shen, Violet Yao, Yixin Liu
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Abstract
Manga, a widely celebrated Japanese comic art form, is renowned for its diverse narratives and distinct artistic styles. However, the inherently visual and intricate structure of Manga, which comprises images housing multiple panels, poses significant challenges for content retrieval. To address this, we present MaRU (Manga Retrieval and Understanding), a multi-staged system that connects vision and language to facilitate efficient search of both dialogues and scenes within Manga frames. The architecture of MaRU integrates an object detection model for identifying text and frame bounding boxes, a Vision Encoder-Decoder model for text recognition, a text encoder for embedding text, and a vision-text encoder that merges textual and visual information into a unified embedding space for scene retrieval. Rigorous evaluations reveal that MaRU excels in end-to-end dialogue retrieval and exhibits promising results for scene retrieval.
LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking
Authors: Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, Even Oldridge
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Recently, large language models (LLMs) have exhibited significant progress in language understanding and generation. By leveraging textual features, customized LLMs have also been applied to recommendation and demonstrate improvements across diverse recommendation scenarios. Yet the majority of existing methods perform training-free recommendation that heavily relies on pretrained knowledge (e.g., movie recommendation). In addition, inference on LLMs is slow due to autoregressive generation, rendering existing methods less effective for real-time recommendation. As such, we propose a two-stage framework using large language models for ranking-based recommendation (LlamaRec). In particular, we use small-scale sequential recommenders to retrieve candidates based on the user interaction history. Then, both the history and the retrieved items are fed to the LLM as text via a carefully designed prompt template. Instead of generating next-item titles, we adopt a verbalizer-based approach that transforms output logits into probability distributions over the candidate items. The proposed LlamaRec can therefore rank items efficiently without generating long text. To validate the effectiveness of the proposed framework, we compare against state-of-the-art baseline methods on benchmark datasets. Our experimental results demonstrate that LlamaRec consistently achieves superior performance in both recommendation quality and efficiency.
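To make the verbalizer idea concrete, here is a minimal sketch that ranks candidates by reading next-token logits instead of generating text. The prompt format, candidate labels, and stand-in model are our assumptions, not LlamaRec's released implementation.

```python
# Minimal sketch of a verbalizer-style ranking head, assuming candidates are
# labeled with single index tokens ("A", "B", ...) in the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("User history: item1, item2.\n"
          "Candidates: (A) item7 (B) item9 (C) item4\nAnswer:")
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits[0, -1]                  # next-token logits

# Map each candidate to the logit of its verbalizer token, then softmax:
# a single forward pass ranks all candidates, no autoregressive generation.
cand_tokens = [tok.encode(" " + c)[0] for c in ["A", "B", "C"]]
probs = torch.softmax(logits[cand_tokens], dim=-1)
print(probs)  # probability distribution over the three candidates
```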
Efficient Symbolic Policy Learning with Differentiable Symbolic Expression
Abstract
Deep reinforcement learning (DRL) has led to a wide range of advances in sequential decision-making tasks. However, the complexity of neural network policies makes them difficult to understand and to deploy under limited computational resources. Currently, employing compact symbolic expressions as symbolic policies is a promising strategy for obtaining simple and interpretable policies. Previous symbolic policy methods usually involve complex training processes and pre-trained neural network policies, which are inefficient and limit the application of symbolic policies. In this paper, we propose an efficient gradient-based learning method named Efficient Symbolic Policy Learning (ESPL) that learns the symbolic policy from scratch in an end-to-end way. We introduce a symbolic network as the search space and employ a path selector to find the compact symbolic policy. By doing so we represent the policy with a differentiable symbolic expression and train it in an off-policy manner, which further improves efficiency. In addition, in contrast with previous symbolic policies which only work in single-task RL because of complexity, we extend ESPL to meta-RL to generate symbolic policies for unseen tasks. Experimentally, we show that our approach generates symbolic policies with higher performance and greatly improves data efficiency for single-task RL. In meta-RL, we demonstrate that compared with neural network policies the proposed symbolic policy achieves higher performance and efficiency and shows the potential to be interpretable.
Feature Attribution Explanations for Spiking Neural Networks
Abstract
Third-generation artificial neural networks, Spiking Neural Networks (SNNs), can be efficiently implemented on hardware. Their implementation on neuromorphic chips opens a broad range of applications, such as machine learning-based autonomous control and intelligent biomedical devices. In critical applications, however, insight into the reasoning of SNNs is important, thus SNNs need to be equipped with the ability to explain how decisions are reached. We present \textit{Temporal Spike Attribution} (TSA), a local explanation method for SNNs. To compute the explanation, we aggregate all information available in model-internal variables: spike times and model weights. We evaluate TSA on artificial and real-world time series data and measure explanation quality w.r.t. multiple quantitative criteria. We find that TSA correctly identifies a small subset of input features relevant to the decision (i.e., is output-complete and compact) and generates similar explanations for similar inputs (i.e., is continuous). Further, our experiments show that incorporating the notion of \emph{absent} spikes improves explanation quality. Our work can serve as a starting point for explainable SNNs, with future implementations on hardware yielding not only predictions but also explanations in a broad range of application scenarios. Source code is available at https://github.com/ElisaNguyen/tsa-explanations.
PILL: Plug Into LLM with Adapter Expert and Attention Gate
Abstract
Due to the remarkable capabilities of powerful Large Language Models (LLMs) in effectively following instructions, a growing number of assistants have emerged in the community to aid humans. Recently, significant progress has been made in the development of Vision Language Models (VLMs), expanding the capabilities of LLMs and enabling them to execute more diverse instructions. However, it is foreseeable that models will likely need to handle tasks involving additional modalities such as speech, video, and others. This poses a particularly prominent challenge of dealing with the complexity of mixed modalities. To address this, we introduce a novel architecture called PILL: Plug Into LLM with adapter expert and attention gate, to better decouple these complex modalities and leverage efficient fine-tuning. We introduce two modules: first, a Mixture-of-Modality-Adapter-Expert that independently handles different modalities, enabling better adaptation to downstream tasks while preserving the expressive capability of the original model; second, Modality-Attention-Gating, which enables adaptive control of the contribution of modality tokens to the overall representation. In addition, we have made improvements to the Adapter to enhance its learning and expressive capabilities. Experimental results demonstrate that our approach exhibits competitive performance compared to other mainstream methods for modality fusion. For researchers interested in our work, we provide free access to the code and models at https://github.com/DsaltYfish/PILL.
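A hedged sketch of what a modality attention gate can look like follows: a zero-initialized learned gate scales cross-attention from text tokens to modality tokens, so the pretrained LLM starts unperturbed. Module names and hyperparameters are our assumptions, not PILL's exact design.

```python
# Sketch of a modality attention gate: a learned scalar (initialized near
# zero) controls how much modality tokens contribute to the hidden state.
import torch
import torch.nn as nn

class ModalityAttentionGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # starts closed: LLM unchanged

    def forward(self, text_h, modality_h):
        # Text tokens attend to modality tokens; the gate controls the mix.
        cross, _ = self.attn(text_h, modality_h, modality_h)
        return text_h + torch.tanh(self.gate) * cross

gate = ModalityAttentionGate(dim=512)
out = gate(torch.randn(2, 16, 512), torch.randn(2, 49, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```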
Resource savings from fault-tolerant circuit design
Authors: Andrew K. Tan, Isaac L. Chuang
Subjects: Computational Engineering, Finance, and Science (cs.CE); Information Theory (cs.IT)
Abstract
Using fault-tolerant constructions, computations performed with unreliable components can simulate their noiseless counterparts through the introduction of a modest amount of redundancy. Given this modest overhead, and the fact that increasing the reliability of basic components often comes at a cost, are there situations where fault-tolerance may be more economical? We present a general framework to account for this overhead cost in order to effectively compare fault-tolerant to non-fault-tolerant approaches for computation, in the limit of small logical error rates. Using this detailed accounting, we determine explicit boundaries at which fault-tolerant designs become more efficient than designs that achieve comparable reliability through direct consumption of resources. We find that the fault-tolerant construction is always preferred in the limit of high reliability when the resources required to construct a basic unit grow faster than $\log(1 / \epsilon)$ asymptotically for small $\epsilon$.
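The criterion in the last sentence can be stated compactly; writing $R(\epsilon)$ for the resource cost of building a basic unit with error rate $\epsilon$ (our notation, not necessarily the paper's), the condition reads:

```latex
% R(\epsilon): resource cost of a basic unit with error rate \epsilon
% (our notation, not necessarily the paper's).
\[
  R(\epsilon) \;=\; \omega\!\left(\log \tfrac{1}{\epsilon}\right)
  \ \text{as } \epsilon \to 0
  \quad \Longrightarrow \quad
  \text{the fault-tolerant construction is preferred.}
\]
```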
Sparse Training of Discrete Diffusion Models for Graph Generation
Abstract
Generative models for graphs often encounter scalability challenges due to the inherent need to predict interactions for every node pair. Despite the sparsity often exhibited by real-world graphs, the unpredictable sparsity patterns of their adjacency matrices, stemming from their unordered nature, lead to quadratic computational complexity. In this work, we introduce SparseDiff, a denoising diffusion model for graph generation that is able to exploit sparsity during its training phase. At the core of SparseDiff is a message-passing neural network tailored to predict only a subset of edges during each forward pass. When combined with a sparsity-preserving noise model, this model can efficiently work with edge-list representations of graphs, paving the way for scalability to much larger structures. During the sampling phase, SparseDiff iteratively populates the adjacency matrix from its prior state, ensuring prediction of the full graph while controlling memory utilization. Experimental results show that SparseDiff simultaneously matches the state of the art in generation performance on both small and large graphs, highlighting the versatility of our method.
FairSeg: A Large-scale Medical Image Segmentation Dataset for Fairness Learning with Fair Error-Bound Scaling
Authors: Yu Tian, Min Shi, Yan Luo, Ava Kouhana, Tobias Elze, Mengyu Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Fairness in artificial intelligence models has gained significantly more attention in recent years, especially in the area of medicine, as fairness in medical models is critical to people's well-being and lives. High-quality medical fairness datasets are needed to promote fairness learning research. Existing medical fairness datasets are all for classification tasks, and no fairness datasets are available for medical segmentation, even though medical segmentation is a clinical task just as important as classification, one that can provide detailed spatial information on organ abnormalities ready to be assessed by clinicians. In this paper, we propose the first fairness dataset for medical segmentation, named FairSeg, with 10,000 subject samples. In addition, we propose a fair error-bound scaling approach that reweights the loss function with the upper error-bound in each identity group. We anticipate that segmentation performance equity can be improved by explicitly tackling the hard cases with high training errors in each identity group. To facilitate fair comparisons, we propose new equity-scaled segmentation performance metrics, such as the equity-scaled Dice coefficient, which is calculated as the overall Dice coefficient divided by one plus the standard deviation of group Dice coefficients. Through comprehensive experiments, we demonstrate that our fair error-bound scaling approach has superior or comparable fairness performance to the state-of-the-art fairness learning models. The dataset and code are publicly accessible via \url{https://github.com/Harvard-Ophthalmology-AI-Lab/FairSeg}.
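Since the equity-scaled metric is defined explicitly in the abstract, it can be computed directly; a small sketch follows (function and variable names are ours, not FairSeg's code).

```python
# Equity-scaled Dice as described in the abstract: the overall Dice
# coefficient divided by one plus the standard deviation of per-group Dice.
import numpy as np

def equity_scaled_dice(overall_dice, group_dices):
    return overall_dice / (1.0 + np.std(group_dices))

# Example: same overall Dice, but the second cohort is less equitable
# across identity groups, so its equity-scaled score is lower.
print(equity_scaled_dice(0.85, [0.84, 0.86, 0.85]))  # ~0.843
print(equity_scaled_dice(0.85, [0.95, 0.75, 0.85]))  # ~0.786
```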
Imitation Bootstrapped Reinforcement Learning
Abstract
Despite the considerable potential of reinforcement learning (RL), robotics control tasks predominantly rely on imitation learning (IL) owing to its better sample efficiency. However, given the high cost of collecting extensive demonstrations, RL is still appealing if it can utilize limited imitation data for efficient autonomous self-improvement. Existing RL methods that utilize demonstrations either initialize the replay buffer with demonstrations and oversample them during RL training, which does not benefit from the generalization potential of modern IL methods, or pretrain the RL policy with IL on the demonstrations, which requires additional mechanisms to prevent catastrophic forgetting during RL fine-tuning. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework that first trains an IL policy on a limited number of demonstrations and then uses it to propose alternative actions for both online exploration and target value bootstrapping. IBRL achieves SoTA performance and sample efficiency on 7 challenging sparse reward continuous control tasks in simulation while learning directly from pixels. As a highlight of our method, IBRL achieves $6.4\times$ higher success rate than RLPD, a strong method that combines the idea of oversampling demonstrations with modern RL improvements, under the budget of 10 demos and 100K interactions in the challenging PickPlaceCan task in the Robomimic benchmark.
Joint Composite Latent Space Bayesian Optimization
Authors: Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, Eytan Bakshy
Abstract
Bayesian Optimization (BO) is a technique for sample-efficient black-box optimization that employs probabilistic models to identify promising input locations for evaluation. When dealing with composite-structured functions, such as $f = g \circ h$, evaluating a specific location $x$ yields observations of both the final outcome $f(x) = g(h(x))$ as well as the intermediate output(s) $h(x)$. Previous research has shown that integrating information from these intermediate outputs can enhance BO performance substantially. However, existing methods struggle if the outputs $h(x)$ are high-dimensional. Many relevant problems fall into this setting, including in the context of generative AI, molecular design, or robotics. To effectively tackle these challenges, we introduce Joint Composite Latent Space Bayesian Optimization (JoCo), a novel framework that jointly trains neural network encoders and probabilistic models to adaptively compress high-dimensional input and output spaces into manageable latent representations. This enables viable BO on these compressed representations, allowing JoCo to outperform other state-of-the-art methods in high-dimensional BO on a wide variety of simulated and real-world problems.
Linear difference operators with sequence coefficients having infinite-dimensional solution spaces
Abstract
The notion of a lacunary infinite numerical sequence is introduced. It is shown that for an arbitrary linear difference operator $L$ with coefficients belonging to the set $R$ of infinite numerical sequences, a criterion (i.e., a necessary and sufficient condition) for the infinite dimensionality of its space $V_L$ of solutions belonging to $R$ is the presence of a lacunary sequence in $V_L$.
On the dimension of the solution space of linear difference equations over the ring of infinite sequences
Abstract
For a linear difference equation with the coefficients being computable sequences, we establish algorithmic undecidability of the problem of determining the dimension of the solution space including the case when some additional prior information on the dimension is available.
Structured Neural Networks for Density Estimation and Causal Inference
Abstract
Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation.
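A minimal sketch of the masking mechanism follows, in the spirit of MADE-style masked layers: a binary mask zeroes weight pathways so the prescribed independencies hold. The hand-written mask below stands in for the masks StrNN derives via binary matrix factorization, which we do not reproduce.

```python
# Structure injection via masked weights: a 0 in the mask blocks the
# corresponding input-to-output pathway.
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    def __init__(self, in_f, out_f, mask):
        super().__init__(in_f, out_f)
        self.register_buffer("mask", mask)  # 1 = pathway allowed, 0 = blocked

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

# Example: output 0 may depend only on input 0; output 1 on inputs 0 and 1.
mask = torch.tensor([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0]])
layer = MaskedLinear(3, 2, mask)
print(layer(torch.randn(4, 3)).shape)  # torch.Size([4, 2])
```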
State-wise Safe Reinforcement Learning With Pixel Observations
Abstract
Reinforcement Learning (RL) in the context of safe exploration has long grappled with the delicate balance between maximizing rewards and minimizing safety violations, the complexities arising from contact-rich or non-smooth environments, and high-dimensional pixel observations. Furthermore, incorporating state-wise safety constraints in the exploration and learning process, where the agent is prohibited from accessing unsafe regions without prior knowledge, adds an additional layer of complexity. In this paper, we propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions through the introduction of a latent barrier function learning mechanism. As a joint learning framework, our approach first constructs a latent dynamics model with low-dimensional latent spaces derived from pixel observations. Subsequently, we build and learn a latent barrier function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return. Experimental evaluations on the safety-gym benchmark suite demonstrate that our proposed method significantly reduces safety violations throughout the training process and achieves faster safety convergence than existing methods while attaining competitive reward returns.
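As a rough illustration of learning a barrier function in latent space, the hedged sketch below penalizes a barrier network for misclassifying observed safe/unsafe latent states and for letting successors of safe states leave the safe set; the margins and penalty form are our assumptions, not the paper's exact objective.

```python
# Toy barrier-function learning objective over latent states (illustrative).
import torch
import torch.nn.functional as F

def barrier_loss(B, dynamics, z_safe, z_unsafe, margin=0.1):
    safe = F.relu(margin - B(z_safe)).mean()            # B > 0 on safe states
    unsafe = F.relu(margin + B(z_unsafe)).mean()        # B < 0 on unsafe states
    fwd = F.relu(margin - B(dynamics(z_safe))).mean()   # successors stay safe
    return safe + unsafe + fwd

B = torch.nn.Linear(8, 1)    # toy barrier over 8-dim latent states
dyn = torch.nn.Linear(8, 8)  # toy latent dynamics model
print(barrier_loss(B, dyn, torch.randn(32, 8), torch.randn(32, 8)))
```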
Using DUCK-Net for Polyp Image Segmentation
Authors: Razvan-Gabriel Dumitru, Darius Peteleaza, Catalin Craciun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
This paper presents a novel supervised convolutional neural network architecture, "DUCK-Net", capable of effectively learning and generalizing from small amounts of medical images to perform accurate segmentation tasks. Our model utilizes an encoder-decoder structure with a residual downsampling mechanism and a custom convolutional block to capture and process image information at multiple resolutions in the encoder segment. We employ data augmentation techniques to enrich the training set, thus increasing our model's performance. While our architecture is versatile and applicable to various segmentation tasks, in this study, we demonstrate its capabilities specifically for polyp segmentation in colonoscopy images. We evaluate the performance of our method on several popular benchmark datasets for polyp segmentation (Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, and ETIS-LARIBPOLYPDB), showing that it achieves state-of-the-art results in terms of mean Dice coefficient, Jaccard index, Precision, Recall, and Accuracy. Our approach demonstrates strong generalization capabilities, achieving excellent performance even with limited training data. The code is publicly available on GitHub: https://github.com/RazvanDu/DUCK-Net
Democratic Policy Development using Collective Dialogues and AI
Authors: Andrew Konya, Lisa Schirch, Colin Irwin, Aviv Ovadya
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
We design and test an efficient democratic process for developing policies that reflect informed public will. The process combines AI-enabled collective dialogues that make deliberation democratically viable at scale with bridging-based ranking for automated consensus discovery. A GPT4-powered pipeline translates points of consensus into representative policy clauses from which an initial policy is assembled. The initial policy is iteratively refined with the input of experts and the public before a final vote and evaluation. We test the process three times with the US public, developing policy guidelines for AI assistants related to medical advice, vaccine information, and wars & conflicts. We show the process can be run in two weeks with 1500+ participants for around $10,000, and that it generates policy guidelines with strong public support across demographic divides. We measure 75-81% support for the policy guidelines overall, and no less than 70-75% support across demographic splits spanning age, gender, religion, race, education, and political party. Overall, this work demonstrates an end-to-end proof of concept for a process we believe can help AI labs develop common-ground policies, governing bodies break political gridlock, and diplomats accelerate peace deals.
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Abstract
We present a data and cost efficient way of incorporating the speech modality into a large language model (LLM). The resulting multi-modal LLM is a COntextual Speech Model with Instruction-following/in-context-learning Capabilities - COSMIC. Speech comprehension test question-answer (SQA) pairs are generated using GPT-3.5 based on the speech transcriptions as a part of the supervision for the instruction tuning. With fewer than 20M trainable parameters and as little as 450 hours of English speech data for SQA generation, COSMIC exhibits emergent instruction-following and in-context learning capabilities in speech-to-text tasks. The model is able to follow the given text instructions to generate text responses even on the unseen EN$\to$X speech-to-text translation (S2TT) task in a zero-shot setting. We evaluate the model's in-context learning via various tasks such as EN$\to$X S2TT and few-shot domain adaptation, and its instruction-following capabilities through a contextual biasing benchmark. Our results demonstrate the efficacy of the proposed low-cost recipe for building a speech LLM with the new instruction-tuning data.
Comparative Knowledge Distillation
Authors: Alex Wilf, Alex Tianyi Xu, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency
Abstract
In the era of large-scale pretrained models, Knowledge Distillation (KD) plays an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance. Traditional KD paradigms, however, assume readily available access to teacher models for frequent inference -- a notion increasingly at odds with the realities of costly, often proprietary, large-scale models. Addressing this gap, our paper considers how to minimize the dependency on teacher model inferences in KD in a setting we term Few Teacher Inference Knowledge Distillation (FTI KD). We observe that prevalent KD techniques and state-of-the-art data augmentation strategies fall short in this constrained setting. Drawing inspiration from educational principles that emphasize learning through comparison, we propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples. Critically, CKD provides additional learning signals to the student without making additional teacher calls. We also extend the principle of CKD to groups of samples, enabling even more efficient learning from limited teacher calls. Empirical evaluation across varied experimental settings indicates that CKD consistently outperforms state-of-the-art data augmentation and KD techniques.
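A hedged sketch of the comparison idea: the student matches pairwise differences between cached teacher outputs within a batch, which yields extra learning signal without new teacher calls. The MSE form is our simplification, not necessarily CKD's exact loss.

```python
# Comparison-based distillation signal from cached teacher outputs.
import torch
import torch.nn.functional as F

def comparative_loss(student_logits, teacher_logits):
    # Pairwise differences within the batch reuse cached teacher outputs,
    # so no extra teacher calls are needed.
    s_diff = student_logits.unsqueeze(0) - student_logits.unsqueeze(1)
    t_diff = teacher_logits.unsqueeze(0) - teacher_logits.unsqueeze(1)
    return F.mse_loss(s_diff, t_diff)

s = torch.randn(8, 10, requires_grad=True)  # student logits, batch of 8
t = torch.randn(8, 10)                      # cached teacher logits
print(comparative_loss(s, t))
```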
Not all layers are equally as important: Every Layer Counts BERT
Authors: Lucas Georges Gabriel Charpentier, David Samuel
Abstract
This paper introduces a novel modification of the transformer architecture, tailored for the data-efficient pretraining of language models. This aspect is evaluated by participating in the BabyLM challenge, where our solution won both the \textsc{strict} and \textsc{strict-small} tracks. Our approach allows each transformer layer to select which outputs of previous layers to process. The empirical results verify the potential of this simple modification and show that not all layers are equally as important.
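A minimal sketch of the layer-selection mechanism follows, assuming a softmax-weighted combination of previous layers' outputs; the paper's exact parameterization may differ.

```python
# Each layer consumes a learned convex combination of all prior outputs.
import torch
import torch.nn as nn

class LayerSelector(nn.Module):
    def __init__(self, n_prev: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_prev))  # one weight per prior layer

    def forward(self, prev_outputs):                # list of [B, T, D] tensors
        weights = torch.softmax(self.w, dim=0)
        stacked = torch.stack(prev_outputs, dim=0)  # [n_prev, B, T, D]
        return (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)

sel = LayerSelector(n_prev=3)
outs = [torch.randn(2, 5, 64) for _ in range(3)]
print(sel(outs).shape)  # torch.Size([2, 5, 64])
```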
Contrastive Multi-Modal Representation Learning for Spark Plug Fault Diagnosis
Abstract
Because a single sensory measurement often cannot provide enough information for condition monitoring of complex engineered industrial mechanisms, and to overcome the misleading noise of a single sensor, multiple sensors are installed to improve the condition monitoring of industrial equipment. An efficient data fusion strategy is therefore demanded. In this research, we present a Denoising Multi-Modal Autoencoder with a unique training strategy based on the contrastive learning paradigm, both being utilized for the first time in the machine health monitoring realm. The presented approach, which leverages the merits of both supervised and unsupervised learning, not only achieves excellent performance in fusing multiple modalities (or views) of data into an enriched common representation but also takes data fusion to the next level: one of the views can be omitted during inference time with very slight performance reduction, or even without any reduction at all. The presented methodology enables multi-modal fault diagnosis systems to perform more robustly in case of sensor failure, and one can also intentionally omit one of the sensors (the more expensive one) in order to build a more cost-effective condition monitoring system without sacrificing performance for practical purposes. The effectiveness of the presented methodology is examined on a real-world private multi-modal dataset gathered under non-laboratory conditions from a complex engineered mechanism, an inline four-stroke spark-ignition engine, aiming for spark plug fault diagnosis. This dataset contains the accelerometer and acoustic signals as two modalities and includes only a very small fraction of faulty samples; achieving good performance on such a dataset promises that the presented method can perform well on other equipment as well.
OverHear: Headphone based Multi-sensor Keystroke Inference
Abstract
Headphones, traditionally limited to audio playback, have evolved to integrate sensors like high-definition microphones and accelerometers. While these advancements enhance user experience, they also introduce potential eavesdropping vulnerabilities, with keystroke inference being our concern in this work. To validate this threat, we developed OverHear, a keystroke inference framework that leverages both acoustic and accelerometer data from headphones. The accelerometer data, while not sufficiently detailed for individual keystroke identification, aids in clustering key presses by hand position. Concurrently, the acoustic data undergoes analysis to extract Mel Frequency Cepstral Coefficients (MFCC), aiding in distinguishing between different keystrokes. These features feed into machine learning models for keystroke prediction, with results further refined via dictionary-based word prediction methods. In our experimental setup, we tested various keyboard types under different environmental conditions. We were able to achieve top-5 key prediction accuracy of around 80% for mechanical keyboards and around 60% for membrane keyboards with top-100 word prediction accuracies over 70% for all keyboard types. The results highlight the effectiveness and limitations of our approach in the context of real-world scenarios.
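An illustrative version of the acoustic branch follows, using real librosa/scikit-learn APIs but synthetic stand-in data; the window length and classifier choice are our assumptions, not OverHear's exact configuration.

```python
# MFCC features from a keystroke audio snippet feed a standard classifier.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def keystroke_features(samples, sr=44100):
    mfcc = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)  # average over time -> 13-dim feature vector

# Toy training loop on synthetic snippets (stand-ins for recorded keystrokes).
rng = np.random.default_rng(0)
X = np.stack([keystroke_features(rng.standard_normal(4410)) for _ in range(40)])
y = rng.integers(0, 5, size=40)  # 5 hypothetical key classes

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```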
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
Authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
Abstract
Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTCoder fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive pass@1 score of 74.4\% on the HumanEval benchmark, surpassing GPT-4 performance (67\%, zero-shot). MFTCoder is open-sourced at \url{https://github.com/codefuse-ai/MFTCOder}
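One common way to counter task-size imbalance, which we assume here for illustration (not MFTCoder's exact formulation), is to average token losses within each task before combining across tasks.

```python
# Balanced multi-task loss: per-task averaging keeps large tasks from
# dominating the gradient (illustrative convention, not MFTCoder's code).
import torch

def balanced_multitask_loss(task_token_losses):
    # task_token_losses: dict task -> (sum of token losses, token count)
    per_task = [total / max(count, 1)
                for total, count in task_token_losses.values()]
    return torch.stack(per_task).mean()

losses = {
    "code_completion": (torch.tensor(512.0), 1024),  # large task
    "unit_test_gen":   (torch.tensor(40.0), 64),     # small task
}
print(balanced_multitask_loss(losses))  # both tasks weighted equally
```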
Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion
Authors: Donghoon Youm, Hyunyoung Jung, Hyeongjun Kim, Jemin Hwangbo, Hae-Won Park, Sehoon Ha
Abstract
Control of legged robots is a challenging problem that has been investigated by different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework to combine the strengths of both approaches. Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Programming and the Raibert heuristic, which serves as an expert policy. Then we train a clone of the MPC using imitation learning to make the controller learnable. Finally, we leverage deep reinforcement learning with limited exploration for further finetuning the policy on more challenging terrains. By conducting comprehensive simulation and hardware experiments, we demonstrate that the proposed IFM framework can significantly improve the performance of the given MPC controller on rough, slippery, and conveyor terrains that require careful coordination of footsteps. We also showcase that IFM can efficiently produce more symmetric, periodic, and energy-efficient gaits compared to Vanilla RL with a minimal burden of reward shaping.
Thermal Face Image Classification using Deep Learning Techniques
Authors: Prosenjit Chatterjee, ANK Zaman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
Thermal images have various applications in security, medical and industrial domains. This paper proposes a practical deep-learning approach for thermal image classification. Accurate and efficient classification of thermal images poses a significant challenge across various fields due to the complex image content and the scarcity of annotated datasets. This work uses convolutional neural network (CNN) architectures, specifically ResNet-50 and VGGNet-19, to extract features from thermal images. It also applies a Kalman filter to the thermal input images for denoising. The experimental results demonstrate the effectiveness of the proposed approach in terms of accuracy and efficiency.
An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations
Abstract
In numerous contexts, high-resolution solutions to partial differential equations are required to faithfully capture essential dynamics that occur at small spatiotemporal scales, yet these solutions can be very difficult and slow to obtain using traditional methods due to limited computational resources. A recent direction to circumvent these computational limitations is to use machine learning techniques for super-resolution, to reconstruct high-resolution numerical solutions from low-resolution simulations which can be obtained more efficiently. The proposed approach, the Super Resolution Operator Network (SROpNet), frames super-resolution as an operator learning problem and draws inspiration from existing architectures to learn continuous representations of solutions to parametric differential equations from low-resolution approximations, which can then be evaluated at any desired location. In addition, no restrictions are imposed on the locations of (the fixed number of) spatiotemporal sensors at which the low-resolution approximations are provided, thereby enabling the consideration of a broader spectrum of problems arising in practice, for which many existing super-resolution approaches are not well-suited.
NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation
Authors: Shaofei Li, Feng Dong, Xusheng Xiao, Haoyu Wang, Fei Shao, Jiedong Chen, Yao Guo, Xiangqun Chen, Ding Li
Abstract
Advanced Persistent Threat (APT) attacks have plagued modern enterprises, causing significant financial losses. To counter these attacks, researchers propose techniques that capture the complex and stealthy scenarios of APT attacks by using provenance graphs to model system entities and their dependencies. In particular, to accelerate attack detection and reduce financial losses, online provenance-based detection systems that detect and investigate APT attacks under the constraints of timeliness and limited resources are sorely needed. Unfortunately, existing online systems usually sacrifice detection granularity to reduce computational complexity and produce provenance graphs with more than 100,000 nodes, posing challenges for security admins to interpret the detection results. In this paper, we design and implement NodLink, the first online detection system that maintains high detection accuracy without sacrificing detection granularity. Our insight is that the APT attack detection process in online provenance-based detection systems can be modeled as a Steiner Tree Problem (STP), which has efficient online approximation algorithms that recover concise attack-related provenance graphs with a theoretically bounded error. To utilize STP approximation algorithm frameworks for APT attack detection, we propose a novel in-memory cache design, an efficient attack screening method, and a new STP approximation algorithm that is more efficient than the conventional one in APT attack detection while maintaining the same complexity. We evaluate NodLink in a production environment. The open-world experiment shows that NodLink outperforms two state-of-the-art (SOTA) online provenance analysis systems by achieving magnitudes higher detection and investigation accuracy while having the same or higher throughput.
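The STP reduction can be illustrated with a standard approximation on a toy provenance graph; networkx's steiner_tree below is a stand-in for NodLink's more efficient online algorithm, and the node names are hypothetical.

```python
# Suspicious nodes become Steiner terminals in a provenance graph; a standard
# STP approximation recovers a concise connecting subgraph.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()  # toy provenance graph: processes/files as nodes
G.add_weighted_edges_from([
    ("bash", "curl", 1), ("curl", "payload.bin", 1),
    ("bash", "cron", 2), ("cron", "payload.bin", 2),
    ("bash", "ls", 1),  # benign activity, should be pruned away
])

terminals = ["curl", "payload.bin", "cron"]  # alerts flagged by screening
attack_subgraph = steiner_tree(G, terminals, weight="weight")
print(sorted(attack_subgraph.edges()))  # concise attack-related subgraph
```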
Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision
Abstract
This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pre-train the foundation model using reference genome sequences and apply it in the following downstream tasks: (1) identification of enhancers, promoters and splice sites, (2) identification of biological function annotations of genomic sequences, (3) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, and (4) generating mutations of the Influenza virus using the encoder-decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement as compared to the existing state-of-the-art results.
STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots
Authors: Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox
Abstract
Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objects not learned in their training sets, which requires the ability to segment and track previously unseen items. Considering that continuous observation is often inaccessible in such settings, our task involves working with a discrete set of frames separated by indefinite periods during which substantial changes to the scene may occur. This task also translates to domestic robotic applications, such as rearrangement of objects on a table. To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. We also propose a novel paradigm for joint segmentation and tracking in discrete frames along with a transformer module that facilitates efficient inter-frame communication. The experiments we conduct show that our approach significantly outperforms recent methods. For additional results and videos, please visit \href{https://sites.google.com/view/stow-corl23}{website}. Code and dataset will be released.
A Comprehensive Dynamic Simulation Framework for Coupled Neuromusculoskeletal-Exoskeletal Systems
Abstract
The modeling and simulation of coupled neuromusculoskeletal-exoskeletal systems play a crucial role in human biomechanical analysis, as well as in the design and control of exoskeletons. However, conventional dynamic simulation frameworks have limitations due to their reliance on experimental data and their inability to capture comprehensive biomechanical signals and dynamic responses. To address these challenges, we introduce an optimization-based dynamic simulation framework that integrates a complete neuromusculoskeletal feedback loop, rigid-body dynamics, human-exoskeleton interaction, and foot-ground contact. Without relying on experimental measurements or empirical data, our framework employs a stepwise optimization process to determine muscle reflex parameters, taking into account multidimensional criteria. This allows the framework to generate a full range of kinematic and biomechanical signals, including muscle activations, muscle forces, joint torques, etc., which are typically challenging to measure experimentally. To validate the effectiveness of the framework, we compare the simulated results with experimental data obtained from a healthy subject wearing an exoskeleton while walking at different speeds (0.9, 1.0, and 1.1 m/s) and terrains (flat and uphill). The results demonstrate that our framework can effectively and accurately capture the qualitative differences in muscle activity associated with different functions, as well as the evolutionary patterns of muscle activity and kinematic signals under varying walking conditions. The simulation framework we propose has the potential to facilitate gait analysis and performance evaluation of coupled human-exoskeleton systems, as well as enable efficient and cost-effective testing of novel exoskeleton designs and control strategies.
MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation
Authors: Junfeng Liu, Min Zhou, Shuai Ma, Lujia Pan
Abstract
Graph Edit Distance (GED) is a general and domain-agnostic metric to measure graph similarity, widely used in graph search and retrieval tasks. However, exact GED computation is known to be NP-complete. For instance, the widely used A* algorithms explore the entire search space to find the optimal solution, which inevitably suffers from scalability issues. Learning-based methods apply graph representation techniques to learn the GED by formulating a regression task, which cannot recover the edit path and leads to inaccurate GED approximation (i.e., the predicted GED is smaller than the exact one). To this end, in this work, we present a data-driven hybrid approach MATA* for approximate GED computation based on Graph Neural Networks (GNNs) and A* algorithms, which models from the perspective of learning to match nodes instead of directly regressing GED. Specifically, aware of the structure-dominant operations (i.e., node and edge insertion/deletion) in GED computation, a structure-enhanced GNN is first designed to jointly learn local and high-order structural information for node embeddings used in node matching. Second, top-k candidate nodes are produced via a differentiable top-k operation to enable training for node matching, adhering to another property of GED, i.e., multiple optimal node matchings. Third, benefiting from the candidate nodes, MATA* explores only the promising search directions, reaching the solution efficiently. Finally, extensive experiments show the superiority of MATA* as it significantly outperforms the combinatorial search-based, learning-based and hybrid methods and scales well to large-size graphs.
Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models
Authors: Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter
Abstract
Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM.
Ultra-Long Sequence Distributed Transformer
Authors: Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Abstract
Transformer models trained on long sequences often achieve higher accuracy than those trained on short sequences. Unfortunately, conventional transformers struggle with long-sequence training due to the overwhelming computation and memory requirements. Existing methods for long-sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers with long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses a fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and to minimize communication overhead. We evaluated the performance of the LSS Transformer against state-of-the-art Nvidia sequence parallelism on the Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation compared to state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.
The Case of Transparent Cache Invalidation in Web Applications
Authors: Yunhong Ji, Xuan Zhou, Yongluan Zhou, Ke Wang
Abstract
Application-level caches are widely adopted by web applications to minimize the response time of user requests as well as to reduce the burden on the system backend, such as the database servers. In current practice, developers have to take care of the data freshness of application-level caches manually. Given the growing complexities of today's web applications, it becomes increasingly challenging for developers to understand, reason about, and implement cache invalidation methods. Furthermore, according to our survey of open-source web application projects and engineers, it is indeed challenging to map database updates to cache entries at the application level. Therefore, we propose a design to handle data validity in a transparent and precise manner, without requiring any intervention from developers. Its main idea is to modify the DBMS to provide the necessary information for cache management and to enhance the cache with an invalidation index that identifies and invalidates outdated data automatically and efficiently. Based on the design, we further provide two specific solutions. Our preliminary experiments indicate that our solutions could effectively achieve transparent cache invalidation while maintaining cost-effectiveness.
CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models
Authors: Yanyu Chen, Yao Yao, Wai Kin Victor Chan, Li Xiao, Kai Zhang, Liang Zhang, Yun Ye
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Data sparsity and cold-start problems are persistent challenges in recommendation systems. Cross-domain recommendation (CDR) is a promising solution that utilizes knowledge from the source domain to improve the recommendation performance in the target domain. Previous CDR approaches have mainly followed the Embedding and Mapping (EMCDR) framework, which involves learning a mapping function to facilitate knowledge transfer. However, these approaches necessitate re-engineering and re-training the network structure to incorporate transferable knowledge, which can be computationally expensive and may result in catastrophic forgetting of the original knowledge. In this paper, we present a scalable and efficient paradigm to address data sparsity and cold-start issues in CDR, named CDR-Adapter, by decoupling the original recommendation model from the mapping function, without requiring re-engineering the network structure. Specifically, CDR-Adapter is a novel plug-and-play module that employs adapter modules to align feature representations, allowing for flexible knowledge transfer across different domains and efficient fine-tuning with minimal training costs. We conducted extensive experiments on the benchmark dataset, demonstrating the effectiveness of our approach over several state-of-the-art CDR approaches.
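A generic bottleneck adapter of the kind described here follows, zero-initialized so training starts from the frozen backbone's behavior; the dimensions and structure are illustrative, not CDR-Adapter's exact module.

```python
# Plug-in bottleneck adapter aligning representations while the backbone
# stays frozen; only the adapter's parameters are trained.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual alignment

backbone_emb = torch.randn(16, 128)   # frozen recommender's user embeddings
aligned = Adapter(128)(backbone_emb)  # only the adapter is trained
print(aligned.shape)  # torch.Size([16, 128])
```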
Numerical Recovery of a Time-Dependent Potential in Subdiffusion
Authors: Bangti Jin, Kwancheol Shin, Zhi Zhou
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In this work we investigate an inverse problem of recovering a time-dependent potential in a semilinear subdiffusion model from an integral measurement of the solution over the domain. The model involves the Djrbashian--Caputo fractional derivative in time. Theoretically, we prove a novel conditional Lipschitz stability result, and numerically, we develop an easy-to-implement fixed point iteration for recovering the unknown coefficient. In addition, we establish rigorous error bounds on the discrete approximation. These results are obtained by crucially using smoothing properties of the solution operators and a suitable choice of a weighted $L^p(0,T)$ norm. The efficiency and accuracy of the scheme are showcased on several numerical experiments in one and two dimensions.
Succinct Data Structure for Graphs with $d$-Dimensional $t$-Representation
Authors: Girish Balakrishnan, Sankardeep Chakraborty, Seungbum Jo, N S Narayanaswamy, Kunihiko Sadakane
Abstract
Erd\H{o}s and West (Discrete Mathematics'85) considered the class of $n$ vertex intersection graphs which have a {\em $d$-dimensional} {\em $t$-representation}, that is, each vertex of a graph in the class has an associated set consisting of at most $t$ $d$-dimensional axis-parallel boxes. In particular, for a graph $G$ and for each $d \geq 1$, they consider $i_d(G)$ to be the minimum $t$ for which $G$ has such a representation. For fixed $t$ and $d$, they consider the class of $n$ vertex labeled graphs for which $i_d(G) \leq t$, and prove an upper bound of $(2nt+\frac{1}{2})d \log n - (n - \frac{1}{2})d \log(4\pi t)$ on the logarithm of the size of the class. In this work, for fixed $t$ and $d$ we consider the class of $n$ vertex unlabeled graphs which have a {\em $d$-dimensional $t$-representation}, denoted by $\mathcal{G}_{t,d}$. We address the problem of designing a succinct data structure for the class $\mathcal{G}_{t,d}$ in an attempt to generalize the relatively recent results on succinct data structures for interval graphs (Algorithmica'21). To this end, for each $n$ such that $td^2$ is in $o(n / \log n)$, we first prove a lower bound of $(2dt-1)n \log n - O(ndt \log \log n)$ bits on the size of any data structure for encoding an arbitrary graph that belongs to $\mathcal{G}_{t,d}$. We then present a $((2dt-1)n \log n + dt\log t + o(ndt \log n))$-bit data structure for $\mathcal{G}_{t,d}$ that supports navigational queries efficiently. Contrasting this data structure with our lower bound argument, we show that for each fixed $t$ and $d$, and for all $n \geq 0$ when $td^2$ is in $o(n/\log n)$, our data structure for $\mathcal{G}_{t,d}$ is succinct. As a byproduct, we also obtain succinct data structures for graphs of bounded boxicity (denoted by $d$, with $t = 1$) and graphs of bounded interval number (denoted by $t$, with $d=1$) when $td^2$ is in $o(n/\log n)$.
P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification
Abstract
Age estimation is a challenging task with numerous applications. In this paper, we propose a new direction for age classification that utilizes a video-based model to address challenges such as occlusions, low resolution, and poor lighting conditions. To this end, we propose AgeFormer, which utilizes spatio-temporal information on the dynamics of the entire body, dominating face-based methods for age classification. Our novel two-stream architecture uses TimeSformer and EfficientNet as backbones to effectively capture both facial and body dynamics information for efficient and accurate age estimation in videos. Furthermore, to fill the gap in predicting age in real-world situations from videos, we construct a video dataset called Pexels Age (P-Age) for age classification. The proposed method achieves superior results compared to existing face-based age estimation methods and is evaluated in situations where the face is highly occluded, blurred, or masked. The method is also cross-tested on a variety of challenging video datasets such as Charades, Smarthome, and Thumos-14.
SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling
Authors: Eduard Gabriel Bazavan, Andrei Zanfir, Thiemo Alldieck, Teodor Alexandru Szente, Mihai Zanfir, Cristian Sminchisescu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present \emph{SPHEAR}, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from the classical Non-Rigid Registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention. Additionally, SPHEAR is a \emph{complete} model that allows not only to sample diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and haircuts represented in detail, as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory efficient, and experiments support the validity of our design choices and the accuracy of registration, reconstruction and generation techniques.
UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition
Abstract
Sample-to-class-based face recognition models cannot fully explore the cross-sample relationship among large amounts of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither method satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial pairs. In this paper, we propose a unified threshold integrated sample-to-sample based loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative pairs. Inspired by our USS loss, we also derive the sample-to-sample based softmax and BCE losses, and discuss their relationship. Extensive evaluation on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, demonstrates that the proposed USS loss is highly efficient and can work seamlessly with sample-to-class-based losses. The embedded loss (USS and sample-to-class Softmax loss) overcomes the pitfalls of previous approaches, and the trained facial model UniTSFace exhibits exceptional performance, outperforming state-of-the-art methods such as CosFace, ArcFace, VPL, AnchorFace, and UNPG. Our code is available.
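To convey the unified-threshold idea, here is a hedged sketch in which one learnable threshold separates positive- from negative-pair similarities; the softplus penalty is our simplification, not necessarily the published USS loss.

```python
# One explicit, learnable threshold shared by all face pairs.
import torch
import torch.nn.functional as F

def unified_threshold_loss(sims, labels, threshold):
    # sims: cosine similarities of face pairs; labels: 1 = same identity.
    # Positives are pushed above the shared threshold, negatives below it.
    pos = F.softplus(threshold - sims[labels == 1])
    neg = F.softplus(sims[labels == 0] - threshold)
    return torch.cat([pos, neg]).mean()

threshold = torch.tensor(0.3, requires_grad=True)  # learned jointly
sims = torch.tensor([0.8, 0.1, 0.5, -0.2])
labels = torch.tensor([1, 0, 1, 0])
print(unified_threshold_loss(sims, labels, threshold))
```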
QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
Authors: Iman Rahmati, Hamed Shah-Mansouri, Ali Movaghar
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract
In the realm of mobile edge computing (MEC), efficient computation task offloading plays a pivotal role in ensuring a seamless quality of experience (QoE) for users. Maintaining a high QoE is paramount in today's interconnected world, where users demand responsive and reliable services, and it is one of the primary challenges in handling dynamic and uncertain mobile environments. In this study, we delve into computation offloading in MEC systems, where strict task processing deadlines and energy constraints can adversely affect the system performance. We formulate the computation task offloading problem as a Markov decision process (MDP) to maximize the long-term QoE of each user individually. We propose a decentralized QoE-oriented computation offloading (QOCO) algorithm based on deep reinforcement learning (DRL) that empowers mobile devices to make their offloading decisions without requiring knowledge of decisions made by other devices. Through numerical studies, we evaluate the performance of QOCO. Simulation results validate that the QOCO algorithm efficiently exploits the computational resources of edge nodes. Consequently, it can complete 14% more tasks and reduce task delay and energy consumption by 9% and 6%, respectively. These together contribute to a significant improvement of at least 37% in average QoE compared to an existing algorithm.
Contract Design With Safety Inspections
Authors: Alireza Fallah, Michael I. Jordan
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
Abstract
We study the role of regulatory inspections in a contract design problem in which a principal interacts separately with multiple agents. Each agent's hidden action includes a dimension that determines whether they undertake an extra costly step to adhere to safety protocols. The principal's objective is to use payments combined with a limited budget for random inspections to incentivize agents towards safety-compliant actions that maximize the principal's utility. We first focus on the single-agent setting with linear contracts and present an efficient algorithm that characterizes the optimal linear contract, which includes both payment and random inspection. We further investigate how the optimal contract changes as the inspection cost or the cost of adhering to safety protocols vary. Notably, we demonstrate that the agent's compensation increases if either of these costs escalates. However, while the probability of inspection decreases with rising inspection costs, it demonstrates nonmonotonic behavior as a function of the safety action costs. Lastly, we explore the multi-agent setting, where the principal's challenge is to determine the best distribution of inspection budgets among all agents. We propose an efficient approach based on dynamic programming to find an approximately optimal allocation of inspection budget across contracts. We also design a random sequential scheme to determine the inspector's assignments, ensuring each agent is inspected at most once and at the desired probability. Finally, we present a case study illustrating that a mere difference in the cost of inspection across various agents can drive the principal's decision to forego inspecting a significant fraction of them, concentrating its entire budget on those that are less costly to inspect.
Authors: Linning Xu, Vasu Agrawal, William Laney, Tony Garcia, Aayush Bansal, Changil Kim, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Aljaž Božič, Dahua Lin, Michael Zollhöfer, Christian Richardt
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields. To this end, we designed and built a custom multi-camera rig to densely capture walkable spaces in high fidelity and with multi-view high dynamic range images in unprecedented quality and density. We extend instant neural graphics primitives with a novel perceptual color space for learning accurate HDR appearance, and an efficient mip-mapping mechanism for level-of-detail rendering with anti-aliasing, while carefully optimizing the trade-off between quality and speed. Our multi-GPU renderer enables high-fidelity volume rendering of our neural radiance field model at the full VR resolution of dual 2K$\times$2K at 36 Hz on our custom demo machine. We demonstrate the quality of our results on our challenging high-fidelity datasets, and compare our method and datasets to existing baselines. We release our dataset on our project website.
Pilot-Based Key Distribution and Encryption for Secure Coherent Passive Optical Networks
Authors: Haide Wang, Ji Zhou, Qingxin Lu, Jianrui Zeng, Yongqing Liao, Weiping Liu, Changyuan Yu, Zhaohui Li
Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)
Abstract
The security issues of passive optical networks (PONs) have always been a concern due to broadcast transmission. Physical-layer security enhancement for the coherent PON should be as significant as improving transmission performance. In this paper, we propose the advanced encryption standard (AES) algorithm and geometric constellation shaping four-level pulse amplitude modulation (GCS-PAM4) pilot-based key distribution for secure coherent PON. The first bit of the GCS-PAM4 pilot is used for hardware-efficient carrier phase recovery (CPR), while the second bit is utilized for key distribution without occupying additional overhead. The key bits are encoded by the polar code to ensure error-free distribution. Frequent key updates are permitted for every codeword to improve the security of the coherent PON. The experimental results of the 200-Gbps secure coherent PON using digital subcarrier multiplexing show that the GCS-PAM4 pilot-based key distribution achieves error-free upstream transmission without occupying additional overhead, and that eavesdropping on the downstream transmission is prevented by the AES algorithm. Moreover, there is almost no performance penalty on the CPR using the GCS-PAM4 pilot compared to the binary phase shift keying pilot.
Ego-Network Transformer for Subsequence Classification in Time Series Data
Abstract
Time series classification is a widely studied problem in the field of time series data mining. Previous research has predominantly focused on scenarios where relevant or foreground subsequences have already been extracted, with each subsequence corresponding to a single label. However, real-world time series data often contain foreground subsequences that are intertwined with background subsequences. Successfully classifying these relevant subsequences requires not only distinguishing between different classes but also accurately identifying the foreground subsequences amidst the background. To address this challenge, we propose a novel subsequence classification method that represents each subsequence as an ego-network, providing crucial nearest neighbor information to the model. The ego-networks of all subsequences collectively form a time series subsequence graph, and we introduce an algorithm to efficiently construct this graph. Furthermore, we have demonstrated the significance of enforcing temporal consistency in the prediction of adjacent subsequences for the subsequence classification problem. To evaluate the effectiveness of our approach, we conducted experiments using 128 univariate and 30 multivariate time series datasets. The experimental results demonstrate the superior performance of our method compared to alternative approaches. Specifically, our method outperforms the baseline on 104 out of 158 datasets.
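For intuition, here is a minimal Python sketch of the graph construction, assuming sliding-window subsequences and plain z-normalized Euclidean distance; the paper's algorithm builds this graph far more efficiently, and the window, stride, and k below are illustrative only:

    import numpy as np

    def sliding_subsequences(series, window, stride):
        """Extract z-normalized sliding-window subsequences from a 1-D series."""
        subs = np.array([series[i:i + window]
                         for i in range(0, len(series) - window + 1, stride)])
        mu = subs.mean(axis=1, keepdims=True)
        sd = subs.std(axis=1, keepdims=True) + 1e-8
        return (subs - mu) / sd

    def ego_networks(subs, k, excl=2):
        """k nearest neighbors per subsequence (its 'ego-network'); temporally
        overlapping windows are excluded to avoid trivial self-matches."""
        d = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
        for i in range(len(subs)):
            d[i, max(0, i - excl):i + excl + 1] = np.inf
        return np.argsort(d, axis=1)[:, :k]

    series = np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.randn(500)
    subs = sliding_subsequences(series, window=32, stride=8)
    graph = ego_networks(subs, k=5)
    print(graph.shape)   # (num_subsequences, 5) neighbor indices per node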
Group Testing for Accurate and Efficient Range-Based Near Neighbor Search : An Adaptive Binary Splitting Approach
Authors: Kashish Mittal, Harsh Shah, Ajit Rajwade
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem. The proposed method detects high-similarity vectors from an extensive collection of high dimensional vectors, where each vector represents an image descriptor. Our method efficiently marks each item in the collection as neighbor or non-neighbor on the basis of a cosine distance threshold without exhaustive search. Like other methods in the domain of large scale retrieval, our approach exploits the assumption that most of the items in the collection are unrelated to the query. Unlike other methods, it does not assume a large difference between the cosine similarity of the query vector with the least related neighbor and that with the least unrelated non-neighbor. Following the procedure of binary splitting, a multi-stage adaptive group testing algorithm, we split the set of items to be searched into half at each step, and perform dot product tests on smaller and smaller subsets, many of which we are able to prune away. We experimentally show that our method achieves a speed-up over exhaustive search by a factor of more than ten, with the same accuracy as exhaustive search, on a variety of large datasets. We present a theoretical analysis of the expected number of distance computations per query and the probability that a pool with a certain number of members will be pruned. In this way, our method exploits very useful and practical distributional properties, unlike other methods. In our method, all required data structures are created purely offline. Moreover, our method does not impose any strong assumptions on the number of true near neighbors, is adaptable to streaming settings where new vectors are dynamically added to the database, and does not require any parameter tuning.
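To make the pooled test concrete, the following is a minimal Python sketch under the simplifying assumption of nonnegative descriptors, so that one dot product with a pool's sum upper-bounds every member's score. The paper's method precomputes pool sums offline and works with a cosine-distance threshold on normalized vectors, so this only illustrates the splitting-and-pruning idea:

    import numpy as np

    def pruned_range_search(query, vectors, tau):
        """Adaptive binary splitting sketch: recursively halve pools of
        vectors, pruning any pool whose single pooled dot product rules
        out all members. Valid when individual dot products are
        nonnegative, since q . sum(pool) then upper-bounds every q . v."""
        results, stack = [], [np.arange(len(vectors))]
        while stack:
            pool = stack.pop()
            pooled = query @ vectors[pool].sum(axis=0)   # one test per pool
            if pooled <= tau:
                continue                                 # no member can pass
            if len(pool) == 1:
                results.append(int(pool[0]))             # q . v > tau: neighbor
            else:
                half = len(pool) // 2
                stack += [pool[:half], pool[half:]]
        return results

    rng = np.random.default_rng(0)
    db = np.abs(rng.normal(size=(10000, 64)))            # nonnegative descriptors
    q = np.abs(rng.normal(size=64))
    print(len(pruned_range_search(q, db, tau=55.0)))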
AIOps-Driven Enhancement of Log Anomaly Detection in Unsupervised Scenarios
Abstract
Artificial intelligence operations (AIOps) play a pivotal role in identifying, mitigating, and analyzing anomalous system behaviors and alerts. However, the research landscape in this field remains limited, leaving significant gaps unexplored. This study introduces a novel hybrid framework through an innovative algorithm that incorporates an unsupervised strategy. This strategy integrates Principal Component Analysis (PCA) and Artificial Neural Networks (ANNs) and uses a custom loss function to substantially enhance the effectiveness of log anomaly detection. The proposed approach encompasses the utilization of both simulated and real-world datasets, including logs from SockShop and Hadoop Distributed File System (HDFS). The experimental results are highly promising, demonstrating significant reductions in pseudo-positives. Moreover, this strategy offers notable advantages, such as the ability to process logs in their raw, unprocessed form, and the potential for further enhancements. The successful implementation of this approach showcases a remarkable reduction in anomalous logs, thus unequivocally establishing the efficacy of the proposed methodology. Ultimately, this study makes a substantial contribution to the advancement of log anomaly detection within AIOps platforms, addressing the critical need for effective and efficient log analysis in modern and complex systems.
Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation
Authors: Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi
Abstract
This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology prioritized evaluating correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs, showcasing strengths in data analysis tasks (93.1% accuracy) but limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models effectively distinguished ChatGPT from human code with up to 88% accuracy, suggesting detectable coding style disparities. By providing profound insights into ChatGPT's code generation capabilities and limitations through quantitative metrics and qualitative analysis, this study makes valuable contributions toward advancing AI-based programming assistants. The curated dataset and methodology offer a robust foundation for future research in this nascent domain. All data and codes are available on https://github.com/DSAatUSU/ChatGPT-promises-and-pitfalls.
PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation
Abstract
Pothole detection is crucial for road safety and maintenance, traditionally relying on 2D image segmentation. However, existing 3D Semantic Pothole Segmentation research often overlooks point cloud sparsity, leading to suboptimal local feature capture and segmentation accuracy. Our research presents an innovative point cloud-based pothole segmentation architecture. Our model efficiently identifies hidden features and uses a feedback mechanism to enhance local characteristics, improving feature presentation. We introduce a local relationship learning module to understand local shape relationships, enhancing structural insights. Additionally, we propose a lightweight adaptive structure for refining local point features using the K nearest neighbor algorithm, addressing point cloud density differences and domain selection. Shared MLP Pooling is integrated to learn deep aggregation features, facilitating semantic data exploration and segmentation guidance. Extensive experiments on three public datasets confirm PotholeGuard's superior performance over state-of-the-art methods. Our approach offers a promising solution for robust and accurate 3D pothole segmentation, with applications in road maintenance and safety.
Compute at Scale -- A Broad Investigation into the Data Center Industry
Authors: Konstantin Pilz, Lennart Heim
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Abstract
This report characterizes the data center industry and its importance for AI development. Data centers are industrial facilities that efficiently provide compute at scale and thus constitute the engine rooms of today's digital economy. As large-scale AI training and inference become increasingly computationally expensive, they are predominantly executed in this dedicated infrastructure. Key features of data centers include large-scale compute clusters that require extensive cooling and consume large amounts of power, the need for fast connectivity both within the data center and to the internet, and an emphasis on security and reliability. The global industry is valued at approximately $250B and is expected to double over the next seven years. There are likely about 500 large (above 10 MW) data centers globally, with the US, Europe, and China constituting the most important markets. The report further covers important actors, business models, main inputs, and typical locations of data centers.
Patterned non-determinism in communication complexity
Abstract
We define and study the model of patterned non-determinism in bipartite communication complexity, denoted by $PNP^{X\leftrightarrow Y}$. It generalises the known models $UP^{X\leftrightarrow Y}$ and $FewP^{X\leftrightarrow Y}$ by relaxing the constraints on the witnessing structure of the underlying $NP^{X\leftrightarrow Y}$-protocol. It is shown that for the case of total functions $PNP^{X\leftrightarrow Y}$ equals $P^{X\leftrightarrow Y}$ (similarly to $UP^{X\leftrightarrow Y}$ and $FewP^{X\leftrightarrow Y}$). Moreover, the corresponding exhaustive witness-searching problem -- determining the full set of witnesses that lead to the acceptance of a given input pair -- also has an efficient deterministic protocol. The possibility of efficient exhaustive $PNP^{X\leftrightarrow Y}$-search is used to analyse a certain three-party communication regime (under the "number in hand" input partition): the corresponding three-party model is shown to be qualitatively as strong as the weakest among its two-party amplifications, obtained by allowing free communication between a pair of players.
CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning
Authors: Azin Jahedi, Maximilian Luz, Marc Rivinius, Andrés Bruhn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Attention-based motion aggregation concepts have recently shown their usefulness in optical flow estimation, in particular when it comes to handling occluded regions. However, due to their complexity, such concepts have been mainly restricted to coarse-resolution single-scale approaches that fail to provide the detailed outcome of high-resolution multi-scale networks. In this paper, we hence propose CCMR: a high-resolution coarse-to-fine approach that brings attention-based motion grouping concepts to multi-scale optical flow estimation. CCMR relies on a hierarchical two-step attention-based context-motion grouping strategy that first computes global multi-scale context features and then uses them to guide the actual motion grouping. As we iterate both steps over all coarse-to-fine scales, we adapt cross-covariance image transformers to allow for an efficient realization while maintaining scale-dependent properties. Experiments and ablations demonstrate that our efforts to combine multi-scale and attention-based concepts pay off. By providing highly detailed flow fields with strong improvements in both occluded and non-occluded regions, our CCMR approach not only outperforms the corresponding single-scale attention-based and multi-scale attention-free baselines by up to 23.0% and 21.6%, respectively, but also achieves state-of-the-art results, ranking first on KITTI 2015 and second on MPI Sintel Clean and Final. Code and trained models are available at https://github.com/cv-stuttgart/CCMR.
Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration
Authors: Archith Athrey, Othmane Mazhar, Meichen Guo, Bart De Schutter, Shengling Shi
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
In this paper, we analyze the regret incurred by a computationally efficient exploration strategy, known as naive exploration, for controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework. We introduce a two-phase control algorithm called LQG-NAIVE, which involves an initial phase of injecting Gaussian input signals to obtain a system model, followed by a second phase of an interplay between naive exploration and control in an episodic fashion. We show that LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors after $T$ time steps, and we validate its performance through numerical simulations. Additionally, we propose LQG-IF2E, which extends the exploration signal to a `closed-loop' setting by incorporating the Fisher Information Matrix (FIM). We provide compelling numerical evidence of the competitive performance of LQG-IF2E compared to LQG-NAIVE.
Nepali Video Captioning using CNN-RNN Architecture
Authors: Bipesh Subedi, Saugat Singh, Bal Krishna Bal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
This article presents a study on Nepali video captioning using deep neural networks. Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos. The approach involves dataset collection, data preprocessing, model implementation, and evaluation. By enriching the MSVD dataset with Nepali captions via Google Translate, the study trains various CNN-RNN architectures. The research explores the effectiveness of CNNs (e.g., EfficientNetB0, ResNet101, VGG16) paired with different RNN decoders like LSTM, GRU, and BiLSTM. Evaluation involves BLEU and METEOR metrics, with the best model being EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieving a BLEU-4 score of 17 and METEOR score of 46. The article also outlines challenges and future directions for advancing Nepali video captioning, offering a crucial resource for further research in this area.
Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
Abstract
We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward. The auxiliary feedback is readily available in many real-life applications, e.g., an online platform that wants to recommend the best-rated services to its users can observe the user's rating of service (rewards) and collect additional information like service delivery time (auxiliary feedback). In this paper, we first develop a method that exploits auxiliary feedback to build a reward estimator with tight confidence bounds, leading to a smaller regret. We then characterize the regret reduction in terms of the correlation coefficient between reward and its auxiliary feedback. Experimental results in different settings also verify the performance gain achieved by our proposed method.
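The construction behind the tighter estimator is the classical control variate. A minimal numerical sketch with a synthetic reward model (not the paper's bandit setting) shows the variance reduction governed by the correlation coefficient:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5000
    aux = rng.normal(size=n)                        # auxiliary feedback, known mean (= 0)
    reward = 0.5 + 0.8 * aux + 0.3 * rng.normal(size=n)   # correlated noisy reward

    beta = np.cov(reward, aux)[0, 1] / np.var(aux)  # optimal control-variate weight
    adjusted = reward - beta * (aux - 0.0)          # subtract the centered auxiliary term

    # Variance shrinks by a factor of roughly (1 - rho^2), rho = corr(reward, aux).
    print(np.var(reward), np.var(adjusted))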
M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs
Authors: Yuzong Chen, Jordan Dotzel, Mohamed S. Abdelfattah
Abstract
Mixed-precision quantization is a popular approach for compressing deep neural networks (DNNs). However, it is challenging to scale the performance efficiently with mixed-precision DNNs given the current FPGA architecture and conventional accelerator dataflows. In this work, we enhance the FPGA's capability for accelerating mixed-precision DNNs by proposing M4BRAM, a novel compute-in-block RAM (BRAM) architecture that can compute mixed-precision matrix-matrix multiplication. On the precision side, M4BRAM supports a wide range of mixed-precision DNN configurations -- the weight precision can be 2/4/8 bits while the activation precision can vary from 2 to 8 bits. On the dataflow side, M4BRAM leverages a novel in-BRAM data duplication scheme to achieve high hardware utilization. Moreover, during M4BRAM computation, other FPGA resources can seamlessly access its data without the need for a separate buffer. Hence, unlike prior compute-in-BRAM proposals, M4BRAM can simultaneously perform mixed-precision computation and maintain full functionality as a memory unit to \textit{truly} complement the existing compute resources on FPGAs. Experiments show that adding M4BRAM to a tiled DNN accelerator can achieve an average speedup of 2.16$\times$ across various DNNs on the ImageNet classification task while incurring a negligible accuracy loss of $<$ 0.5%. Compared to the same tiled accelerator that employs a prior compute-in-BRAM architecture, M4BRAM delivers 1.43$\times$ higher performance on average across various DNNs.
One-Shot Strategic Classification Under Unknown Costs
Authors: Elan Rosenfeld, Nir Rosenfeld
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Abstract
A primary goal in strategic classification is to learn decision rules which are robust to strategic input manipulation. Earlier works assume that strategic responses are known; while some recent works address the important challenge of unknown responses, they exclusively study sequential settings which allow multiple model deployments over time. But there are many domains (particularly in public policy, a common motivating use-case) where multiple deployments are unrealistic, or where even a single bad round is undesirable. To address this gap, we initiate the study of strategic classification under unknown responses in the one-shot setting, which requires committing to a single classifier once. Focusing on the users' cost function as the source of uncertainty, we begin by proving that for a broad class of costs, even a small mis-estimation of the true cost can entail arbitrarily low accuracy in the worst case. In light of this, we frame the one-shot task as a minimax problem, with the goal of identifying the classifier with the smallest worst-case risk over an uncertainty set of possible costs. Our main contribution is efficient algorithms for both the full-batch and stochastic settings, which we prove converge (offline) to the minimax optimal solution at the dimension-independent rate of $\tilde{\mathcal{O}}(T^{-\frac{1}{2}})$. Our analysis reveals important structure stemming from the strategic nature of user responses, particularly the importance of dual norm regularization with respect to the cost function.
Riemannian Laplace Approximation with the Fisher Metric
Authors: Hanlin Yu, Marcelo Hartmann, Bernardo Williams, Mark Girolami, Arto Klami
Abstract
Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace approximation transforms the Gaussian approximation according to a chosen Riemannian geometry, providing a richer approximation family while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric; indeed, the metric adopted in previous work results in approximations that are overly narrow and biased even in the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact in the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.
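For reference, here is a minimal Python sketch of the classical (Euclidean) Laplace approximation that the paper generalizes, assuming a toy two-dimensional target and a finite-difference Hessian; the Riemannian variants derived in the paper are not reproduced here:

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_density(x):
        """Stand-in target: -log p up to a constant (banana-shaped posterior)."""
        return 0.5 * x[0] ** 2 + 0.5 * (x[1] - x[0] ** 2) ** 2

    def laplace_approximation(f, x0, h=1e-4):
        """Classical Laplace approximation: mean at the mode of the target,
        covariance the inverse Hessian of -log p at the mode."""
        mode = minimize(f, x0).x
        d = len(mode)
        H = np.zeros((d, d))
        I = np.eye(d)
        for i in range(d):          # simple finite-difference Hessian
            for j in range(d):
                H[i, j] = (f(mode + h * I[i] + h * I[j]) - f(mode + h * I[i])
                           - f(mode + h * I[j]) + f(mode)) / h ** 2
        return mode, np.linalg.inv(H)

    mean, cov = laplace_approximation(neg_log_density, x0=np.array([1.0, 1.0]))
    print(mean, cov)   # the Gaussian approximation N(mean, cov)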
MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis
Authors: Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there exists a knowledge gap in how to apply geometric reconstruction and photorealism modeling (novel view synthesis) in a unified framework. To address this gap and promote the development of robust and immersive modeling and rendering with consumer-grade devices, we first propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges: it requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis rather than treating them as separate tasks, making them ideal for real-world applications. Second, we benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Finally, in order to further improve the overall performance, we propose a new method that achieves a good trade-off between the two tasks. Our dataset and benchmark show great potential in promoting the improvements for fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion.
Last fall degree of semi-local polynomial systems
Authors: Ming-Deh A. Huang
Subjects: Computational Complexity (cs.CC); Number Theory (math.NT)
Abstract
We study the last fall degrees of {\em semi-local} polynomial systems, and the computational complexity of solving such systems for closed-point and rational-point solutions, where the systems are defined over a finite field. A semi-local polynomial system specifies an algebraic set which is the image of a global linear transformation of a direct product of local affine algebraic sets. As a special but interesting case, polynomial systems that arise from Weil restriction of algebraic sets in an affine space of low dimension are semi-local. Such systems have received considerable attention due to their application in cryptography. Our main results bound the last fall degree of a semi-local polynomial system in terms of the number of closed point solutions, and yield an efficient algorithm for finding all rational-point solutions when the prime characteristic of the finite field and the number of rational solutions are small. Our results on solving semi-local systems imply an improvement on a previously known polynomial-time attack on the HFE (Hidden Field Equations) cryptosystems. The attacks implied in our results extend to public key encryption functions which are based on semi-local systems where either the number of closed point solutions is small, or the characteristic of the field is small. It remains plausible to construct public key cryptosystems based on semi-local systems over a finite field of large prime characteristic with exponential number of closed point solutions. Such a method is presented in the paper, followed by further cryptanalysis involving the isomorphism of polynomials (IP) problem, as well as a concrete public key encryption scheme which is secure against all the attacks discussed in this paper.
Contour Algorithm for Connectivity
Authors: Zhihui Du, Oliver Alvarado Rodriguez, Fuhuan Li, Mohammad Dindoost, David A. Bader
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Abstract
Finding connected components in a graph is a fundamental problem in graph analysis. In this work, we present a novel minimum-mapping based Contour algorithm to efficiently solve the connectivity problem. We prove that the Contour algorithm with two or higher order operators can identify all connected components of an undirected graph within $\mathcal{O}(\log d_{\max})$ iterations, with each iteration involving $\mathcal{O}(m)$ work, where $d_{\max}$ represents the largest diameter among all components in the given graph, and $m$ is the total number of edges in the graph. Importantly, each iteration is highly parallelizable, making use of the efficient minimum-mapping operator applied to all edges. To further enhance its practical performance, we optimize the Contour algorithm through asynchronous updates, early convergence checking, eliminating atomic operations, and choosing more efficient mapping operators. Our implementation of the Contour algorithm has been integrated into the open-source framework Arachne. Arachne extends Arkouda for large-scale interactive graph analytics, providing a Python API powered by the high-productivity parallel language Chapel. Experimental results on both real-world and synthetic graphs demonstrate the superior performance of our proposed Contour algorithm compared to the state-of-the-art large-scale parallel algorithm FastSV and the fastest shared memory algorithm ConnectIt. On average, Contour achieves a speedup of 7.3x and 1.4x compared to FastSV and ConnectIt, respectively. All code for the Contour algorithm and the Arachne framework is publicly available on GitHub ( https://github.com/Bears-R-Us/arkouda-njit ), ensuring transparency and reproducibility of our work.
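A minimal sequential Python sketch of the min-mapping idea follows; the actual algorithm is parallel, applies the operator to all edges at once, and adds the optimizations listed above, and the shortcutting step below is a simplification:

    import numpy as np

    def connected_components(num_nodes, edges):
        """Label-propagation connectivity in the spirit of a minimum-mapping
        operator: every edge repeatedly maps both endpoints' labels to their
        minimum, with pointer-jumping style shortcutting between sweeps."""
        label = np.arange(num_nodes)
        changed = True
        while changed:
            changed = False
            for u, v in edges:                    # min-mapping over all edges
                m = min(label[u], label[v])
                if label[u] != m or label[v] != m:
                    label[u] = label[v] = m
                    changed = True
            label = label[label]                  # shortcut: compress label chains
        return label

    edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 3)]
    print(connected_components(7, edges))  # [0 0 0 3 3 3 3]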
CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval
Abstract
The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently. Since various matching patterns can exist between queries and relevant documents, previous work tries to combine multiple retrieval models to find as many relevant results as possible. The constructed ensembles, whether learned independently or jointly, do not care which component model is more suitable to an instance during training. Thus, they cannot fully exploit the capabilities of different types of retrieval models in identifying diverse relevance patterns. Motivated by this observation, in this paper, we propose a Mixture-of-Experts (MoE) model consisting of representative matching experts and a novel competitive learning mechanism to let the experts develop and enhance their expertise during training. Specifically, our MoE model shares the bottom layers to learn common semantic representations and uses differently structured upper layers to represent various types of retrieval experts. Our competitive learning mechanism has two stages: (1) a standardized learning stage to train the experts equally to develop their capabilities to conduct relevance matching; (2) a specialized learning stage where the experts compete with each other on every training instance and get rewards and updates according to their performance to enhance their expertise on certain types of samples. Experimental results on three retrieval benchmark datasets show that our method significantly outperforms the state-of-the-art baselines.
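The competition in the specialized stage can be sketched in a few lines of PyTorch; the toy regression loss, dimensions, and single gradient step below are stand-ins for the paper's retrieval objective, not its actual training setup:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy specialized (competitive) stage: a shared bottom encoder with
    # several expert heads; on each training instance only the winning
    # (lowest-loss) expert receives a gradient update.
    shared = nn.Linear(32, 64)
    experts = nn.ModuleList(nn.Linear(64, 1) for _ in range(3))
    opt = torch.optim.Adam([*shared.parameters(), *experts.parameters()], lr=1e-3)

    x, y = torch.randn(128, 32), torch.randn(128, 1)   # stand-in features/targets

    h = torch.relu(shared(x))
    per_expert = torch.stack([(e(h) - y).pow(2).squeeze(-1) for e in experts], dim=1)
    winner = per_expert.argmin(dim=1)                  # per-instance competition
    mask = F.one_hot(winner, num_classes=len(experts)).float()
    loss = (per_expert * mask).sum(dim=1).mean()       # update winners only
    opt.zero_grad(); loss.backward(); opt.step()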
Cell-Probe Lower Bound for Accessible Interval Graphs
Authors: Sankardeep Chakraborty, Christian Engels, Seungbum Jo, Mingmou Liu
Abstract
We spot a hole in the area of succinct data structures for graph classes from a universe of size at most $n^n$. Very often, the input graph is labeled by the user in an arbitrary and easy-to-use way, and the data structure for the graph relabels the input graph in some way. For any access, the user needs to store these labels or compute the new labels in an online manner. This might require more bits than the information-theoretic minimum of the original graph class, hence defeating the purpose of succinctness. Given this, the data structure designer must allow the user to access the data structure with the original labels, i.e., relabeling is not allowed. We call such a graph data structure ``accessible''. In this paper, we study the complexity of such accessible data structures for interval graphs, a graph class with information-theoretic minimum less than $n\log n$ bits.
- We formalize the concept of "accessibility" (which was implicitly assumed), and propose the "universal interval representation" for interval graphs.
- Any data structure for interval graphs in universal interval representation, which supports both adjacency and degree queries simultaneously with time costs $t_1$ and $t_2$ respectively, must consume at least $\log_2(n!)+n/(\log n)^{O(t_1+t_2)}$ bits of space. This is also the first lower bound for graph classes with information-theoretic minimum less than $n\log_2 n$ bits.
- We provide efficient succinct data structures for interval graphs in universal interval representation supporting the adjacency query and the degree query individually in constant time and space costs.
Therefore, the two upper bounds together with the lower bound show that the two elementary queries for interval graphs are incompatible with each other in the context of succinct data structures. To the best of our knowledge, this is the first proof of such an incompatibility phenomenon.
Generate Complete Logging Statements with an Efficient End-to-End Approach
Authors: Xiaoyuan Xie, Zhipeng Cai, Songqiang Chen, Jifeng Xuan
Abstract
Logs are significant in modern software systems, aiding in various maintenance tasks. To make better use of these logs, many methods have been proposed to help developers draft suitable logging statements. However, these methods only help developers either locate logging positions or write partial content of logging statements, or cannot efficiently help in generating and inserting complete logging statements. To address their limitations, we introduce a new method to better support the automated end-to-end generation of logging statements. Our end-to-end method consists of two steps: first, token classification locates where to insert a logging statement; then, a Seq2Seq model generates a complete logging statement with a log level and a log message for that position. We evaluate our proposed method on the previously used benchmark and a self-constructed new benchmark. The experimental results show that our method substantially outperforms the state-of-the-art approach in both generation speed and quality.
Energy-Efficient Multidimensional Constellation Based on Leech Lattice for Visible Light Communications
Abstract
In this paper, a 24-dimensional geometrically-shaped constellation design based on the Leech lattice is presented for indoor visible light communications (VLCs) under peak- and average-intensity input constraints. Firstly, by leveraging tools from large deviation theory, we characterize second-order asymptotics of the optimal constellation shaping region under the aforementioned intensity constraints, which further refine our previous results in [Chen et al., 2020]. Within the optimal geometric shaping region, we develop an energy-efficient 24-dimensional constellation design, where a significant coding gain brought by the Leech lattice and the nearly-maximum shaping gain are incorporated by using a strategy called coarsely shaping and finely coding. Fast algorithms for constellation mapping and demodulation are presented as well. Numerical results verify the superiority of our results as compared with existing methods.
Lightweight equivariant interaction graph neural network for accurate and efficient interatomic potential and force predictions
Abstract
In modern computational materials science, deep learning has shown the capability to predict interatomic potentials, thereby supporting and accelerating conventional simulations. However, existing models typically sacrifice either accuracy or efficiency. Moreover, lightweight models are in high demand for simulating systems at considerably larger scales at reduced computational cost. A century ago, Felix Bloch demonstrated how leveraging the equivariance of the translation operation on a crystal lattice (with geometric symmetry) could significantly reduce the computational cost of determining wavefunctions and accurately calculate material properties. Here, we introduce a lightweight equivariant interaction graph neural network (LEIGNN) that can enable accurate and efficient interatomic potential and force predictions in crystals. Rather than relying on higher-order representations, LEIGNN employs a scalar-vector dual representation to encode equivariant features. By extracting both local and global structures from vector representations and learning geometric symmetry information, our model remains lightweight while ensuring prediction accuracy and robustness through the equivariance. Our results show that LEIGNN consistently outperforms the prediction performance of the representative baselines and achieves significant efficiency across diverse datasets, which include catalysts, molecules, and organic isomers. Finally, to further validate the predicted interatomic potentials from our model, we conduct classical molecular dynamics (MD) and ab initio MD simulations across various systems, including solid, liquid, and gas. It is found that LEIGNN can achieve the accuracy of ab initio MD while retaining the computational efficiency of classical MD across all examined systems, demonstrating its accuracy, efficiency, and universality.
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Authors: Shiyang Lu, Haonan Chang, Eric Pu Jing, Abdeslam Boularias, Kostas Bekris
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
This work presents OVIR-3D, a straightforward yet effective method for open-vocabulary 3D object instance retrieval without using any 3D data for training. Given a language query, the proposed method is able to return a ranked set of 3D object instance segments based on the feature similarity of the instance and the text query. This is achieved by a multi-view fusion of text-aligned 2D region proposals into 3D space, where the 2D region proposal network could leverage 2D datasets, which are more accessible and typically larger than 3D datasets. The proposed fusion process is efficient as it can be performed in real-time for most indoor 3D scenes and does not require additional training in 3D space. Experiments on public datasets and a real robot show the effectiveness of the method and its potential for applications in robot navigation and manipulation.
Virtual Action Actor-Critic Framework for Exploration (Student Abstract)
Authors: Bumgeun Park, Taeyoung Kim, Quoc-Vinh Lai-Dang, Dongsoo Har
Abstract
Efficient exploration is challenging for an agent in reinforcement learning (RL). In this paper, a novel actor-critic framework, namely virtual action actor-critic (VAAC), is proposed to address the challenge of efficient exploration in RL. This work is inspired by humans' ability to imagine the potential outcomes of their actions without actually taking them. In order to emulate this ability, VAAC introduces a new actor called the virtual actor (VA), alongside the conventional actor-critic framework. Unlike the conventional actor, the VA takes the virtual action to anticipate the next state without interacting with the environment. With the virtual policy following a Gaussian distribution, the VA is trained to maximize the anticipated novelty of the subsequent state resulting from a virtual action. If no next state resulting from the available actions exhibits high anticipated novelty, training the VA leads to an increase in the virtual policy entropy. Hence, high virtual policy entropy indicates that there is no room for exploration. The proposed VAAC aims to maximize a modified Q function, which combines cumulative rewards and the negative sum of virtual policy entropy. Experimental results show that VAAC improves the exploration performance compared to existing algorithms.
Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things
Authors: Li Ping Qian, Yi Zhang, Sikai Lyu, Huijie Zhu, Yuan Wu, Xuemin Sherman Shen, Xiaoniu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
With the rapid development of the Artificial Intelligent Internet of Things (AIoT), the volume of image data from AIoT devices has been growing explosively. In this paper, a novel deep image semantic communication model is proposed for efficient image communication in AIoT. Particularly, at the transmitter side, a high-precision image semantic segmentation algorithm is proposed to extract the semantic information of the image, achieving significant compression of the image data. At the receiver side, a semantic image restoration algorithm based on a Generative Adversarial Network (GAN) is proposed to convert the semantic image to a real scene image with detailed information. Simulation results demonstrate that the proposed image semantic communication model can improve the image compression ratio and recovery accuracy by 71.93% and 25.07% on average in comparison with WebP and CycleGAN, respectively. More importantly, our demo experiment shows that the proposed model reduces the total delay of image communication by 95.26% compared with the original image transmission.
Design, implementation, and validation of a benchmark generator for combinatorial interaction testing tools
Abstract
Combinatorial testing is a widely adopted technique for efficiently detecting faults in software. The quality of combinatorial test generators plays a crucial role in achieving effective test coverage. Evaluating combinatorial test generators remains a challenging task that requires diverse and representative benchmarks. Having such benchmarks might help developers to test their tools, and improve their performance. For this reason, in this paper, we present BenCIGen, a highly configurable generator of benchmarks to be used by combinatorial test generators, empowering users to customize the type of benchmarks generated, including constraints and parameters, as well as their complexity. An initial version of such a tool has been used during the CT-Competition, held yearly during the International Workshop on Combinatorial Testing. This paper describes the requirements, the design, the implementation, and the validation of BenCIGen. Tests for the validation of BenCIGen are derived from its requirements by using a combinatorial interaction approach. Moreover, we demonstrate the tool's ability to generate benchmarks that reflect the characteristics of real software systems. BenCIGen not only facilitates the evaluation of existing generators but also serves as a valuable resource for researchers and practitioners seeking to enhance the quality and effectiveness of combinatorial testing methodologies.
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Authors: Xuwei Xu, Sen Wang, Yudong Chen, Yanping Zheng, Zhewei Wei, Jiajun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Vision Transformers (ViTs) have revolutionized the field of computer vision, yet their deployments on resource-constrained devices remain challenging due to high computational demands. To expedite pre-trained ViTs, token pruning and token merging approaches have been developed, which aim at reducing the number of tokens involved in the computation. However, these methods still have some limitations, such as image information loss from pruned tokens and inefficiency in the token-matching process. In this paper, we introduce a novel Graph-based Token Propagation (GTP) method to resolve the challenge of balancing model efficiency and information preservation for efficient ViTs. Inspired by graph summarization algorithms, GTP meticulously propagates less significant tokens' information to spatially and semantically connected tokens that are of greater importance. Consequently, the remaining few tokens serve as a summarization of the entire token graph, allowing the method to reduce computational complexity while preserving essential information of eliminated tokens. Combined with an innovative token selection strategy, GTP can efficiently identify image tokens to be propagated. Extensive experiments have validated GTP's effectiveness, demonstrating both efficiency and performance improvements. Specifically, GTP decreases the computational complexity of both DeiT-S and DeiT-B by up to 26% with only a minimal 0.3% accuracy drop on ImageNet-1K without finetuning, and remarkably surpasses the state-of-the-art token merging method on various backbones at an even faster inference speed. The source code is available at https://github.com/Ackesnal/GTP-ViT.
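A minimal PyTorch sketch of the propagate-then-prune idea follows; the 0.5 blending weight, the random affinities, and the fixed keep/drop split are placeholders rather than GTP's actual token selection strategy and propagation weights:

    import torch

    def propagate_then_prune(tokens, keep_idx, drop_idx, sim):
        """Before discarding the less important tokens, fold each one's
        feature into its most similar kept token, so the survivors
        summarize the whole token graph (simplified sketch)."""
        host = sim.argmax(dim=1)            # most similar kept token per dropped token
        kept = tokens[:, keep_idx].clone()
        for d, k in enumerate(host.tolist()):
            kept[:, k] = 0.5 * kept[:, k] + 0.5 * tokens[:, drop_idx[d]]
        return kept

    B, N, C = 2, 197, 384                   # e.g. a ViT-S token sequence
    tokens = torch.randn(B, N, C)
    keep_idx = torch.arange(0, 150)         # tokens judged important
    drop_idx = torch.arange(150, 197)       # tokens to eliminate
    sim = torch.randn(len(drop_idx), len(keep_idx))   # stand-in token affinities
    print(propagate_then_prune(tokens, keep_idx, drop_idx, sim).shape)  # (2, 150, 384)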
Maximal Consistent Subsystems of Max-T Fuzzy Relational Equations
Authors: Ismaïl Baaj
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract
In this article, we study the inconsistency of a system of $\max-T$ fuzzy relational equations of the form $A \Box_{T}^{\max} x = b$, where $T$ is a t-norm among $\min$, the product or Lukasiewicz's t-norm. For an inconsistent $\max-T$ system, we directly construct a canonical maximal consistent subsystem (w.r.t. the inclusion order). The main tool used to obtain it is the analytical formula which computes the Chebyshev distance $\Delta = \inf_{c \in \mathcal{C}} \Vert b - c \Vert$ associated to the inconsistent $\max-T$ system, where $\mathcal{C}$ is the set of second members of consistent systems defined with the same matrix $A$. Based on the same analytical formula, we give, for an inconsistent $\max-\min$ system, an efficient method to obtain all its consistent subsystems, and we show how to iteratively obtain all its maximal consistent subsystems.
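For the $\max-\min$ case, consistency can be checked with the textbook residuation argument, a useful companion to the paper's Chebyshev-distance formula; this Python sketch shows only the classical criterion, not the paper's subsystem construction:

    import numpy as np

    def maxmin(A, x):
        """Max-min composition: (A o x)_i = max_j min(a_ij, x_j)."""
        return np.max(np.minimum(A, x[None, :]), axis=1)

    def greatest_candidate(A, b):
        """Classical greatest potential solution via the Goedel implication:
        x*_j = min_i (a_ij -> b_i), where a -> b equals 1 if a <= b, else b."""
        return np.where(A <= b[:, None], 1.0, b[:, None]).min(axis=0)

    A = np.array([[0.7, 0.3],
                  [0.5, 0.9]])
    b = np.array([0.6, 0.5])
    x_star = greatest_candidate(A, b)
    print(x_star, maxmin(A, x_star))           # [0.6 0.5] [0.6 0.5]
    print(np.allclose(maxmin(A, x_star), b))   # the system is consistent iff True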
SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet
Authors: Maurice Günder, Facundo Ramón Ispizua Yamati, Abel Andree Barreta Alcántara, Anne-Katrin Mahlein, Rafet Sifa, Christian Bauckhage
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Remote sensing and artificial intelligence are pivotal technologies of precision agriculture nowadays. The efficient retrieval of large-scale field imagery combined with machine learning techniques shows success in various tasks like phenotyping, weeding, cropping, and disease control. This work introduces a machine learning framework for automated large-scale plant-specific trait annotation, with the use case of disease severity scoring for Cercospora Leaf Spot (CLS) in sugar beet. With concepts of Deep Label Distribution Learning (DLDL), special loss functions, and a tailored model architecture, we develop an efficient Vision Transformer based model for disease severity scoring called SugarViT. One novelty in this work is the combination of remote sensing data with environmental parameters of the experimental sites for disease severity prediction. Although the model is evaluated on this specific use case, it is kept as generic as possible so as to also be applicable to various image-based classification and regression tasks. With our framework, it is even possible to learn models on multi-objective problems, as we show by pretraining on environmental metadata.
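The DLDL ingredient can be sketched compactly: replace the hard severity label with a discretized Gaussian distribution over ordinal levels and train against it with a KL objective. The sigma and level count in this PyTorch sketch are illustrative, not SugarViT's settings:

    import torch
    import torch.nn.functional as F

    def dldl_target(score, num_levels, sigma=1.0):
        """Label-distribution target: a discretized Gaussian over ordinal
        severity levels, centered on the annotated score."""
        levels = torch.arange(num_levels, dtype=torch.float32)
        return F.softmax(-(levels - score) ** 2 / (2 * sigma ** 2), dim=0)

    pred_logits = torch.randn(10)            # model output over 10 severity levels
    target = dldl_target(score=3.0, num_levels=10)
    loss = F.kl_div(F.log_softmax(pred_logits, dim=0), target, reduction='sum')
    print(target, loss.item())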
A Simple yet Efficient Ensemble Approach for AI-generated Text Detection
Authors: Harika Abburi, Kalyani Roy, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, Sanmitra Bhattacharya
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Recent Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing across a wide range of styles and genres. However, such capabilities are prone to potential abuse, such as fake news generation, spam email creation, and misuse in academic assignments. Hence, it is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text. In this paper, we propose a simple yet efficient solution to this problem by ensembling predictions from multiple constituent LLMs. Compared to previous state-of-the-art approaches, which are perplexity-based or use ensembles of numerous LLMs, our condensed ensembling approach uses only two constituent LLMs to achieve comparable performance. Experiments conducted on four benchmark datasets for generative text classification show performance improvements in the range of 0.5 to 100\% compared to previous state-of-the-art approaches. We also study the influence that the training data from individual LLMs has on model performance. We found that substituting commercially-restrictive Generative Pre-trained Transformer (GPT) data with data generated from other open language models such as Falcon, Large Language Model Meta AI (LLaMA2), and Mosaic Pretrained Transformers (MPT) is a feasible alternative when developing generative text detectors. Furthermore, to demonstrate zero-shot generalization, we experimented with an English essays dataset, and results suggest that our ensembling approach can handle new data effectively.
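At its simplest, the condensed ensembling idea amounts to combining the per-document scores of two constituent detectors; the soft-voting rule and the numbers below are illustrative assumptions, not the paper's exact combination scheme:

    import numpy as np

    # Probabilities that each of four documents is AI-generated, as scored
    # by two constituent detectors (stand-ins for classifiers built on two LLMs).
    p_a = np.array([0.91, 0.12, 0.55, 0.08])
    p_b = np.array([0.85, 0.20, 0.71, 0.15])

    p_ensemble = (p_a + p_b) / 2          # simple soft-voting ensemble
    labels = (p_ensemble >= 0.5).astype(int)
    print(p_ensemble, labels)             # [0.88 0.16 0.63 0.115] -> [1 0 1 0]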
Pelvic floor MRI segmentation based on semi-supervised deep learning
Authors: Jianwei Zuo, Fei Feng, Zhuhui Wang, James A. Ashton-Miller, John O.L. Delancey, Jiajia Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
The semantic segmentation of pelvic organs via MRI has important clinical significance. Recently, deep learning-enabled semantic segmentation has facilitated the three-dimensional geometric reconstruction of pelvic floor organs, providing clinicians with accurate and intuitive diagnostic results. However, the task of labeling pelvic floor MRI segmentations, typically performed by clinicians, is labor-intensive and costly, leading to a scarcity of labels. Insufficient segmentation labels limit the precise segmentation and reconstruction of pelvic floor organs. To address these issues, we propose a semi-supervised framework for pelvic organ segmentation. The implementation of this framework comprises two stages. In the first stage, it performs self-supervised pre-training using image restoration tasks. Subsequently, fine-tuning of the self-supervised model is performed, using labeled data to train the segmentation model. In the second stage, the self-supervised segmentation model is used to generate pseudo labels for unlabeled data. Ultimately, both labeled and unlabeled data are utilized in semi-supervised training. Upon evaluation, our method significantly enhances the performance of semantic segmentation and geometric reconstruction of pelvic organs; the Dice coefficient increases by 2.65% on average. Especially for organs that are difficult to segment, such as the uterus, the accuracy of semantic segmentation can be improved by up to 3.70%.
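The second-stage recipe is the familiar pseudo-labeling loop. Here is a minimal runnable sketch with scikit-learn, using logistic regression on toy 2-D data as a stand-in for the segmentation network:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2)) + np.repeat([[0, 0], [3, 3]], 250, axis=0)
    y = np.repeat([0, 1], 250)
    labeled = rng.choice(500, size=20, replace=False)      # only 20 labels available
    unlabeled = np.setdiff1d(np.arange(500), labeled)

    clf = LogisticRegression().fit(X[labeled], y[labeled]) # train on scarce labels
    pseudo = clf.predict(X[unlabeled])                     # pseudo-label the rest
    clf = LogisticRegression().fit(                        # retrain on both pools
        np.vstack([X[labeled], X[unlabeled]]),
        np.concatenate([y[labeled], pseudo]))
    print(clf.score(X, y))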
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding
Abstract
Unsupervised pre-training has shown great success in skeleton-based action understanding recently. Existing works typically train separate modality-specific models, then integrate the multi-modal information for action understanding by a late-fusion strategy. Although these approaches have achieved significant performance, they suffer from complex yet redundant multi-stream model designs, each of which is also limited to a fixed input skeleton modality. To alleviate these issues, in this paper, we propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL, which exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner. Specifically, instead of designing separate modality-specific optimization processes for uni-modal unsupervised learning, we feed different modality inputs into the same stream with an early-fusion strategy to learn their multi-modal features, reducing model complexity. To ensure that the fused multi-modal features do not exhibit modality bias, i.e., being dominated by a certain modality input, we further propose both intra- and inter-modal consistency learning to guarantee that the multi-modal features contain the complete semantics of each modality via feature decomposition and distinct alignment. In this manner, our framework is able to learn unified representations of uni-modal or multi-modal skeleton input, which is flexible to different kinds of modality input for robust action understanding in practical cases. Extensive experiments conducted on three large-scale datasets, i.e., NTU-60, NTU-120, and PKU-MMD II, demonstrate that UmURL is highly efficient, with approximately the same complexity as uni-modal methods, while achieving new state-of-the-art performance across various downstream task scenarios in skeleton-based action representation learning.
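Schematically, early fusion replaces per-modality streams with a single encoder over concatenated inputs. In this minimal PyTorch sketch a GRU stands in for UmURL's actual encoder, and the modality names and dimensions are illustrative:

    import torch
    import torch.nn as nn

    # Early fusion, schematically: concatenate the modality inputs
    # (e.g. joint, bone, motion streams) and encode them in one stream.
    B, T, D = 8, 64, 150                 # batch, frames, per-modality feature dim
    joint, bone, motion = (torch.randn(B, T, D) for _ in range(3))

    encoder = nn.GRU(input_size=3 * D, hidden_size=256, batch_first=True)
    fused_in = torch.cat([joint, bone, motion], dim=-1)   # single-stream input
    out, _ = encoder(fused_in)
    print(out.shape)  # torch.Size([8, 64, 256]); one model serves all modalities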
Animating NeRFs from Texture Space: A Framework for Pose-Dependent Rendering of Human Performances
Authors: Paul Knoll, Wieland Morgenstern, Anna Hilsmann, Peter Eisert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Creating high-quality controllable 3D human models from multi-view RGB videos poses a significant challenge. Neural radiance fields (NeRFs) have demonstrated remarkable quality in reconstructing and free-viewpoint rendering of static as well as dynamic scenes. The extension to a controllable synthesis of dynamic human performances poses an exciting research question. In this paper, we introduce a novel NeRF-based framework for pose-dependent rendering of human performances. In our approach, the radiance field is warped around an SMPL body mesh, thereby creating a new surface-aligned representation. Our representation can be animated through skeletal joint parameters that are provided to the NeRF in addition to the viewpoint for pose dependent appearances. To achieve this, our representation includes the corresponding 2D UV coordinates on the mesh texture map and the distance between the query point and the mesh. To enable efficient learning despite mapping ambiguities and random visual variations, we introduce a novel remapping process that refines the mapped coordinates. Experiments demonstrate that our approach results in high-quality renderings for novel-view and novel-pose synthesis.
Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction
Authors: Silvia Romero-Azpitarte, Cristina Luna, Alba Guerra, Mercedes Alonso, Pablo Romeo Manrique, Marina L. Seoane, Daniel Olayo, Almudena Moreno, Pablo Castellanos, Fernando Gandía, Gianfranco Visentin
Abstract
Space exploration and establishing human presence on other planets demand advanced technology and effective collaboration between robots and astronauts. Efficient space resource utilization is also vital for extraterrestrial settlements. The Collaborative In-Situ Resources Utilisation (CISRU) project has developed a software suite comprising five key modules. The first module manages multi-agent autonomy, facilitating communication between agents and mission control. The second focuses on environment perception, employing AI algorithms for tasks like environment segmentation and object pose estimation. The third module ensures safe navigation, covering obstacle avoidance, social navigation with astronauts, and cooperation among robots. The fourth module addresses manipulation functions, including multi-tool capabilities and tool-changer design for diverse tasks in In-Situ Resources Utilization (ISRU) scenarios. Finally, the fifth module controls cooperative behaviour, incorporating astronaut commands, Mixed Reality interfaces, map fusion, task supervision, and error control. The suite was tested using an astronaut-rover interaction dataset in a planetary environment and GMV SPoT analogue environments. Results demonstrate the advantages of E4 autonomy and AI in space systems, benefiting astronaut-robot collaboration. This paper details CISRU's development, field test preparation, and analysis, highlighting its potential to revolutionize planetary exploration through AI-powered technology.
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Authors: Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, Limin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding. Scale is a primary factor influencing the performance of these foundation models. However, these large foundation models often result in high computational cost that might limit their deployment. This paper focuses on pre-training relatively small vision transformer models that could be efficiently adapted to downstream tasks. Specifically, taking inspiration from knowledge distillation in model compression, we propose a new asymmetric masked distillation (AMD) framework for pre-training relatively small models with autoencoding. The core of AMD is to devise an asymmetric masking strategy, where the teacher model is enabled to see more context information with a lower masking ratio, while the student model keeps the high masking ratio of the original masked pre-training. We design customized multi-layer feature alignment between the teacher encoder and student encoder to regularize the pre-training of the student MAE. To demonstrate the effectiveness and versatility of AMD, we apply it to both ImageMAE and VideoMAE for pre-training relatively small ViT models. AMD achieves 84.6% classification accuracy on IN1K using the ViT-B model, and 73.3% classification accuracy using the ViT-B model on the Something-Something V2 dataset, a 3.7% improvement over the original ViT-B model from VideoMAE. We also transfer AMD pre-trained models to downstream tasks and obtain consistent performance improvement over the standard pre-training.
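The asymmetric masking strategy can be sketched in a few lines: sample one permutation of the patch tokens and give the teacher a visible set that strictly contains the student's, so the teacher always sees more context. The nesting of the two masks is our illustrative reading; the ratios are arbitrary here.

import torch

def asymmetric_masks(n_tokens, teacher_ratio=0.5, student_ratio=0.9):
    perm = torch.randperm(n_tokens)
    n_student_vis = int(n_tokens * (1 - student_ratio))
    n_teacher_vis = int(n_tokens * (1 - teacher_ratio))
    student_visible = perm[:n_student_vis]       # small visible set
    teacher_visible = perm[:n_teacher_vis]       # superset: more context
    return teacher_visible, student_visible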
Preserving Privacy in GANs Against Membership Inference Attack
Authors: Mohammadhadi Shateri, Francisco Messina, Fabrice Labeau, Pablo Piantanida
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
Abstract
Generative Adversarial Networks (GANs) have been widely used for generating synthetic data for cases where there is only a limited-size real-world dataset or when data holders are unwilling to share their data samples. Recent works showed that GANs, due to overfitting and memorization, might leak information regarding their training data samples. This makes GANs vulnerable to Membership Inference Attacks (MIAs). Several defense strategies have been proposed in the literature to mitigate this privacy issue. Unfortunately, defense strategies based on differential privacy are proven to extensively reduce the quality of the synthetic data points. On the other hand, more recent frameworks such as PrivGAN and PAR-GAN are not suitable for small-size training datasets. In the present work, the overfitting in GANs is studied in terms of the discriminator, and a more general measure of overfitting based on the Bhattacharyya coefficient is defined. Then, inspired by Fano's inequality, our first defense mechanism against MIAs is proposed. This framework, which requires only a simple modification in the loss function of GANs, is referred to as the maximum entropy GAN or MEGAN and significantly improves the robustness of GANs to MIAs. As a second defense strategy, a more heuristic model based on minimizing the information leaked from generated samples about the training data points is presented. This approach is referred to as mutual information minimization GAN (MIMGAN) and uses a variational representation of the mutual information to minimize the information that a synthetic sample might leak about the whole training data set. Applying the proposed frameworks to some commonly used data sets against state-of-the-art MIAs reveals that the proposed methods can reduce the accuracy of the adversaries to the level of random guessing with only a small reduction in the quality of the synthetic data samples.
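As a flavor of how an entropy-based defense can be wired into a GAN loss, the sketch below adds a binary-entropy bonus that discourages an over-confident discriminator; this is a hedged illustration of the general idea, not necessarily the authors' exact MEGAN objective.

import torch
import torch.nn.functional as F

def entropy_regularized_d_loss(d_real, d_fake, lam=0.1):
    # d_real, d_fake: sigmoid outputs of the discriminator in (0, 1)
    adv = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
        + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    p = torch.cat([d_real, d_fake]).clamp(1e-7, 1 - 1e-7)
    entropy = -(p * p.log() + (1 - p) * (1 - p).log()).mean()
    return adv - lam * entropy                   # reward high entropy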
1D-Convolutional transformer for Parkinson disease diagnosis from gait
Abstract
This paper presents an efficient deep neural network model for diagnosing Parkinson's disease from gait. More specifically, we introduce a hybrid ConvNet-Transformer architecture to accurately diagnose the disease by detecting the severity stage. The proposed architecture exploits the strengths of both Convolutional Neural Networks and Transformers in a single end-to-end model, where the former is able to extract relevant local features from the Vertical Ground Reaction Force (VGRF) signal, while the latter captures long-term spatio-temporal dependencies in the data. In this manner, our hybrid architecture achieves improved performance compared to using either model individually. Our experimental results show that our approach is effective for detecting the different stages of Parkinson's disease from gait data, with a final accuracy of 88%, outperforming other state-of-the-art AI methods on the Physionet gait dataset. Moreover, our method can be generalized and adapted to other classification problems to jointly address the feature relevance and spatio-temporal dependency problems in 1D signals. Our source code and pre-trained models are publicly available at https://github.com/SafwenNaimi/1D-Convolutional-transformer-for-Parkinson-disease-diagnosis-from-gait.
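A minimal version of such a hybrid can be written directly in PyTorch: a 1D convolutional stem for local features followed by a transformer encoder for long-range dependencies. Channel counts and depths below are placeholders, not the paper's configuration.

import torch
import torch.nn as nn

class ConvTransformer1D(nn.Module):
    def __init__(self, in_ch=18, d_model=64, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(                # local feature extractor
            nn.Conv1d(in_ch, d_model, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                         # x: (B, channels, time)
        h = self.conv(x).transpose(1, 2)          # (B, time, d_model)
        h = self.transformer(h).mean(dim=1)       # temporal average pooling
        return self.head(h)                       # severity-stage logits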
Quantum Task Offloading with the OpenMP API
Authors: Joseph K. L. Lee, Oliver T. Brown, Mark Bull, Martin Ruefenacht, Johannes Doerfert, Michael Klemm, Martin Schulz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Most of the widely used quantum programming languages and libraries are not designed for the tightly coupled nature of hybrid quantum-classical algorithms, which run on quantum resources that are integrated on-premise with classical HPC infrastructure. We propose a programming model using the API provided by OpenMP to target quantum devices, which provides an easy-to-use and efficient interface for HPC applications to utilize quantum compute resources. We have implemented a variational quantum eigensolver using the programming model, which has been tested using a classical simulator. We are in the process of testing on the quantum resources hosted at the Leibniz Supercomputing Centre (LRZ).
Assessing the Maturity of Model Maintenance Techniques for AIOps Solutions
Authors: Yingzhe Lyu, Heng Li, Zhen Ming (Jack)Jiang, Ahmed E. Hassan
Abstract
AIOps (Artificial Intelligence for IT Operations) solutions leverage the massive data produced during the operations of large-scale systems and machine learning models to assist software engineers in their system operations. As operation data produced in the field are subject to constant evolution from factors like the changing operational environment and user base, the models in AIOps solutions need to be constantly maintained after deployment. While prior works focus on innovative modeling techniques to improve the performance of AIOps models before releasing them into the field, when and how to maintain AIOps models remains an under-investigated topic. In this work, we performed a case study on three large-scale public operation datasets to assess different model maintenance approaches regarding their performance, maintenance cost, and stability. We observed that active model maintenance approaches achieve better and more stable performance than a stationary approach. In particular, applying sophisticated model maintenance approaches (e.g., concept drift detection, time-based ensembles, or online learning approaches) could provide better performance, efficiency, and stability than simply retraining AIOps models periodically. In addition, we observed that, although some maintenance approaches (e.g., time-based ensembles and online learning) can save model training time, they significantly sacrifice model testing time, which could hinder their application in AIOps solutions where the operation data arrive at high speed and volume and where instant predictions are required. Our findings highlight that practitioners should consider the evolution of operation data and actively maintain AIOps models over time. Our observations can also guide researchers and practitioners to investigate more efficient and effective model maintenance techniques that fit the context of AIOps.
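One of the compared policies, drift-triggered retraining, is easy to picture: retrain only when a detector flags a shift in the error stream, instead of on a fixed schedule. A sketch assuming the river library's ADWIN detector; the model and retrain_fn interfaces are placeholders.

from river import drift

def maintain(model, stream, retrain_fn, window):
    detector = drift.ADWIN()
    for x, y in stream:
        error = float(model.predict(x) != y)     # 0/1 error signal
        detector.update(error)
        window.append((x, y))
        if detector.drift_detected:              # distribution shift flagged
            model = retrain_fn(window)           # refit on recent data
            detector = drift.ADWIN()             # reset after retraining
    return model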
Segmentation of Drone Collision Hazards in Airborne RADAR Point Clouds Using PointNet
Authors: Hector Arroyo, Paul Kier, Dylan Angus, Santiago Matalonga, Svetlozar Georgiev, Mehdi Goli, Gerard Dooly, James Riordan
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The integration of unmanned aerial vehicles (UAVs) into shared airspace for beyond visual line of sight (BVLOS) operations presents significant challenges but holds transformative potential for sectors like transportation, construction, energy and defense. A critical prerequisite for this integration is equipping UAVs with enhanced situational awareness to ensure safe operations. Current approaches mainly target single object detection or classification, or simpler sensing outputs that offer limited perceptual understanding and lack the rapid end-to-end processing needed to convert sensor data into safety-critical insights. In contrast, our study leverages radar technology for novel end-to-end semantic segmentation of aerial point clouds to simultaneously identify multiple collision hazards. By adapting and optimizing the PointNet architecture and integrating aerial domain insights, our framework distinguishes five distinct classes: mobile drones (DJI M300 and DJI Mini), an airplane (Ikarus C42), and static returns (ground and infrastructure), resulting in enhanced situational awareness for UAVs. To our knowledge, this is the first approach addressing the simultaneous identification of multiple collision threats in an aerial setting, achieving a robust 94% accuracy. This work highlights the potential of radar technology to advance situational awareness in UAVs, facilitating safe and efficient BVLOS operations.
Balancing Notions of Equity: Approximation Algorithms for Fair Portfolio of Solutions in Combinatorial Optimization
Abstract
Inspired by equity considerations, we consider top-$k$ norm, ordered norm, and symmetric monotonic norm objectives for various combinatorial optimization problems. Top-$k$ norms and ordered norms have natural interpretations in terms of minimizing the impact on the individuals bearing the largest costs. To model decision-making with multiple equity criteria, we study the notion of portfolios of solutions with the property that each norm or equity criterion has an approximately optimal solution in this portfolio. We attempt to characterize portfolios by their sizes and approximation factor guarantees for various combinatorial problems. For a given problem, we investigate whether (1) there exists a single solution that is approximately optimal for all norms, (2) there exists a small approximately optimal portfolio of size larger than 1, and (3) there exist polynomial-time algorithms to find these small portfolios. We study an algorithmic framework to obtain single solutions that are approximately optimal for all norms. We show the existence of such a solution for problems such as $k$-clustering, ordered set cover, scheduling for job completion time minimization, and scheduling for machine load minimization on identical machines. We also give efficient algorithms to find these solutions in most cases, except set cover, where we show there is a gap in terms of computational complexity. Our work improves upon the best-known approximation factor across all norms for a single solution in $k$-clustering. For uncapacitated facility location and scheduling for machine load minimization with identical jobs, we obtain logarithmic-sized portfolios, also providing a matching lower bound in the latter case. Our work results in new open combinatorial questions, which might be of independent interest.
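For reference, the objectives above have compact standard definitions: writing the entries of a cost vector $c \in \mathbb{R}^n_{\geq 0}$ in non-increasing order $c_{[1]} \geq c_{[2]} \geq \dots \geq c_{[n]}$, the top-$k$ norm is $\|c\|_{\text{top-}k} = \sum_{i=1}^{k} c_{[i]}$, while an ordered norm is $\|c\|_{w} = \sum_{i=1}^{n} w_i c_{[i]}$ for a fixed non-increasing weight vector $w_1 \geq \dots \geq w_n \geq 0$; symmetric monotonic norms generalize both.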
On Finding Optimal (Dynamic) Arborescences
Authors: Joaquim Espada, Alexandre P. Francisco, Tatiana Rocher, Luís M. S. Russo, Cátia Vaz
Abstract
Let $G = (V, E)$ be a directed and weighted graph with vertex set $V$ of size $n$ and edge set $E$ of size $m$, such that each edge $(u, v) \in E$ has a real-valued weight $w(u, v)$. An arborescence in $G$ is a subgraph $T = (V, E')$ such that, for a vertex $u \in V$ called the root, there is a unique path in $T$ from $u$ to any other vertex $v \in V$. The weight of $T$ is the sum of the weights of its edges. In this paper, given $G$, we are interested in finding an arborescence in $G$ with minimum weight, i.e., an optimal arborescence. Furthermore, when $G$ is subject to changes, namely edge insertions and deletions, we are interested in efficiently maintaining a dynamic arborescence in $G$. This is a well-known problem with applications in several domains such as network design optimization and phylogenetic inference. In this paper we revisit algorithmic ideas proposed by several authors for this problem, provide detailed pseudo-code as well as implementation details, and present experimental results on large scale-free networks and on phylogenetic inference. Our implementation is publicly available at \url{https://gitlab.com/espadas/optimal-arborescences}.
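For the static problem, off-the-shelf implementations of Edmonds' algorithm are readily available; the toy example below uses networkx (the graph is illustrative, and the dynamic maintenance studied in the paper is beyond this snippet).

import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("r", "a", 2.0), ("r", "b", 5.0),
    ("a", "b", 1.0), ("b", "c", 1.5), ("a", "c", 4.0)])
T = nx.minimum_spanning_arborescence(G)          # Edmonds' algorithm
print(sorted(T.edges(data="weight")))
# [('a', 'b', 1.0), ('b', 'c', 1.5), ('r', 'a', 2.0)]  -- total weight 4.5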
Exploiting Latent Attribute Interaction with Transformer on Heterogeneous Information Networks
Abstract
Heterogeneous graph neural networks (HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Due to the diversity of attributes of nodes of different types, most existing models first align nodes by mapping them into the same low-dimensional space. However, in this way, they lose the type information of the nodes. In addition, most of them only consider the interactions between nodes while neglecting the high-order information behind the latent interactions among different node features. To address these problems, in this paper, we propose a novel heterogeneous graph model, MULAN, comprising two major components: a type-aware encoder and a dimension-aware encoder. Specifically, the type-aware encoder compensates for the loss of node type information and better leverages graph heterogeneity in learning node representations. Built upon the transformer architecture, the dimension-aware encoder is capable of capturing the latent interactions among the diverse node features. With these components, the information of graph heterogeneity, node features and graph structure can be comprehensively encoded in the node representations. We conduct extensive experiments on six heterogeneous benchmark datasets, which demonstrate the superiority of MULAN over other state-of-the-art competitors and also show that MULAN is efficient.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
Abstract
The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched inference during serving. To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters. S-LoRA stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. To efficiently use the GPU memory and reduce fragmentation, S-LoRA proposes Unified Paging. Unified Paging uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead. Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM (with naive support of LoRA serving), S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude. As a result, S-LoRA enables scalable serving of many task-specific fine-tuned models and offers the potential for large-scale customized fine-tuning services.
Exact Shortest Paths with Rational Weights on the Word RAM
Authors: Adam Karczmarz, Wojciech Nadara, Marek Sokołowski
Abstract
Exact computation of shortest paths in weighted graphs has traditionally been studied in one of two settings. First, one can assume that the edge weights are real numbers and all the performed operations on reals (typically comparisons and additions) take constant time. The classical Dijkstra and Bellman-Ford algorithms have been described in this setting. More efficient exact shortest paths algorithms have been obtained for integer-weighted graphs. The integrality assumption not only enables faster algorithms but also allows implementing the aforementioned algorithms in the much more realistic word RAM model, where only arithmetic operations on $O(\log{n})$-bit integers are performed in constant time. On the word RAM one can as efficiently exactly encode even \emph{rational-weighted} instances with $O(\log{n})$-bit numerators and denominators. However, the known exact real-weighted shortest paths algorithms, run on such a rational input, can easily encounter intermediate values of $\Theta(n)$ bits if represented exactly. This leads to a factor-$\Omega(n)$ slowdown on the word RAM. At the same time, the scaling algorithms suited for integer weights do not produce exact solutions for rational inputs without dramatically increasing their accuracy. In this paper, we design randomized exact single-source shortest paths algorithms for rational-weighted graphs on the word RAM. Most importantly, in the non-negative case, we obtain a near-linear time algorithm matching Dijkstra's running time up to polylogarithmic factors. In the presence of negative weights, we give an $\tilde{O}(n^{2.5})$-time algorithm breaking through the best known strongly polynomial bound attained by Bellman-Ford for sufficiently dense graphs.
Decomposing Probability Marginals Beyond Affine Requirements
Abstract
Consider the triplet $(E, \mathcal{P}, \pi)$, where $E$ is a finite ground set, $\mathcal{P} \subseteq 2^E$ is a collection of subsets of $E$ and $\pi : \mathcal{P} \rightarrow [0,1]$ is a requirement function. Given a vector of marginals $\rho \in [0, 1]^E$, our goal is to find a distribution for a random subset $S \subseteq E$ such that $\operatorname{Pr}[e \in S] = \rho_e$ for all $e \in E$ and $\operatorname{Pr}[P \cap S \neq \emptyset] \geq \pi_P$ for all $P \in \mathcal{P}$, or to determine that no such distribution exists. Generalizing results of Dahan, Amin, and Jaillet, we devise a generic decomposition algorithm that solves the above problem when provided with a suitable sequence of admissible support candidates (ASCs). We show how to construct such ASCs for numerous settings, including supermodular requirements, Hoffman-Schwartz-type lattice polyhedra, and abstract networks where $\pi$ fulfils a conservation law. The resulting algorithm can be carried out efficiently when $\mathcal{P}$ and $\pi$ can be accessed via appropriate oracles. For any system allowing the construction of ASCs, our results imply a simple polyhedral description of the set of marginal vectors for which the decomposition problem is feasible. Finally, we characterize balanced hypergraphs as the systems $(E, \mathcal{P})$ that allow the perfect decomposition of any marginal vector $\rho \in [0,1]^E$, i.e., where we can always find a distribution reaching the highest attainable probability $\operatorname{Pr}[P \cap S \neq \emptyset] = \min\{\sum_{e \in P} \rho_e, 1\}$ for all $P \in \mathcal{P}$.
Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
Authors: Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, Huazhe Xu
Abstract
Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-o4, which utilizes an on-policy objective for both offline and online learning. Owing to the alignment of the objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-o4 leverages diverse ensemble policies to address the mismatch issues between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-o4 can achieve multi-step policy improvement safely. We demonstrate that by employing the method above, the fusion of these two paradigms can yield superior offline initialization as well as stable and rapid online fine-tuning capabilities. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations using numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline and offline-to-online fine-tuning settings. Our website: https://lei-kun.github.io/uni-o4/ .
Exploitation-Guided Exploration for Semantic Embodied Navigation
Abstract
In the recent progress in embodied navigation and sim-to-robot transfer, modular policies have emerged as a de facto framework. However, there is more to compositionality than the decomposition of the learning load into modular components. In this work, we investigate a principled way to syntactically combine these components. In particular, we propose Exploitation-Guided Exploration (XGX), where separate modules for exploration and exploitation come together in a novel and intuitive manner. We configure the exploitation module to take over in the deterministic final steps of navigation, i.e., when the goal becomes visible. Crucially, the exploitation module teacher-forces the exploration module and continues driving an overridden policy optimization. XGX, with effective decomposition and novel guidance, improves the state-of-the-art performance on the challenging object navigation task from 70% to 73%. Along with better accuracy, through targeted analysis, we show that XGX is also more efficient at goal-conditioned exploration. Finally, we show sim-to-real transfer to robot hardware, where XGX performs over two-fold better than the best baseline from simulation benchmarking. Project page: xgxvisnav.github.io
Keyword: faster
Using General Value Functions to Learn Domain-Backed Inventory Management Policies
Abstract
We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store. We propose a reinforcement learning (RL) approach that utilises General Value Functions (GVFs) to derive domain-backed inventory replenishment policies. The inventory replenishment decisions are modelled as a sequential decision making problem, which is challenging due to uncertain demand and the existence of aggregate (cross-product) constraints. In existing literature, GVFs have primarily been used for auxiliary task learning. We use this capability to train GVFs on domain-critical characteristics such as prediction of stock-out probability and wastage quantity. Using this domain expertise for more effective exploration, we train an RL agent to compute the inventory replenishment quantities for a large range of products (up to 6000 in the reported experiments), which share aggregate constraints such as the total weight/volume per delivery. Additionally, we show that the GVF predictions can be used to provide additional domain-backed insights into the decisions proposed by the RL agent. Finally, since the environment dynamics are fully transferred, the trained GVFs can be used for faster adaptation to vastly different business objectives (for example, due to the start of a promotional period or due to deployment in a new customer environment).
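Architecturally, the GVFs can be read as auxiliary prediction heads sharing a trunk with the policy, as in the hedged sketch below; the head set and sizes are illustrative, not the paper's exact design.

import torch
import torch.nn as nn

class GVFAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, n_actions)
        self.gvf_stockout = nn.Linear(hidden, 1)  # predicted P(stock-out)
        self.gvf_wastage = nn.Linear(hidden, 1)   # predicted wastage quantity

    def forward(self, obs):
        h = self.body(obs)
        return (self.policy(h),
                torch.sigmoid(self.gvf_stockout(h)),
                self.gvf_wastage(h))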
Resource savings from fault-tolerant circuit design
Authors: Andrew K. Tan, Isaac L. Chuang
Subjects: Computational Engineering, Finance, and Science (cs.CE); Information Theory (cs.IT)
Abstract
Using fault-tolerant constructions, computations performed with unreliable components can simulate their noiseless counterparts through the introduction of a modest amount of redundancy. Given the modest overhead required to achieve fault-tolerance, and the fact that increasing the reliability of basic components often comes at a cost, are there situations where fault-tolerance may be more economical? We present a general framework to account for this overhead cost in order to effectively compare fault-tolerant to non-fault-tolerant approaches for computation, in the limit of small logical error rates. Using this detailed accounting, we determine explicit boundaries at which fault-tolerant designs become more efficient than designs that achieve comparable reliability through direct consumption of resources. We find that the fault-tolerant construction is always preferred in the limit of high reliability in cases where the resources required to construct a basic unit grow faster than $\log(1 / \epsilon)$ asymptotically for small $\epsilon$.
State-wise Safe Reinforcement Learning With Pixel Observations
Abstract
Reinforcement Learning (RL) in the context of safe exploration has long grappled with the delicate balance between maximizing rewards and minimizing safety violations, the complexities arising from contact-rich or non-smooth environments, and high-dimensional pixel observations. Furthermore, incorporating state-wise safety constraints in the exploration and learning process, where the agent is prohibited from accessing unsafe regions without prior knowledge, adds an additional layer of complexity. In this paper, we propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions through the introduction of a latent barrier function learning mechanism. As a joint learning framework, our approach first involves constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations. Subsequently, we build and learn a latent barrier function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return. Experimental evaluations on the safety-gym benchmark suite demonstrate that our proposed method significantly reduces safety violations throughout the training process and demonstrates faster safety convergence compared to existing methods, while achieving competitive results in reward return.
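A barrier function is typically trained with three penalty terms: positive on safe states, negative on unsafe ones, and non-increasing in a discounted sense along transitions. The sketch below shows one common formulation in latent space; the paper's exact conditions and coupling with the latent dynamics may differ.

import torch.nn.functional as F

def barrier_loss(B, z_safe, z_unsafe, z, z_next, alpha=0.1):
    # B: network mapping latent states to a scalar barrier value
    l_safe = F.relu(-B(z_safe)).mean()            # want B >= 0 on safe set
    l_unsafe = F.relu(B(z_unsafe)).mean()         # want B < 0 on hazards
    l_dyn = F.relu((1 - alpha) * B(z) - B(z_next)).mean()  # decrease condition
    return l_safe + l_unsafe + l_dyn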
Predicting Ground Reaction Force from Inertial Sensors
Authors: Bowen Song, Marco Paolieri, Harper E. Stewart, Leana Golubchik, Jill L. McNitt-Gray, Vishal Misra, Devavrat Shah
Abstract
The study of ground reaction forces (GRF) is used to characterize the mechanical loading experienced by individuals in movements such as running, which is clinically applicable to identify athletes at risk for stress-related injuries. Our aim in this paper is to determine if data collected with inertial measurement units (IMUs), which can be worn by athletes during outdoor runs, can be used to predict GRF with sufficient accuracy to allow the analysis of its derived biomechanical variables (e.g., contact time and loading rate). In this paper, we consider lightweight approaches in contrast to state-of-the-art prediction using LSTM neural networks. Specifically, we compare the use of LSTMs to k-Nearest Neighbors (KNN) regression and also propose a novel solution, SVD Embedding Regression (SER), using linear regression between singular value decomposition embeddings of IMU data (input) and GRF data (output). We evaluate the accuracy of these techniques when using training data collected from different athletes, from the same athlete, or both, and we explore the use of acceleration and angular velocity data from sensors at different locations (sacrum and shanks). Our results illustrate that simple machine learning methods such as SER and KNN can be similarly accurate or more accurate than LSTM neural networks, with much faster training times and hyperparameter optimization; in particular, SER and KNN are more accurate when personal training data are available, and KNN comes with the benefit of providing provenance of the prediction. Notably, the use of personal data reduces prediction errors of all methods for most biomechanical variables.
Ultra-Long Sequence Distributed Transformer
Authors: Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Abstract
Transformer models trained on long sequences often achieve higher accuracy than those trained on short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements. Existing methods for long sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers with long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses a fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and to minimize communication overhead. We evaluated the performance of the LSS Transformer against state-of-the-art Nvidia sequence parallelism on the Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation compared to state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.
Abstract
We propose a new Convolutional Neural Network implementation optimized for sparse 3D data inference. This implementation uses NanoVDB as the data structure to store the sparse tensor. It leaves a relatively small memory footprint while maintaining high performance. We demonstrate that this architecture is around 20 times faster than the state-of-the-art dense CNN model on a high-resolution 3D object classification network.
Distributed Matrix-Based Sampling for Graph Neural Network Training
Abstract
The primary contribution of this paper is new methods for reducing communication in the sampling step for distributed GNN training. Here, we propose a matrix-based bulk sampling approach that expresses sampling as a sparse matrix multiplication (SpGEMM) and samples multiple minibatches at once. When the input graph topology does not fit on a single device, our method distributes the graph and uses communication-avoiding SpGEMM algorithms to scale GNN minibatch sampling, enabling GNN training on much larger graphs than those that can fit into a single device's memory. When the input graph topology (but not the embeddings) fits in the memory of one GPU, our approach (1) performs sampling without communication, (2) amortizes the overheads of sampling a minibatch, and (3) can represent multiple sampling algorithms by simply using different matrix constructions. In addition to new methods for sampling, we show that judiciously replicating feature data with a simple all-to-all exchange can outperform current methods for the feature extraction step in distributed GNN training. We provide experimental results on the largest Open Graph Benchmark (OGB) datasets on $128$ GPUs, and show that our pipeline is $2.5\times$ faster than Quiver (a distributed extension to PyTorch-Geometric) on a $3$-layer GraphSAGE network. On datasets outside of OGB, we show an $8.46\times$ speedup on $128$ GPUs in per-epoch time. Finally, we show scaling when the graph is distributed across GPUs, and scaling for both node-wise and layer-wise sampling algorithms.
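The core trick, expressing a sampling hop as SpGEMM, fits in a few lines: a sparse selection matrix picks the minibatch rows, and one sparse multiplication returns all their neighbors. A hedged sketch of the idea; the paper's version additionally subsamples and batches many minibatches into a single multiplication.

import numpy as np
import scipy.sparse as sp

n = 6
A = sp.random(n, n, density=0.4, format="csr", random_state=0)  # adjacency
batch = np.array([0, 3])                          # seed nodes of a minibatch
Q = sp.csr_matrix((np.ones(len(batch)),
                   (np.arange(len(batch)), batch)), shape=(len(batch), n))
frontier = Q @ A                                  # one SpGEMM per hop
print(frontier.nonzero()[1])                      # neighbor columns reached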
Non Deterministic Pseudorandom Generator for Quantum Key Distribution
Abstract
Quantum Key Distribution (QKD) strives to achieve the perfect secrecy of the One-Time Pad (OTP) through quantum processes. One of the crucial components of QKD is the Quantum Random Number Generator (QRNG) used for key generation. Unfortunately, QRNGs do not immediately produce usable bits; rather, they produce raw bits with high entropy but low uniformity, which can hardly be used by any cryptographic system. A lot of pre-processing is required before the random numbers generated by a QRNG are usable. This causes a bottleneck in the random number generation rate, as well as in any QKD system relying on it. To avoid this lacuna of post-processing methods employed as a central part of QRNGs, alternative approaches that satisfy the entropy (non-determinism) and quantum-security requirements are explored. Pseudorandom generators based on quantum-secure primitives could be an alternative to the post-processing problem, as PRNGs are far faster than any random number generator employing physical randomness (the quantum mechanical process in a QRNG) and can provide the uniform bits required for cryptographic applications. In this work we propose a pseudorandom generator based on post-quantum primitives. The central theme of this random number generator is designing a PRNG with non-deterministic entropy generated through a hard lattice problem, Learning With Errors (LWE). We leverage the non-determinism of the Gaussian errors in LWE to construct a non-deterministic PRNG satisfying the entropy requirements of QKD. Finally, the paper concludes by evaluating the PRNG with the Die-Harder test suite.
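A toy version of the LWE-style expansion reads as follows: output $b = As + e \bmod q$, where the Gaussian error $e$ supplies the non-determinism the abstract leverages. This is purely illustrative; the parameters and the one-bit extraction are not a vetted cryptographic construction.

import numpy as np

def lwe_prng_block(seed, n=64, q=3329, m=128):
    rng = np.random.default_rng(seed)
    A = rng.integers(0, q, size=(m, n))           # public random matrix
    s = rng.integers(0, q, size=n)                # secret vector
    e = np.rint(rng.normal(0, 2.0, size=m)).astype(int)  # Gaussian errors
    b = (A @ s + e) % q
    return (b & 1).astype(np.uint8)               # crude bit extraction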
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Authors: Xuwei Xu, Sen Wang, Yudong Chen, Yanping Zheng, Zhewei Wei, Jiajun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Vision Transformers (ViTs) have revolutionized the field of computer vision, yet their deployment on resource-constrained devices remains challenging due to high computational demands. To expedite pre-trained ViTs, token pruning and token merging approaches have been developed, which aim at reducing the number of tokens involved in the computation. However, these methods still have some limitations, such as image information loss from pruned tokens and inefficiency in the token-matching process. In this paper, we introduce a novel Graph-based Token Propagation (GTP) method to resolve the challenge of balancing model efficiency and information preservation for efficient ViTs. Inspired by graph summarization algorithms, GTP meticulously propagates less significant tokens' information to spatially and semantically connected tokens of greater importance. Consequently, the remaining few tokens serve as a summarization of the entire token graph, allowing the method to reduce computational complexity while preserving the essential information of eliminated tokens. Combined with an innovative token selection strategy, GTP can efficiently identify the image tokens to be propagated. Extensive experiments have validated GTP's effectiveness, demonstrating both efficiency and performance improvements. Specifically, GTP decreases the computational complexity of both DeiT-S and DeiT-B by up to 26% with only a minimal 0.3% accuracy drop on ImageNet-1K without finetuning, and remarkably surpasses the state-of-the-art token merging method on various backbones at an even faster inference speed. The source code is available at https://github.com/Ackesnal/GTP-ViT.
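One way to read the propagation step: rather than discarding the less significant tokens, fold their features into the kept tokens, weighted by a token-affinity (graph) matrix. The sketch below is an illustrative reduction of that idea, not GTP's exact update.

import torch

def propagate_tokens(tokens, keep_idx, drop_idx, affinity, alpha=0.5):
    # tokens: (B, N, D); keep_idx/drop_idx: LongTensors partitioning 0..N-1
    # affinity: (B, N, N) non-negative token-similarity matrix
    kept = tokens[:, keep_idx]                    # (B, K, D)
    dropped = tokens[:, drop_idx]                 # (B, N-K, D)
    w = affinity[:, keep_idx][:, :, drop_idx]     # (B, K, N-K)
    w = w / w.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return (1 - alpha) * kept + alpha * (w @ dropped)  # summarized tokens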
Exact Shortest Paths with Rational Weights on the Word RAM
Authors: Adam Karczmarz, Wojciech Nadara, Marek Sokołowski
Abstract
Exact computation of shortest paths in weighted graphs has traditionally been studied in one of two settings. First, one can assume that the edge weights are real numbers and all the performed operations on reals (typically comparisons and additions) take constant time. The classical Dijkstra and Bellman-Ford algorithms have been described in this setting. More efficient exact shortest paths algorithms have been obtained for integer-weighted graphs. The integrality assumption not only enables faster algorithms but also allows implementing the aforementioned algorithms in the much more realistic word RAM model, where only arithmetic operations on $O(\log{n})$-bit integers are performed in constant time. On the word RAM one can as efficiently exactly encode even \emph{rational-weighted} instances with $O(\log{n})$-bit numerators and denominators. However, the known exact real-weighted shortest paths algorithms, run on such a rational input, can easily encounter intermediate values of $\Theta(n)$ bits if represented exactly. This leads to a factor-$\Omega(n)$ slowdown on the word RAM. At the same time, the scaling algorithms suited for integer weights do not produce exact solutions for rational inputs without dramatically increasing their accuracy. In this paper, we design randomized exact single-source shortest paths algorithms for rational-weighted graphs on the word RAM. Most importantly, in the non-negative case, we obtain a near-linear time algorithm matching Dijkstra's running time up to polylogarithmic factors. In the presence of negative weights, we give an $\tilde{O}(n^{2.5})$-time algorithm breaking through the best known strongly polynomial bound attained by Bellman-Ford for sufficiently dense graphs.
Keyword: mobile
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Authors: Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared G. Roesch, Todd C. Mowry, Tianqi Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Abstract
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
The Potential of Wearable Sensors for Assessing Patient Acuity in Intensive Care Unit (ICU)
Authors: Jessica Sena, Mohammad Tahsin Mostafiz, Jiaqing Zhang, Andrea Davidson, Sabyasachi Bandyopadhyay, Ren Yuanfang, Tezcan Ozrazgat-Baslanti, Benjamin Shickel, Tyler Loftus, William Robson Schwartz, Azra Bihorac, Parisa Rashidi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract
Acuity assessments are vital in critical care settings to provide timely interventions and fair resource allocation. Traditional acuity scores rely on manual assessments and documentation of physiological states, which can be time-consuming, intermittent, and difficult for healthcare providers to use. Furthermore, such scores do not incorporate granular information such as a patient's mobility level, which can indicate recovery or deterioration in the ICU. We hypothesized that existing acuity scores could be potentially improved by employing Artificial Intelligence (AI) techniques in conjunction with Electronic Health Records (EHR) and wearable sensor data. In this study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from EHR for developing an AI-driven acuity assessment score. Accelerometry data were collected from 86 patients wearing accelerometers on their wrists in an academic hospital setting. The data were analyzed using five deep neural network models: VGG, ResNet, MobileNet, SqueezeNet, and a custom Transformer network. These models outperformed a rule-based clinical score (SOFA, the Sequential Organ Failure Assessment) used as a baseline, particularly regarding precision, sensitivity, and F1 score. The results showed that while a model relying solely on accelerometer data achieved limited performance (AUC 0.50, Precision 0.61, and F1-score 0.68), including demographic information with the accelerometer data led to a notable enhancement in performance (AUC 0.69, Precision 0.75, and F1-score 0.67). This work shows that the combination of mobility and patient information can successfully differentiate between stable and unstable states in critically ill patients.
QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
Authors: Iman Rahmati, Hamed Shah-Mansouri, Ali Movaghar
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract
In the realm of mobile edge computing (MEC), efficient computation task offloading plays a pivotal role in ensuring a seamless quality of experience (QoE) for users. Maintaining a high QoE is paramount in today's interconnected world, where users demand responsive and reliable services; this challenge is one of the primary factors in handling dynamic and uncertain mobile environments. In this study, we delve into computation offloading in MEC systems, where strict task processing deadlines and energy constraints can adversely affect the system performance. We formulate the computation task offloading problem as a Markov decision process (MDP) to maximize the long-term QoE of each user individually. We propose a decentralized QoE-oriented computation offloading (QOCO) algorithm based on deep reinforcement learning (DRL) that empowers mobile devices to make their offloading decisions without requiring knowledge of the decisions made by other devices. Through numerical studies, we evaluate the performance of QOCO. Simulation results validate that the QOCO algorithm efficiently exploits the computational resources of edge nodes. Consequently, it can complete 14% more tasks and reduce task delay and energy consumption by 9% and 6%, respectively. Together, these contribute to a significant improvement of at least 37% in average QoE compared to an existing algorithm.
Neural Networks Are Implicit Decision Trees: The Hierarchical Simplicity Bias
Authors: Zhehang Du
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural networks exhibit simplicity bias; they rely on simpler features while ignoring equally predictive but more complex features. In this work, we introduce a novel approach termed imbalanced label coupling to investigate scenarios where simple and complex features exhibit different levels of predictive power. In these cases, complex features still contribute to predictions. The trained networks make predictions in alignment with the ascending complexity of input features according to how they correlate with the label in the training set, irrespective of the underlying predictive power. For instance, even when simple spurious features distort predictions in CIFAR-10, most cats are predicted to be dogs, and most trucks are predicted to be automobiles! This observation provides direct evidence that the neural network learns core features in the presence of spurious features. We empirically show that last-layer retraining with target data distribution is effective, yet insufficient to fully recover core features when spurious features are perfectly correlated with the target labels in our synthetic dataset. We hope our research contributes to a deeper understanding of the implicit bias of neural networks.
Safe-VLN: Collision Avoidance for Vision-and-Language Navigation of Autonomous Robots Operating in Continuous Environments
Authors: Lu Yue, Dongliang Zhou, Liang Xie, Feitian Zhang, Ye Yan, Erwei Yin
Abstract
The task of vision-and-language navigation in continuous environments (VLN-CE) aims at training an autonomous agent to perform low-level actions to navigate through 3D continuous surroundings using visual observations and language instructions. The significant potential of VLN-CE for mobile robots has been demonstrated across a large number of studies. However, most existing works in VLN-CE focus primarily on transferring standard discrete vision-and-language navigation (VLN) methods to continuous environments, overlooking the problem of collisions. Such oversight often results in the agent deviating from the planned path or, in severe instances, the agent being trapped in obstacle areas and failing the navigational task. To address these issues, this paper investigates various collision scenarios within VLN-CE and proposes a classification method to predict the underlying causes of collisions. Furthermore, a new VLN-CE algorithm, named Safe-VLN, is proposed to bolster collision avoidance capabilities, comprising two key components: a waypoint predictor and a navigator. In particular, the waypoint predictor leverages a simulated 2D LiDAR occupancy mask to prevent the predicted waypoints from being situated in obstacle-ridden areas. The navigator, on the other hand, employs the strategy of 're-selection after collision' to prevent the robot agent from becoming ensnared in a cycle of perpetual collisions. The proposed Safe-VLN is evaluated on the R2R-CE benchmark, the results of which demonstrate enhanced navigational performance and a statistically significant reduction in collision incidences.
FocusTune: Tuning Visual Localization through Focus-Guided Sampling
Authors: Son Tung Nguyen, Alejandro Fontan, Michael Milford, Tobias Fischer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We propose FocusTune, a focus-guided sampling technique to improve the performance of visual localization algorithms. FocusTune directs a scene coordinate regression model towards regions critical for 3D point triangulation by exploiting key geometric constraints. Specifically, rather than uniformly sampling points across the image for training the scene coordinate regression model, we instead re-project 3D scene coordinates onto the 2D image plane and sample within a local neighborhood of the re-projected points. While our proposed sampling strategy is generally applicable, we showcase FocusTune by integrating it with the recently introduced Accelerated Coordinate Encoding (ACE) model. Our results demonstrate that FocusTune improves on or matches state-of-the-art performance whilst keeping ACE's appealing low storage and compute requirements, for example reducing translation error from 25 to 19 cm and from 17 to 15 cm for single and ensemble models, respectively, on the Cambridge Landmarks dataset. This combination of high performance and low compute and storage requirements is particularly promising for applications in areas like mobile robotics and augmented reality. We made our code available at \url{https://github.com/sontung/focus-tune}.
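The re-projection sampling is straightforward to prototype: project the 3D scene points through the camera and draw training pixels from a window around each projection instead of uniformly. Names and the neighborhood shape below are illustrative.

import numpy as np

def focus_guided_samples(pts3d, K, R, t, image_hw, radius=8, per_pt=4):
    # pts3d: (N, 3) world points; K: (3, 3) intrinsics; R, t: world-to-camera
    rng = np.random.default_rng(0)
    cam = R @ pts3d.T + t[:, None]                # world -> camera frame
    uv = (K @ cam)[:2] / cam[2]                   # perspective projection
    h, w = image_hw
    samples = []
    for u, v in uv.T:
        offs = rng.integers(-radius, radius + 1, size=(per_pt, 2))
        pix = np.clip(np.array([u, v]) + offs, 0, [w - 1, h - 1])
        samples.append(pix)
    return np.concatenate(samples).astype(int)    # (N * per_pt, 2) pixels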
Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study
Authors: Tom P. Huck, Martin Kaiser, Constantin Cronrath, Bengt Lennartson, Torsten Kröger, Tamim Asfour
Abstract
Safety-critical robot systems need thorough testing to expose design flaws and software bugs which could endanger humans. Testing in simulation is becoming increasingly popular, as it can be applied early in the development process and does not endanger any real-world operators. However, not all safety-critical flaws become immediately observable in simulation. Some may only become observable under certain critical conditions. If these conditions are not covered, safety flaws may remain undetected. Creating critical tests is therefore crucial. In recent years, there has been a trend towards using Reinforcement Learning (RL) for this purpose. Guided by domain-specific reward functions, RL algorithms are used to learn critical test strategies. This paper presents a case study in which the collision avoidance behavior of a mobile robot is subjected to RL-based testing. The study confirms prior research which shows that RL can be an effective testing tool. However, the study also highlights certain challenges associated with RL-based testing, namely (i) a possible lack of diversity in test conditions and (ii) the phenomenon of reward hacking where the RL agent behaves in undesired ways due to a misalignment of reward and test specification. The challenges are illustrated with data and examples from the experiments, and possible mitigation strategies are discussed.
Segmentation of Drone Collision Hazards in Airborne RADAR Point Clouds Using PointNet
Authors: Hector Arroyo, Paul Kier, Dylan Angus, Santiago Matalonga, Svetlozar Georgiev, Mehdi Goli, Gerard Dooly, James Riordan
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
The integration of unmanned aerial vehicles (UAVs) into shared airspace for beyond visual line of sight (BVLOS) operations presents significant challenges but holds transformative potential for sectors like transportation, construction, energy and defense. A critical prerequisite for this integration is equipping UAVs with enhanced situational awareness to ensure safe operations. Current approaches mainly target single object detection or classification, or simpler sensing outputs that offer limited perceptual understanding and lack the rapid end-to-end processing needed to convert sensor data into safety-critical insights. In contrast, our study leverages radar technology for novel end-to-end semantic segmentation of aerial point clouds to simultaneously identify multiple collision hazards. By adapting and optimizing the PointNet architecture and integrating aerial domain insights, our framework distinguishes five distinct classes: mobile drones (DJI M300 and DJI Mini), an airplane (Ikarus C42), and static returns (ground and infrastructure), resulting in enhanced situational awareness for UAVs. To our knowledge, this is the first approach addressing the simultaneous identification of multiple collision threats in an aerial setting, achieving a robust 94% accuracy. This work highlights the potential of radar technology to advance situational awareness in UAVs, facilitating safe and efficient BVLOS operations.
Machine Learning-Based Tea Leaf Disease Detection: A Comprehensive Review
Abstract
Tea leaf diseases are a major challenge to agricultural productivity, with far-reaching implications for yield and quality in the tea industry. The rise of machine learning has enabled the development of innovative approaches to combat these diseases. Early detection and diagnosis are crucial for effective crop management. For predicting tea leaf disease, several automated systems have already been developed using different image processing techniques. This paper delivers a systematic review of the literature on machine learning methodologies applied to diagnose tea leaf disease via image classification. It thoroughly evaluates the strengths and constraints of various Vision Transformer models, including Inception Convolutional Vision Transformer (ICVT), GreenViT, PlantXViT, PlantViT, MSCVT, Transfer Learning Model & Vision Transformer (TLMViT), IterationViT, IEM-ViT. Moreover, this paper also reviews models like Dense Convolutional Network (DenseNet), Residual Neural Network (ResNet)-50V2, YOLOv5, YOLOv7, Convolutional Neural Network (CNN), Deep CNN, Non-dominated Sorting Genetic Algorithm (NSGA-II), MobileNetv2, and Lesion-Aware Visual Transformer. These machine-learning models have been tested on various datasets, demonstrating their real-world applicability. This review study not only highlights current progress in the field but also provides valuable insights for future research directions in the machine learning-based detection and classification of tea leaf diseases.
On Asynchrony, Memory, and Communication: Separations and Landscapes
Abstract
Research on distributed computing by a team of identical mobile computational entities, called robots, operating in a Euclidean space in $\mathit{Look}$-$\mathit{Compute}$-$\mathit{Move}$ ($\mathit{LCM}$) cycles, has recently focused on better understanding how the computational power of robots depends on the interplay between their internal capabilities (i.e., persistent memory, communication), captured by the four standard computational models (OBLOT, LUMI, FSTA, and FCOM), and the conditions imposed by the external environment, which controls the activation of the robots and the synchronization of their activities and is perceived and modeled as an adversarial scheduler. We consider a set of adversarial asynchronous schedulers ranging from the classical {\em semi-synchronous} (SSYNCH) and {\em fully asynchronous} (ASYNCH) settings, including schedulers (emerging when studying the atomicity of the combination of operations in the $\mathit{LCM}$ cycles) whose adversarial power lies in between those two. We ask the question: what is the computational relationship between a model $M_1$ under adversarial scheduler $K_1$ ($M_1(K_1)$) and a model $M_2$ under scheduler $K_2$ ($M_2(K_2)$)? For example, are the robots in $M_1(K_1)$ more powerful (i.e., able to solve more problems) than those in $M_2(K_2)$? We answer all these questions by providing, through cross-model analysis, a complete characterization of the computational relationship between the power of the four models of robots under the considered asynchronous schedulers. In this process, we also provide qualified answers to several open questions, including the outstanding one on the proper dominance of SSYNCH over ASYNCH in the case of unrestricted visibility.
Keyword: pruning
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Authors: Xuwei Xu, Sen Wang, Yudong Chen, Yanping Zheng, Zhewei Wei, Jiajun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Vision Transformers (ViTs) have revolutionized the field of computer vision, yet their deployment on resource-constrained devices remains challenging due to high computational demands. To expedite pre-trained ViTs, token pruning and token merging approaches have been developed, which aim at reducing the number of tokens involved in the computation. However, these methods still have some limitations, such as image information loss from pruned tokens and inefficiency in the token-matching process. In this paper, we introduce a novel Graph-based Token Propagation (GTP) method to resolve the challenge of balancing model efficiency and information preservation for efficient ViTs. Inspired by graph summarization algorithms, GTP meticulously propagates less significant tokens' information to spatially and semantically connected tokens that are of greater importance. Consequently, the remaining few tokens serve as a summarization of the entire token graph, allowing the method to reduce computational complexity while preserving essential information of eliminated tokens. Combined with an innovative token selection strategy, GTP can efficiently identify image tokens to be propagated. Extensive experiments have validated GTP's effectiveness, demonstrating both efficiency and performance improvements. Specifically, GTP decreases the computational complexity of both DeiT-S and DeiT-B by up to 26% with only a minimal 0.3% accuracy drop on ImageNet-1K without finetuning, and remarkably surpasses the state-of-the-art token merging method on various backbones at an even faster inference speed. The source code is available at https://github.com/Ackesnal/GTP-ViT.
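A minimal sketch of the propagate-then-prune idea follows; the norm-based importance score and single-nearest-neighbor routing are simplifying assumptions for illustration, not GTP's actual token selection strategy or propagation rule.

```python
# Push features of pruned tokens to their most similar kept tokens, then drop them.
import torch

def propagate_and_prune(tokens, keep_ratio=0.75):
    # tokens: (batch, n, d) ViT token features (class token excluded)
    b, n, d = tokens.shape
    k = int(n * keep_ratio)
    scores = tokens.norm(dim=-1)                       # crude importance proxy
    keep_idx = scores.topk(k, dim=1).indices
    drop_idx = scores.topk(n - k, dim=1, largest=False).indices
    kept = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
    dropped = torch.gather(tokens, 1, drop_idx.unsqueeze(-1).expand(-1, -1, d))
    # Cosine-similarity graph between dropped and kept tokens.
    sim = torch.einsum('bmd,bkd->bmk',
                       torch.nn.functional.normalize(dropped, dim=-1),
                       torch.nn.functional.normalize(kept, dim=-1))
    nearest = sim.argmax(dim=-1)                       # routing target per dropped token
    # Each kept token accumulates a damped (0.5-weighted) sum of tokens routed to it.
    kept = kept.clone()
    kept.scatter_add_(1, nearest.unsqueeze(-1).expand(-1, -1, d), 0.5 * dropped)
    return kept

out = propagate_and_prune(torch.randn(2, 196, 384))
print(out.shape)  # torch.Size([2, 147, 384])
```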
Keyword: diffusion
Sparse Training of Discrete Diffusion Models for Graph Generation
Abstract
Generative models for graphs often encounter scalability challenges due to the inherent need to predict interactions for every node pair. Despite the sparsity often exhibited by real-world graphs, the unpredictable sparsity patterns of their adjacency matrices, stemming from their unordered nature, leads to quadratic computational complexity. In this work, we introduce SparseDiff, a denoising diffusion model for graph generation that is able to exploit sparsity during its training phase. At the core of SparseDiff is a message-passing neural network tailored to predict only a subset of edges during each forward pass. When combined with a sparsity-preserving noise model, this model can efficiently work with edge lists representations of graphs, paving the way for scalability to much larger structures. During the sampling phase, SparseDiff iteratively populates the adjacency matrix from its prior state, ensuring prediction of the full graph while controlling memory utilization. Experimental results show that SparseDiff simultaneously matches state-of-the-art in generation performance on both small and large graphs, highlighting the versatility of our method.
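The edge-subset training idea can be illustrated with a toy sampler that keeps the loss computation linear in the number of edges; the positive-plus-random-negative sampling rule here is an assumption for illustration, not SparseDiff's sparsity-preserving noise model or its actual edge sampler.

```python
# Compute the training loss only on a sampled subset of node pairs.
import torch

def sample_query_pairs(adj, num_neg_per_pos=1):
    # adj: (n, n) binary adjacency; upper triangle used, no self loops
    n = adj.shape[0]
    iu, ju = torch.triu_indices(n, n, offset=1)
    labels = adj[iu, ju]
    pos = labels.nonzero(as_tuple=True)[0]             # all existing edges
    neg = labels.eq(0).nonzero(as_tuple=True)[0]       # candidate non-edges
    neg = neg[torch.randperm(neg.numel())[:num_neg_per_pos * pos.numel()]]
    sel = torch.cat([pos, neg])
    return iu[sel], ju[sel], labels[sel]               # edge list + loss targets

adj = (torch.rand(50, 50) < 0.1).float().triu(1)
i, j, y = sample_query_pairs(adj)
print(i.numel(), "query pairs instead of", 50 * 49 // 2)
```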
Patch-based Selection and Refinement for Early Object Detection
Authors: Tianyi Zhang, Kishore Kasichainula, Yaoxin Zhuo, Baoxin Li, Jae-Sun Seo, Yu Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Early object detection (OD) is a crucial task for the safety of many dynamic systems. Current OD algorithms have limited success for small objects at a long distance. To improve the accuracy and efficiency of such a task, we propose a novel set of algorithms that divide the image into patches, select patches with objects at various scales, elaborate the details of a small object, and detect it as early as possible. Our approach is built upon a transformer-based network and integrates a diffusion model to improve the detection accuracy. As demonstrated on BDD100K, our algorithms enhance the mAP for small objects from 1.03 to 8.93, and reduce the data volume in computation by more than 77%. The source code is available at https://github.com/destiny301/dpr
Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting
Authors: Hao Ai, Lu Sheng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production, whose main work is secondary painting. In the current workflow, fixing characters and image styles often requires lengthy text prompts, and may even require further training through TextualInversion, DreamBooth, or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an image-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is the blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model at https://github.com/aihao2000/stable-diffusion-reference-only, which achieves state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.
From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models
Authors: Zhuoshi Pan, Yuguang Yao, Gaowen Liu, Bingquan Shen, H. Vicky Zhao, Ramana Rao Kompella, Sijia Liu
Abstract
While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs' vulnerability to backdoor attacks, but these studies placed stricter requirements than conventional methods like 'BadNets' in image classification. This is because the former necessitates modifications to the diffusion sampling and training procedures. Unlike the prior work, we investigate whether generating backdoor attacks in DMs can be as simple as BadNets, i.e., by only contaminating the training dataset without tampering with the original diffusion process. In this more realistic backdoor setting, we uncover bilateral backdoor effects that not only serve an adversarial purpose (compromising the functionality of DMs) but also offer a defensive advantage (which can be leveraged for backdoor defense). Specifically, we find that a BadNets-like backdoor attack remains effective in DMs for producing incorrect images (misaligned with the intended text conditions), thereby yielding incorrect predictions when DMs are used as classifiers. Meanwhile, backdoored DMs exhibit an increased ratio of backdoor triggers, a phenomenon we refer to as `trigger amplification', among the generated images. We show that this latter insight can be used to enhance the detection of backdoor-poisoned training data. Even under a low backdoor poisoning ratio, studying the backdoor effects of DMs is also valuable for designing anti-backdoor image classifiers. Last but not least, we establish a meaningful linkage between backdoor attacks and the phenomenon of data replications by exploring DMs' inherent data memorization tendencies. The codes of our work are available at https://github.com/OPTML-Group/BiBadDiff.
Numerical Recovery of a Time-Dependent Potential in Subdiffusion
Authors: Bangti Jin, Kwancheol Shin, Zhi Zhou
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Abstract
In this work we investigate an inverse problem of recovering a time-dependent potential in a semilinear subdiffusion model from an integral measurement of the solution over the domain. The model involves the Djrbashian--Caputo fractional derivative in time. Theoretically, we prove a novel conditional Lipschitz stability result, and numerically, we develop an easy-to-implement fixed point iteration for recovering the unknown coefficient. In addition, we establish rigorous error bounds on the discrete approximation. These results are obtained by crucially using smoothing properties of the solution operators and a suitable choice of a weighted $L^p(0,T)$ norm. The efficiency and accuracy of the scheme are showcased in several numerical experiments in one and two dimensions.
SSL-DG: Rethinking and Fusing Semi-supervised Learning and Domain Generalization in Medical Image Segmentation
Authors: Zanting Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning-based medical image segmentation is an essential yet challenging task in clinical practice, which arises from restricted access to annotated data coupled with the occurrence of domain shifts. Previous attempts have focused on isolated solutions, while disregarding their inter-connectedness. In this paper, we rethink the relationship between semi-supervised learning (SSL) and domain generalization (DG), which are the cutting-edge approaches to address the annotated-data constraints and the domain shift issues. Inspired by class-level representation, we show that unseen target data can be represented by a linear combination of source data, which can be achieved by simple data augmentation. The augmented data enrich domain distributions while having semantic consistency, aligning with the principles of consistency-based SSL. Accordingly, we propose SSL-DG, fusing DG and SSL, to achieve cross-domain generalization with limited annotations. Specifically, global and focal region augmentation, together with an augmentation scale-balancing mechanism, are used to construct a mask-based domain diffusion augmentation module to significantly enrich domain diversity. In order to obtain consistent predictions for the same source data in different networks, we use uncertainty estimation and a deep mutual learning strategy to enforce the consistency constraint. Extensive experiments including ablation studies are designed to validate the proposed SSL-DG. The results demonstrate that our SSL-DG significantly outperforms state-of-the-art solutions in two challenging DG tasks with limited annotations. Code is available at https://github.com/yezanting/SSL-DG.
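The linear-combination observation above is essentially a mixup-style augmentation; a minimal sketch under that reading follows (the Beta-distributed coefficient and the soft-mask treatment are assumptions for illustration, not SSL-DG's global/focal augmentation module).

```python
# Synthesize an unseen-domain-like sample as a convex mix of two source samples.
import numpy as np

def linear_domain_mix(img_a, img_b, mask_a, mask_b, alpha=0.4):
    lam = np.random.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    img = lam * img_a + (1.0 - lam) * img_b       # pixel-space linear combination
    mask = lam * mask_a + (1.0 - lam) * mask_b    # soft labels keep semantic consistency
    return img, mask, lam

# toy usage with float images and one-hot segmentation masks
a, b = np.random.rand(128, 128, 1), np.random.rand(128, 128, 1)
ma, mb = np.eye(3)[np.random.randint(0, 3, (128, 128))], np.eye(3)[np.random.randint(0, 3, (128, 128))]
mixed_img, mixed_mask, lam = linear_domain_mix(a, b, ma, mb)
```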
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion
Authors: Ethan Pronovost, Meghana Reddy Ganesina, Noureldin Hendy, Zeyu Wang, Andres Morales, Kai Wang, Nicholas Roy
Abstract
Automated creation of synthetic traffic scenarios is a key part of validating the safety of autonomous vehicles (AVs). In this paper, we propose Scenario Diffusion, a novel diffusion-based architecture for generating traffic scenarios that enables controllable scenario generation. We combine latent diffusion, object detection and trajectory regression to generate distributions of synthetic agent poses, orientations and trajectories simultaneously. To provide additional control over the generated scenario, this distribution is conditioned on a map and sets of tokens describing the desired scenario. We show that our approach has sufficient expressive capacity to model diverse traffic patterns and generalizes to different geographical regions.
PermutEx: Feature-Extraction-Based Permutation -- A New Diffusion Scheme for Image Encryption Algorithms
Authors: Muhammad Shahbaz Khan, Jawad Ahmad, Ahmed Al-Dubai, Zakwan Jaroucheh, Nikolaos Pitropakis, William J. Buchanan
Abstract
Traditional permutation schemes mostly focus on random scrambling of pixels, often neglecting the intrinsic image information that could enhance diffusion in image encryption algorithms. This paper introduces PermutEx, a feature-extraction-based permutation method that utilizes inherent image features to scramble pixels effectively. Unlike random permutation schemes, PermutEx extracts the spatial frequency and local contrast features of the image and ranks each pixel based on this information, identifying which pixels are more important or information-rich based on texture and edge information. In addition, a unique permutation key is generated using the Logistic-Sine Map based on chaotic behavior. The ranked pixels are permuted in conjunction with this unique key, effectively permuting the original image into a scrambled version. Experimental results indicate that the proposed method effectively disrupts the correlation in information-rich areas within the image, resulting in a correlation value of 0.000062. The effective scrambling of pixels, resulting in nearly zero correlation, makes this method suitable for use as the diffusion stage of image encryption algorithms.
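A feature-driven permutation in this spirit can be sketched as follows; the contrast-scoring function, the particular logistic-sine map form, and the map parameters are illustrative assumptions, not the paper's specification.

```python
# Rank pixels by a contrast score, draw a chaotic key stream, permute accordingly.
import numpy as np

def logistic_sine_key(n, x0=0.31, r=3.99):
    # One common form of the combined logistic-sine system (values in [0, 1)).
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = (r * x * (1.0 - x) + (4.0 - r) * np.sin(np.pi * x) / 4.0) % 1.0
        xs[i] = x
    return np.argsort(xs)                 # key = ranking of the chaotic stream

def permute_image(img):
    # Local-contrast proxy: absolute deviation from the image mean.
    score = np.abs(img.astype(float) - img.mean()).ravel()
    rank = np.argsort(-score)             # information-rich pixels first
    key = logistic_sine_key(rank.size)
    perm = rank[key]                      # combine feature ranking with chaotic key
    out = img.ravel()[perm].reshape(img.shape)
    return out, perm                      # keep perm so decryption can invert it

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
scrambled, perm = permute_image(img)
```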
InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image
Abstract
With the success of Neural Radiance Field (NeRF) in 3D-aware portrait editing, a variety of works have achieved promising results regarding both quality and 3D consistency. However, these methods heavily rely on per-prompt optimization when handling natural language as editing instructions. Due to the lack of labeled human face 3D datasets and effective architectures, the area of human-instructed 3D-aware editing for open-world portraits in an end-to-end manner remains under-explored. To solve this problem, we propose an end-to-end diffusion-based framework termed InstructPix2NeRF, which enables instructed 3D-aware portrait editing from a single open-world image with human instructions. At its core lies a conditional latent 3D diffusion process that lifts 2D editing to 3D space by learning the correlation between the paired images' difference and the instructions via triplet data. With the help of our proposed token position randomization strategy, we could even achieve multi-semantic editing through one single pass with the portrait identity well-preserved. Besides, we further propose an identity consistency module that directly modulates the extracted identity signals into our diffusion process, which increases the multi-view 3D identity consistency. Extensive experiments verify the effectiveness of our method and show its superiority against strong baselines quantitatively and qualitatively.
Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
Authors: Yanqin Jiang, Li Zhang, Jin Gao, Weimin Hu, Yao Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we present Consistent4D, a novel approach for generating 4D dynamic objects from uncalibrated monocular videos. Uniquely, we cast the 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration. This is achieved by leveraging the object-level 3D-aware image diffusion model as the primary supervision signal for training Dynamic Neural Radiance Fields (DyNeRF). Specifically, we propose a Cascade DyNeRF to facilitate stable convergence and temporal continuity under the supervision signal, which is discrete along the time axis. To achieve spatial and temporal consistency, we further introduce an Interpolation-driven Consistency Loss. It is optimized by minimizing the discrepancy between rendered frames from DyNeRF and interpolated frames from a pre-trained video interpolation model. Extensive experiments show that our Consistent4D can perform competitively with prior art alternatives, opening up new possibilities for 4D dynamic object generation from monocular videos, whilst also demonstrating advantages for conventional text-to-3D generation tasks. Our project page is https://consistent4d.github.io/.
Sharp error analysis for averaging Crank-Nicolson schemes with corrections for subdiffusion with nonsmooth solutions
Abstract
Due to the singularity of the solution of linear subdiffusion problems, most time-stepping methods on uniform meshes can result in only $O(\tau)$ accuracy, where $\tau$ denotes the time step. The present work aims to discover the reason why some types of Crank-Nicolson schemes (the averaging Crank-Nicolson scheme) for subdiffusion can only yield $O(\tau^\alpha)$ $(\alpha<1)$ accuracy, which is much lower than desired. The existing, well-developed error analysis for subdiffusion, which has been successfully applied to many time-stepping methods such as the fractional BDF-$p$ $(1\leq p\leq 6)$, requires all singular points to lie outside the path of the contour integrals involved. The averaging Crank-Nicolson scheme in this work is quite natural but fails to meet this requirement. By resorting to the residue theorem, a novel sharp error analysis is developed in this study, upon which correction methods are further designed to recover the optimal $O(\tau^2)$ accuracy. All results are verified by numerical tests.
Diffusion-based Radiotherapy Dose Prediction Guided by Inter-slice Aware Structure Encoding
Authors: Zhenghao Feng, Lu Wen, Jianghong Xiao, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Xingchen Peng, Yan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning (DL) has successfully automated dose distribution prediction in radiotherapy planning, enhancing both efficiency and quality. However, existing methods suffer from the over-smoothing problem caused by their commonly used L1 or L2 losses with posterior average calculations. To alleviate this limitation, we propose a diffusion model-based method (DiffDose) for predicting the radiotherapy dose distribution of cancer patients. Specifically, the DiffDose model contains a forward process and a reverse process. In the forward process, DiffDose transforms dose distribution maps into pure Gaussian noise by gradually adding small amounts of noise, and a noise predictor is simultaneously trained to estimate the noise added at each timestep. In the reverse process, it removes the noise from the pure Gaussian noise in multiple steps with the well-trained noise predictor and finally outputs the predicted dose distribution maps...
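The forward process described above is the standard denoising-diffusion corruption; a minimal sketch of it follows (the linear schedule and toy shapes are illustrative, not DiffDose's configuration).

```python
# Standard DDPM forward noising: blend a clean map with Gaussian noise per schedule.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

def q_sample(x0, t, noise=None):
    # Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).
    noise = torch.randn_like(x0) if noise is None else noise
    ab = alpha_bar[t].view(-1, *[1] * (x0.dim() - 1))
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise, noise

x0 = torch.randn(4, 1, 64, 64)                   # toy dose-map batch
t = torch.randint(0, T, (4,))
xt, eps = q_sample(x0, t)                        # the noise predictor learns eps
```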
Exploring the Capability of Text-to-Image Diffusion Models with Structural Edge Guidance for Multi-Spectral Satellite Image Inpainting
Authors: Mikolaj Czerkawski, Christos Tachtatzis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The paper investigates the utility of text-to-image inpainting models for satellite image data. Two technical challenges of injecting structural guiding signals into the generative process as well as translating the inpainted RGB pixels to a wider set of MSI bands are addressed by introducing a novel inpainting framework based on StableDiffusion and ControlNet as well as a novel method for RGB-to-MSI translation. The results on a wider set of data suggest that the inpainting synthesized via StableDiffusion suffers from undesired artefacts and that a simple alternative of self-supervised internal inpainting achieves higher quality of synthesis.
AnyText: Multilingual Visual Text Generation And Editing
Abstract
Diffusion-based text-to-image generation has achieved impressive results recently. Although current technology for synthesizing images is highly advanced and capable of generating images with high fidelity, the text regions of generated images can still give the show away. To address this issue, we introduce AnyText, a diffusion-based multilingual visual text generation and editing model that focuses on rendering accurate and coherent text in the image. AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy. AnyText can write characters in multiple languages; to the best of our knowledge, this is the first work to address multilingual visual text generation. It is worth mentioning that AnyText can be plugged into existing diffusion models from the community to render or edit text accurately. After conducting extensive evaluation experiments, our method outperforms all other approaches by a significant margin. Additionally, we contribute the first large-scale multilingual text image dataset, AnyWord-3M, containing 3 million image-text pairs with OCR annotations in multiple languages. Based on the AnyWord-3M dataset, we propose AnyText-benchmark for the evaluation of visual text generation accuracy and quality. Our project will be open-sourced at https://github.com/tyxsspa/AnyText to improve and promote the development of text generation technology.
Persistent homology for high-dimensional data based on spectral methods
Authors: Sebastian Damrich, Philipp Berens, Dmitry Kobak
Abstract
Persistent homology is a popular computational tool for detecting non-trivial topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case vanilla persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for most existing refinements of persistent homology. As a remedy, we find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow persistent homology to detect the correct topology even in the presence of high-dimensional noise. Furthermore, we derive a novel closed-form expression for effective resistance in terms of the eigendecomposition of the graph Laplacian, and describe its relation to diffusion distances. Finally, we apply these methods to several high-dimensional single-cell RNA-sequencing datasets and show that spectral distances on the $k$-nearest-neighbor graph allow robust detection of cell cycle loops.
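The spectral quantities involved can be made concrete: on a connected graph with Laplacian eigenpairs $(\lambda_k, \varphi_k)$, the textbook expression for effective resistance is $R(u,v) = \sum_{\lambda_k > 0} (\varphi_k(u) - \varphi_k(v))^2 / \lambda_k$. A minimal sketch of this computation follows; the paper's own closed-form expression may be organized differently.

```python
# Effective resistance between two nodes via the Laplacian eigendecomposition.
import numpy as np

def effective_resistance(adjacency, u, v, tol=1e-10):
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # symmetric eigendecomposition
    nonzero = eigvals > tol                        # skip the constant eigenvector
    diffs = eigvecs[u, nonzero] - eigvecs[v, nonzero]
    return float(np.sum(diffs ** 2 / eigvals[nonzero]))

# Path graph 0-1-2: resistance between the endpoints is two unit edges in series.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
print(effective_resistance(A, 0, 2))  # ~2.0
```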
LDM3D-VR: Latent Diffusion Model for 3D VR
Authors: Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.
TS-Diffusion: Generating Highly Complex Time Series with Diffusion Models
Abstract
While current generative models have achieved promising performance in time-series synthesis, they either make strong assumptions on the data format (e.g., regularities) or rely on pre-processing approaches (e.g., interpolations) to simplify the raw data. In this work, we consider a class of time series with three common problematic properties, namely sampling irregularities, missingness, and large feature-temporal dimensions, and introduce a general model, TS-Diffusion, to process such complex time series. Our model consists of three parts under the framework of point processes. The first part is a neural ordinary differential equation (ODE) encoder that converts time series into dense representations, with a jump technique to capture sampling irregularities and a self-attention mechanism to handle missing values; the second component of TS-Diffusion is a diffusion model that learns from the representations of time series, which can have a complex distribution because of their high dimensions; the third part is a decoder, based on another ODE, that generates time series with irregularities and missing values given their representations. We have conducted extensive experiments on multiple time-series datasets, demonstrating that TS-Diffusion achieves excellent results on both conventional and complex time series and significantly outperforms previous baselines.
Keyword: adaptive
PILL: Plug Into LLM with Adapter Expert and Attention Gate
Abstract
Due to the remarkable capabilities of powerful Large Language Models (LLMs) in effectively following instructions, a growing number of assistants have emerged in the community to assist humans. Recently, significant progress has been made in the development of Vision Language Models (VLMs), expanding the capabilities of LLMs and enabling them to execute more diverse instructions. However, it is foreseeable that models will likely need to handle tasks involving additional modalities such as speech, video, and others. This poses a particularly prominent challenge of dealing with the complexity of mixed modalities. To address this, we introduce a novel architecture called PILL (Plug Into LLM with adapter expert and attention gate) to better decouple these complex modalities and leverage efficient fine-tuning. We introduce two modules. First, a Mixture-of-Modality-Adapter-Expert independently handles different modalities, enabling better adaptation to downstream tasks while preserving the expressive capability of the original model. Second, a Modality-Attention-Gating module enables adaptive control of the contribution of modality tokens to the overall representation. In addition, we have made improvements to the Adapter to enhance its learning and expressive capabilities. Experimental results demonstrate that our approach exhibits competitive performance compared to other mainstream methods for modality fusion. For researchers interested in our work, we provide free access to the code and models at https://github.com/DsaltYfish/PILL.
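The two modules named above can be sketched together as a per-modality bottleneck adapter whose output is scaled by a learned gate before rejoining the residual stream; all dimensions and the gate parameterization are illustrative assumptions, not PILL's released configuration.

```python
# Per-modality adapter experts plus a learned modality gate on the residual path.
import torch
import torch.nn as nn

class GatedModalityAdapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64, num_modalities=2):
        super().__init__()
        # One bottleneck adapter ("expert") per modality.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, dim))
            for _ in range(num_modalities))
        # Scalar gate per modality, initialized near zero so training starts
        # close to the frozen LLM's original behavior.
        self.gates = nn.Parameter(torch.zeros(num_modalities))

    def forward(self, hidden, modality_id):
        # hidden: (batch, seq, dim); modality_id selects the expert and gate.
        delta = self.experts[modality_id](hidden)
        return hidden + torch.tanh(self.gates[modality_id]) * delta

block = GatedModalityAdapter()
out = block(torch.randn(2, 16, 768), modality_id=1)
```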
Joint Composite Latent Space Bayesian Optimization
Authors: Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, Eytan Bakshy
Abstract
Bayesian Optimization (BO) is a technique for sample-efficient black-box optimization that employs probabilistic models to identify promising input locations for evaluation. When dealing with composite-structured functions, such as $f = g \circ h$, evaluating a specific location $x$ yields observations of both the final outcome $f(x) = g(h(x))$ as well as the intermediate output(s) $h(x)$. Previous research has shown that integrating information from these intermediate outputs can enhance BO performance substantially. However, existing methods struggle if the outputs $h(x)$ are high-dimensional. Many relevant problems fall into this setting, including in the context of generative AI, molecular design, or robotics. To effectively tackle these challenges, we introduce Joint Composite Latent Space Bayesian Optimization (JoCo), a novel framework that jointly trains neural network encoders and probabilistic models to adaptively compress high-dimensional input and output spaces into manageable latent representations. This enables viable BO on these compressed representations, allowing JoCo to outperform other state-of-the-art methods in high-dimensional BO on a wide variety of simulated and real-world problems.
Software in P2P way: a software model without central software and enabling any software to join or leave freely
Authors: Hong Su
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Abstract
The P2P model encompasses a network of equal peers, whether in hardware or software, operating autonomously without central control, tolerating individual peer failures while ensuring high availability. Nevertheless, current P2P technologies primarily focus on hardware-level resilience, often referred to as P2P networks, which do not safeguard against software failures. This paper introduces a pioneering Peer-to-Peer (P2P) software model aimed at enhancing software-level high availability. Diverging from prevalent hardware-centric P2P technologies, this model accentuates the decentralized nature of various software components, or "software peers," which function independently, enabling seamless network entry and exit without relying on central software. The model's collaborative approach cultivates a network topology with multiple autonomous processing paths, ensuring continuous operation through dynamic task allocation in a distributed manner. By surpassing the limitations of traditional redundancy methods, this P2P model provides an adaptive and scalable solution for achieving robust availability. Validation results underscore the model's effectiveness in enhancing the probability of successful task processing while ensuring high availability.
EU COST Action on future generation optical wireless communication technologies, 2nd White paper
Authors: Z. Ghassemlooy, M. A. Khalighi, S. Zvanovec, A. Shrestha, B. Ortega, M. Petkovic, X. Pang, C. Sirtori, D. Orsucci, A. Shrestha, F. Moll, G. Cossu, V. Spirito, M. P. Ninos, E. Ciaramella, J. Bas, M. Amay, S. Huang, M. Safari, T. Gutema, W. Popoola, Vicente Matus, Jose Rabadan, Rafael Perez-Jimenez, E. Panayirci, P. D. Diamantoulakis, H. Haas, I. C. Ijeh
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP); Optics (physics.optics); Quantum Physics (quant-ph)
Abstract
NEWFOCUS is an EU COST Action targeted at exploring radical solutions that could influence the design of future wireless networks. The project aims to address some of the challenges associated with optical wireless communication (OWC) and to establish it as a complementary technology to radio frequency (RF)-based wireless systems, in order to meet the demanding requirements of the fifth generation (5G) and the future sixth generation (6G) backhaul and access networks. Only 6G will be able to widely serve the exponential growth in connected devices (i.e., more than 500 billion by 2030), real-time holographic communication, future virtual reality, etc. Space is emerging as the new frontier in 5G, 6G, and beyond communication networks, offering high-speed wireless coverage to remote areas both on land and at sea. This activity is supported by the recent development of low Earth orbit satellite mega-constellations. The focus of this 2nd White Paper is on the use of OWC as an enabling technology for medium- and long-range links for deployment in (i) smart cities and intelligent transportation systems; (ii) first- and last-mile access and backhaul/fronthaul wireless networks; (iii) hybrid free-space optics/RF adaptive wireless connections; (iv) space-to-ground, inter-satellite, ground-to-air, and air-to-air communications; and (v) underwater communications.
Group Testing for Accurate and Efficient Range-Based Near Neighbor Search : An Adaptive Binary Splitting Approach
Authors: Kashish Mittal, Harsh Shah, Ajit Rajwade
Subjects: Data Structures and Algorithms (cs.DS); Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem. The proposed method detects high-similarity vectors from an extensive collection of high dimensional vectors, where each vector represents an image descriptor. Our method efficiently marks each item in the collection as neighbor or non-neighbor on the basis of a cosine distance threshold without exhaustive search. Like other methods in the domain of large scale retrieval, our approach exploits the assumption that most of the items in the collection are unrelated to the query. Unlike other methods, it does not assume a large difference between the cosine similarity of the query vector with the least related neighbor and that with the least unrelated non-neighbor. Following the procedure of binary splitting, a multi-stage adaptive group testing algorithm, we split the set of items to be searched into half at each step, and perform dot product tests on smaller and smaller subsets, many of which we are able to prune away. We experimentally show that our method achieves a speed-up over exhaustive search by a factor of more than ten, with the same accuracy as exhaustive search, on a variety of large datasets. We present a theoretical analysis of the expected number of distance computations per query and of the probability that a pool with a certain number of members will be pruned. In this way, our method exploits practical distributional properties that other methods do not. In our method, all required data structures are created purely offline. Moreover, our method does not impose any strong assumptions on the number of true near neighbors, is adaptable to streaming settings where new vectors are dynamically added to the database, and does not require any parameter tuning.
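A minimal sketch of the binary-splitting search follows: a pool is tested with one dot product against the query and pruned wholesale when even the pooled score cannot reach the threshold, otherwise it is split in half and recursed. The pruning bound below assumes unit-norm vectors with non-negative similarities, a simplification of the paper's analysis.

```python
# Adaptive binary splitting: one dot product per pool, prune or split.
import numpy as np

def find_neighbors(query, vectors, threshold):
    hits = []

    def test(indices):
        pooled = vectors[indices].sum(axis=0)
        # With non-negative similarities, the pooled score upper-bounds the
        # best single similarity in the pool; below threshold => prune all.
        if float(query @ pooled) < threshold:
            return
        if len(indices) == 1:
            hits.append(indices[0])           # singleton pool: exact test passed
            return
        mid = len(indices) // 2
        test(indices[:mid])
        test(indices[mid:])

    test(list(range(len(vectors))))
    return hits

rng = np.random.default_rng(0)
db = rng.random((1000, 64)); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[3] + 0.01 * rng.standard_normal(64); q /= np.linalg.norm(q)
print(find_neighbors(q, db, threshold=0.95))  # should contain index 3
```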
RecAGT: Shard Testable Codes with Adaptive Group Testing for Malicious Nodes Identification in Sharding Permissioned Blockchain
Authors: Dongyang Yu, Jin Wang, Lingzhi Li, Wei Jiang, Can Liu
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Recently, permissioned blockchain has been extensively explored in various fields, such as asset management, supply chain, healthcare, and many others. Many scholars are dedicated to improving its verifiability, scalability, and performance based on sharding techniques, including grouping nodes and handling cross-shard transactions. However, they ignore the node vulnerability problem, i.e., there is no guarantee that nodes will not be maliciously controlled throughout their life cycle. Facing this challenge, we propose RecAGT, a novel identification scheme aimed at reducing communication overhead and identifying potential malicious nodes. First, shard testable codes are designed to encode the original data so that confidential data remains protected in the event of a leak. Second, a new identity proof protocol is presented as evidence against malicious behavior. Finally, adaptive group testing is chosen to identify malicious nodes. Notably, our work focuses on the internal operation within the committee and can thus be applied to any sharding permissioned blockchain. Simulation results show that our proposed scheme can effectively identify malicious nodes with low communication and computational costs.
A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery
Authors: Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao
Abstract
Road-traffic trajectories are commonly collected at a low sampling rate, and trajectory recovery aims to recover a complete and continuous trajectory from sparse and discrete inputs. Recently, sequential language models have been innovatively adopted for trajectory recovery in a pre-trained manner: they learn road segment representation vectors, which are then used in downstream tasks. However, existing methods are incapable of handling complex trajectories: when the trajectory crosses remote road segments or makes several turns, which we call critical nodes, the quality of learned representations deteriorates, and the recovered trajectories skip the critical nodes. This work is dedicated to offering a more robust trajectory recovery for complex trajectories. Firstly, we define the trajectory complexity based on the detour score and entropy score and construct the complexity-aware semantic graphs correspondingly. Then, we propose a Multi-view Graph and Complexity Aware Transformer (MGCAT) model to encode these semantics in trajectory pre-training from two aspects: 1) adaptively aggregating the multi-view graph features considering the trajectory pattern, and 2) paying higher attention to critical nodes in a complex trajectory. As a result, our MGCAT is perceptive when handling the critical scenario of complex trajectories. Extensive experiments are conducted on large-scale datasets. The results prove that our method learns better representations for trajectory recovery, with a 5.22% higher F1-score overall and an 8.16% higher F1-score for complex trajectories in particular. The code is available at https://github.com/bonaldli/ComplexTraj.
PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation
Abstract
Pothole detection is crucial for road safety and maintenance, traditionally relying on 2D image segmentation. However, existing 3D Semantic Pothole Segmentation research often overlooks point cloud sparsity, leading to suboptimal local feature capture and segmentation accuracy. Our research presents an innovative point cloud-based pothole segmentation architecture. Our model efficiently identifies hidden features and uses a feedback mechanism to enhance local characteristics, improving feature presentation. We introduce a local relationship learning module to understand local shape relationships, enhancing structural insights. Additionally, we propose a lightweight adaptive structure for refining local point features using the K nearest neighbor algorithm, addressing point cloud density differences and domain selection. Shared MLP Pooling is integrated to learn deep aggregation features, facilitating semantic data exploration and segmentation guidance. Extensive experiments on three public datasets confirm PotholeGuard's superior performance over state-of-the-art methods. Our approach offers a promising solution for robust and accurate 3D pothole segmentation, with applications in road maintenance and safety.
Region of Interest (ROI) based adaptive cross-layer system for real-time video streaming over Vehicular Ad-hoc NETworks (VANETs)
Authors: Mohamed Aymen Labiod, Mohamed Gharbi, François-Xavier Coudoux, Patrick Corlay
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Abstract
Nowadays, real-time vehicle applications increasingly rely on video acquisition and processing to detect or even identify vehicles and obstacles in the driving environment. In this letter, we propose an algorithm that reinforces these operations by improving end-to-end video transmission quality in a vehicular context. The proposed low-complexity solution gives highest priority to the scene regions of interest (ROI) on which the perception of the driving environment is based. This is done by applying an adaptive cross-layer mapping of the ROI visual data packets at the IEEE 802.11p MAC layer. Realistic VANET simulation results demonstrate that, for HEVC compressed video communications, the proposed system offers PSNR gains of up to 11 dB on the ROI part.
Solving High Dimensional Partial Differential Equations Using Tensor Neural Network and A Posteriori Error Estimators
Abstract
In this paper, we first propose a new type of tensor neural network and the corresponding machine learning method to solve high-dimensional boundary value problems with Dirichlet or Neumann type boundary conditions and eigenvalue problems of the second order elliptic operator. The most important advantage of the proposed network is that when calculating the loss function, the high dimensional integration can be computed with high accuracy using fixed quadrature points within tolerable computational complexity. Based on the theory of a posteriori error estimation, a machine learning method which uses an a posteriori error estimator as the loss function is designed to select optimal network parameters adaptively. Theoretical analysis and numerical examples are provided to validate the proposed methods.
Run-to-Run Adaptive Nonlinear Feedforward Control of Electromechanical Switching Devices
Authors: Eduardo Moya-Lasheras (1), Edgar Ramirez-Laboreo (1), Eloy Serrano-Seco (1) ((1) Universidad de Zaragoza)
Abstract
Feedforward control can greatly improve the response time and control accuracy of any mechatronic system. However, in order to compensate for the effects of modeling errors or disturbances, it is imperative that this type of control works in conjunction with some form of feedback. In this paper, we present a new adaptive feedforward control scheme for electromechanical systems in which real-time measurements or estimates of the position and its derivatives are not technically or economically feasible. This is the case, for example, of commercial electromechanical switching devices such as solenoid actuators. Our proposal consists of two blocks: on the one hand, a feedforward controller based on differential flatness theory; on the other, an iterative adaptation law that exploits the repetitive operation of these devices to modify the controller parameters cycle by cycle. As shown, this law can be fed with any available measurement of the system, with the only requirement that it can be processed and converted into an indicator of the performance of any given operation. Simulated and experimental results show that our proposal is effective in dealing with a long-standing control problem in electromechanics: the soft-landing control of electromechanical switching devices.
APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation
Abstract
The sequential recommendation system has been widely studied for its promising effectiveness in capturing dynamic preferences buried in users' sequential behaviors. Despite the considerable achievements, existing methods usually focus on intra-sequence modeling while overlooking exploiting global collaborative information by inter-sequence modeling, resulting in inferior recommendation performance. Therefore, previous works attempt to tackle this problem with a global collaborative item graph constructed by pre-defined rules. However, these methods neglect two crucial properties when capturing global collaborative information, i.e., adaptiveness and personalization, yielding sub-optimal user representations. To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems. Specifically, we first learn an adaptive global graph among all items and capture global collaborative information with it in a self-supervised fashion, whose computational burden can be further alleviated by the proposed SVD-based accelerator. Furthermore, based on the graph, we propose to extract and utilize personalized item correlations in the form of relative positional encoding, which is a highly compatible manner of personalizing the utilization of global collaborative information. Finally, the entire framework is optimized in a multi-task learning paradigm, so that the components of APGL4SR mutually reinforce one another. As a generic framework, APGL4SR can outperform other baselines with significant margins. The code is available at https://github.com/Graph-Team/APGL4SR.
Signal Processing Meets SGD: From Momentum to Filter
Abstract
In the field of deep learning, Stochastic Gradient Descent (SGD) and its momentum-based variants are the predominant choices for optimization algorithms. Nevertheless, these momentum strategies, which accumulate historical gradients using a fixed $\beta$ hyperparameter to smooth the optimization process, often neglect the potential impact of the variance of historical gradients on the current gradient estimation. Fluctuations in the gradient variance during training indicate that the objective function does not satisfy the Lipschitz continuity condition at all times, which complicates the optimization problem. This paper explores the potential benefits of reducing the variance of historical gradients to make the optimizer converge to flat solutions. Moreover, we propose a new optimization method based on variance reduction. We employ Wiener filter theory to enhance the first moment estimation of SGD, notably introducing an adaptive weight into the optimizer. Specifically, the adaptive weight changes dynamically with the temporal fluctuation of the gradient variance during deep learning model training. Experimental results demonstrate that our proposed adaptive-weight optimizer, SGDF (Stochastic Gradient Descent With Filter), achieves satisfactory performance compared with state-of-the-art optimizers.
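The core idea can be illustrated with a toy update where the fixed momentum coefficient is replaced by a gain derived from estimated signal and noise variances, in the spirit of a scalar Wiener filter; the variance estimators and gain formula below are illustrative assumptions, not SGDF's published update rule.

```python
# Variance-adaptive gradient filtering: noisy phases shrink the gain,
# so the update leans more heavily on the smoothed gradient history.
import torch

@torch.no_grad()
def filtered_sgd_step(param, state, lr=0.01, var_decay=0.99, eps=1e-12):
    g = param.grad
    if 'm' not in state:
        state['m'] = g.clone()                # filtered first moment
        state['v'] = torch.zeros_like(g)      # running gradient-noise variance
    m, v = state['m'], state['v']
    v.mul_(var_decay).add_((1 - var_decay) * (g - m) ** 2)
    gain = m.pow(2) / (m.pow(2) + v + eps)    # Wiener-style signal/(signal+noise)
    m.add_(gain * (g - m))                    # high noise -> small gain -> smoothing
    param.add_(m, alpha=-lr)                  # descend along the filtered gradient

# toy usage on a quadratic objective
w = torch.randn(10, requires_grad=True)
state = {}
for _ in range(100):
    loss = (w ** 2).sum()
    loss.backward()
    filtered_sgd_step(w, state)
    w.grad = None
```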
Incremental Approximate Maximum Flow on Undirected Graphs in Subpolynomial Update Time
Authors: Jan van den Brand, Li Chen, Rasmus Kyng, Yang P. Liu, Richard Peng, Maximilian Probst Gutenberg, Sushant Sachdeva, Aaron Sidford
Abstract
We provide an algorithm which, with high probability, maintains a $(1-\epsilon)$-approximate maximum flow on an undirected graph undergoing $m$-edge additions in amortized $m^{o(1)} \epsilon^{-3}$ time per update. To obtain this result, we provide a more general algorithm that solves what we call the incremental, thresholded $p$-norm flow problem that asks to determine the first edge-insertion in an undirected graph that causes the minimum $\ell_p$-norm flow to decrease below a given threshold in value. Since we solve this thresholded problem, our data structure succeeds against an adaptive adversary that can only see the data structure's output. Furthermore, since our algorithm holds for $p = 2$, we obtain improved algorithms for dynamically maintaining the effective resistance between a pair of vertices in an undirected graph undergoing edge insertions. Our algorithm builds upon previous dynamic algorithms for approximately solving the minimum-ratio cycle problem that underlie previous advances on the maximum flow problem [Chen-Kyng-Liu-Peng-Probst Gutenberg-Sachdeva, FOCS '22] as well as recent dynamic maximum flow algorithms [v.d.Brand-Liu-Sidford, STOC '23]. Instead of using interior point methods, which were a key component of these recent advances, our algorithm uses an optimization method based on $\ell_p$-norm iterative refinement and the multiplicative weight update method. This ensures a monotonicity property in the minimum-ratio cycle subproblems that allows us to apply known data structures and bypass issues arising from adaptive queries.
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
Authors: Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, Bo Han
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Despite remarkable success in various applications, large language models (LLMs) are vulnerable to adversarial jailbreaks that render the safety guardrails void. However, previous studies of jailbreaks usually resort to brute-force optimization or extrapolations with high computational cost, which might not be practical or effective. In this paper, inspired by the Milgram experiment, in which individuals can harm another person if told to do so by an authoritative figure, we disclose a lightweight method, termed DeepInception, which can easily hypnotize an LLM to be a jailbreaker and unlock its misuse risks. Specifically, DeepInception leverages the personification ability of LLMs to construct a novel nested scene to behave in, which realizes an adaptive way to escape the usage control in a normal scenario and provides the possibility for further direct jailbreaks. Empirically, we conduct comprehensive experiments to show its efficacy. Our DeepInception can achieve jailbreak success rates competitive with previous counterparts and realize a continuous jailbreak in subsequent interactions, which reveals the critical weakness of self-losing on both open- and closed-source LLMs like Falcon, Vicuna, Llama-2, and GPT-3.5/4/4V. Our investigation suggests that people should pay more attention to the safety aspects of LLMs and to stronger defenses against their misuse risks. The code is publicly available at: https://github.com/tmlr-group/DeepInception.
Navigating Scaling Laws: Accelerating Vision Transformer's Training via Adaptive Strategies
Authors: Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: Investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of compute. This leads to the notion of a "compute-optimal" model, i.e. a model that allocates a given level of compute during training optimally to maximise performance. In this work, we extend the concept of optimality by allowing for an "adaptive" model, i.e. a model that can change its shape during the course of training. By allowing the shape to adapt, we can optimally traverse between the underlying scaling laws, leading to a significant reduction in the required compute to reach a given target performance. We focus on vision tasks and the family of Vision Transformers, where the patch size as well as the width naturally serve as adaptive shape parameters. We demonstrate that, guided by scaling laws, we can design compute-optimal adaptive models that beat their "static" counterparts.
Keyword: quantization
Quantized-but-uncoded Distributed Detection (QDD) with Unreliable Reporting Channels
Authors: Lei Cao, Ramanarayanan Viswanathan
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Distributed detection primarily centers around two approaches: Unquantized Distributed Detection (UDD), where each sensor reports its complete observation to the fusion center (FC), and Quantized-and-Coded DD (CDD), where each sensor first partitions the observation space and then reports a codeword to the FC. In this paper, we introduce Quantized-but-uncoded DD (QDD), where each sensor, after quantization, transmits a summarized value, instead of a codeword, to the FC. We show that QDD adapts well to transmission-power constraints when compared to CDD, albeit with increased complexity in parameter selection. Moreover, we establish that, in the presence of independent observations, QDD upholds a necessary condition inherent in CDD: the optimal sensor decision rules are likelihood ratio quantizers (LRQ), irrespective of the channel conditions. In the context of a single-sensor scenario involving a binary decision at the sensor, we find that the optimal sensor rule in QDD is in general no longer ``channel blind'', a feature present in CDD. In addition, we compare these systems numerically under the same transmission power and bandwidth, while assuming additive white Gaussian noise (AWGN) in both the sensing and reporting stages. Finally, we present some potential directions for future research.
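For reference, a likelihood-ratio quantizer partitions the observation space by thresholding the likelihood ratio rather than the raw observation; in generic notation (not the paper's exact symbols):

\[
  \Lambda(x) = \frac{p(x \mid H_1)}{p(x \mid H_0)}, \qquad
  \gamma(x) = q_j \quad \text{iff} \quad t_{j-1} \le \Lambda(x) < t_j, \quad j = 1, \dots, L,
\]

where $0 = t_0 < t_1 < \dots < t_L = \infty$ are the quantizer thresholds and $q_j$ is the summarized value reported to the fusion center for the $j$-th cell.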
Yet Another Generative Model For Room Impulse Response Estimation
Authors: Sungho Lee, Hyeong-Seok Choi, Kyogu Lee
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
Recent neural room impulse response (RIR) estimators typically comprise an encoder for reference audio analysis and a generator for RIR synthesis. In particular, it is the performance of the generator that directly influences the overall estimation quality. In this context, we explore an alternative generator architecture for improved performance. We first train an autoencoder with residual quantization to learn a discrete latent token space, where each token represents a small time-frequency patch of the RIR. Then, we cast the RIR estimation problem as a reference-conditioned autoregressive token generation task, employing transformer variants that operate across frequency, time, and quantization depth axes. This way, we address the standard blind estimation task and the additional acoustic matching problem, which aims to find an RIR that matches the source signal to the target signal's reverberation characteristics. Experimental results show that our system is preferable to other baselines across various evaluation metrics.
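The residual quantization mentioned above can be sketched in a few lines: each stage quantizes the residual left by the previous stage against its own codebook, so tokens along the depth axis progressively refine the reconstruction. Codebook sizes here are illustrative; the paper's autoencoder learns the codebooks jointly with the encoder.

```python
# Minimal residual (multi-stage) vector quantization sketch.
import torch

def residual_quantize(x, codebooks):
    # x: (batch, dim); codebooks: list of (codebook_size, dim) tensors
    residual, tokens, recon = x.clone(), [], torch.zeros_like(x)
    for cb in codebooks:
        d = torch.cdist(residual, cb)          # distance to every code vector
        idx = d.argmin(dim=1)                  # nearest code per batch item
        chosen = cb[idx]
        tokens.append(idx)
        recon = recon + chosen                 # running reconstruction
        residual = residual - chosen           # next stage sees what is left
    return torch.stack(tokens, dim=1), recon   # (batch, depth) token ids

codebooks = [torch.randn(256, 32) for _ in range(4)]   # 4 quantization depths
tokens, recon = residual_quantize(torch.randn(8, 32), codebooks)
```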
M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs
Authors: Yuzong Chen, Jordan Dotzel, Mohamed S. Abdelfattah
Abstract
Mixed-precision quantization is a popular approach for compressing deep neural networks (DNNs). However, it is challenging to scale the performance efficiently with mixed-precision DNNs given the current FPGA architecture and conventional accelerator dataflows. In this work, we enhance the FPGA's capability for accelerating mixed-precision DNNs by proposing M4BRAM, a novel compute-in-block RAM (BRAM) architecture that can compute mixed-precision matrix-matrix multiplication. On the precision side, M4BRAM supports a wide range of mixed-precision DNN configurations -- the weight precision can be 2/4/8 bits while the activation precision can vary from 2 to 8 bits. On the dataflow side, M4BRAM leverages a novel in-BRAM data duplication scheme to achieve high hardware utilization. Moreover, during M4BRAM computation, other FPGA resources can seamlessly access its data without the need for a separate buffer. Hence, unlike prior compute-in-BRAM proposals, M4BRAM can simultaneously perform mixed-precision computation and maintain full functionality as a memory unit to \textit{truly} complement the existing compute resources on FPGAs. Experiments show that adding M4BRAM to a tiled DNN accelerator can achieve an average speedup of 2.16$\times$ across various DNNs on the ImageNet classification task while incurring a negligible accuracy loss of $<$ 0.5%. Compared to the same tiled accelerator that employs a prior compute-in-BRAM architecture, M4BRAM delivers 1.43$\times$ higher performance on average across various DNNs.
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Abstract
In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing convolutional modules with self-attention modules. They achieve state-of-the-art performance on ASR with top efficiency. We first show that employing these speech transformers as an encoder significantly improves the efficiency of pre-trained audio models as well. However, our study shows that we can achieve comparable efficiency with self-attention alone. We demonstrate that this simpler approach is particularly beneficial with a low-bit weight quantization technique of a neural network to improve efficiency. We hypothesize that it prevents propagating errors between different quantized modules, compared to recent speech transformers that mix quantized convolution and quantized self-attention modules.
Learned layered coding for Successive Refinement in the Wyner-Ziv Problem
Authors: Boris Joukovsky, Brent De Weerdt, Nikos Deligiannis
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Abstract
We propose a data-driven approach to explicitly learn the progressive encoding of a continuous source, which is successively decoded with increasing levels of quality and with the aid of correlated side information. This setup refers to the successive refinement of the Wyner-Ziv coding problem. Assuming ideal Slepian-Wolf coding, our approach employs recurrent neural networks (RNNs) to learn layered encoders and decoders for the quadratic Gaussian case. The models are trained by minimizing a variational bound on the rate-distortion function of the successively refined Wyner-Ziv coding problem. We demonstrate that RNNs can explicitly retrieve layered binning solutions akin to scalable nested quantization. Moreover, the rate-distortion performance of the scheme is on par with the corresponding monolithic Wyner-Ziv coding approach and is close to the rate-distortion bound.
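For context, in the quadratic Gaussian case the Wyner-Ziv rate-distortion function that such a variational bound targets has the well-known closed form

\[
  R_{\mathrm{WZ}}(D) = \frac{1}{2}\, \log^{+}\!\frac{\sigma^2_{X\mid Y}}{D},
  \qquad \log^{+}(z) = \max\{\log z,\, 0\},
\]

where $\sigma^2_{X\mid Y}$ is the conditional variance of the source $X$ given the side information $Y$; successive refinement splits this rate across layers of decreasing distortion.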
Keyword: efficient
Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model
MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language
LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking
Efficient Symbolic Policy Learning with Differentiable Symbolic Expression
Feature Attribution Explanations for Spiking Neural Networks
PILL: Plug Into LLM with Adapter Expert and Attention Gate
Resource savings from fault-tolerant circuit design
Sparse Training of Discrete Diffusion Models for Graph Generation
FairSeg: A Large-scale Medical Image Segmentation Dataset for Fairness Learning with Fair Error-Bound Scaling
Imitation Bootstrapped Reinforcement Learning
Joint Composite Latent Space Bayesian Optimization
Linear difference operators with sequence coefficients having infinite-dimensional solution spaces
On the dimension of the solution space of linear difference equations over the ring of infinite sequences
Structured Neural Networks for Density Estimation and Causal Inference
State-wise Safe Reinforcement Learning With Pixel Observations
Using DUCK-Net for Polyp Image Segmentation
Democratic Policy Development using Collective Dialogues and AI
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Comparative Knowledge Distillation
Not all layers are equally as important: Every Layer Counts BERT
Contrastive Multi-Modal Representation Learning for Spark Plug Fault Diagnosis
OverHear: Headphone based Multi-sensor Keystroke Inference
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion
Thermal Face Image Classification using Deep Learning Techniques
An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations
NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation
Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision
STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots
A Comprehensive Dynamic Simulation Framework for Coupled Neuromusculoskeletal-Exoskeletal Systems
MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation
Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models
Ultra-Long Sequence Distributed Transformer
The Case of Transparent Cache Invalidation in Web Applications
CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models
Numerical Recovery of a Time-Dependent Potential in Subdiffusion
Succinct Data Structure for Graphs with $d$-Dimensional $t$-Representation
P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification
SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling
UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition
QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
Contract Design With Safety Inspections
VR-NeRF: High-Fidelity Virtualized Walkable Spaces
Pilot-Based Key Distribution and Encryption for Secure Coherent Passive Optical Networks
Ego-Network Transformer for Subsequence Classification in Time Series Data
Group Testing for Accurate and Efficient Range-Based Near Neighbor Search: An Adaptive Binary Splitting Approach
AIOps-Driven Enhancement of Log Anomaly Detection in Unsupervised Scenarios
Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation
PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation
Compute at Scale -- A Broad Investigation into the Data Center Industry
Patterned non-determinism in communication complexity
CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning
Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration
Nepali Video Captioning using CNN-RNN Architecture
Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs
One-Shot Strategic Classification Under Unknown Costs
Riemannian Laplace Approximation with the Fisher Metric
MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis
Last fall degree of semi-local polynomial systems
Contour Algorithm for Connectivity
CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval
Cell-Probe Lower Bound for Accessible Interval Graphs
Generate Complete Logging Statements with an Efficient End-to-End Approach
Energy-Efficient Multidimensional Constellation Based on Leech Lattice for Visible Light Communications
Lightweight equivariant interaction graph neural network for accurate and efficient interatomic potential and force predictions
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Virtual Action Actor-Critic Framework for Exploration (Student Abstract)
Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things
Design, implementation, and validation of a benchmark generator for combinatorial interaction testing tools
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Maximal Consistent Subsystems of Max-T Fuzzy Relational Equations
SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet
A Simple yet Efficient Ensemble Approach for AI-generated Text Detection
Pelvic floor MRI segmentation based on semi-supervised deep learning
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding
Animating NeRFs from Texture Space: A Framework for Pose-Dependent Rendering of Human Performances
Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Preserving Privacy in GANs Against Membership Inference Attack
1D-Convolutional transformer for Parkinson disease diagnosis from gait
Quantum Task Offloading with the OpenMP API
Assessing the Maturity of Model Maintenance Techniques for AIOps Solutions
Segmentation of Drone Collision Hazards in Airborne RADAR Point Clouds Using PointNet
Balancing Notions of Equity: Approximation Algorithms for Fair Portfolio of Solutions in Combinatorial Optimization
On Finding Optimal (Dynamic) Arborescences
Exploiting Latent Attribute Interaction with Transformer on Heterogeneous Information Networks
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Exact Shortest Paths with Rational Weights on the Word RAM
Decomposing Probability Marginals Beyond Affine Requirements
Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
Exploitation-Guided Exploration for Semantic Embodied Navigation
Keyword: faster
Using General Value Functions to Learn Domain-Backed Inventory Management Policies
Resource savings from fault-tolerant circuit design
State-wise Safe Reinforcement Learning With Pixel Observations
Predicting Ground Reaction Force from Inertial Sensors
Ultra-Long Sequence Distributed Transformer
Fast Sparse 3D Convolution Network with VDB
Distributed Matrix-Based Sampling for Graph Neural Network Training
Non Deterministic Pseudorandom Generator for Quantum Key Distribution
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Exact Shortest Paths with Rational Weights on the Word RAM
Keyword: mobile
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
The Potential of Wearable Sensors for Assessing Patient Acuity in Intensive Care Unit (ICU)
QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing
Neural Networks Are Implicit Decision Trees: The Hierarchical Simplicity Bias
Safe-VLN: Collision Avoidance for Vision-and-Language Navigation of Autonomous Robots Operating in Continuous Environments
FocusTune: Tuning Visual Localization through Focus-Guided Sampling
Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study
Segmentation of Drone Collision Hazards in Airborne RADAR Point Clouds Using PointNet
Machine Learning-Based Tea Leaf Disease Detection: A Comprehensive Review
On Asynchrony, Memory, and Communication: Separations and Landscapes
Keyword: pruning
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Keyword: diffusion
Sparse Training of Discrete Diffusion Models for Graph Generation
Patch-based Selection and Refinement for Early Object Detection
Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting
From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models
Numerical Recovery of a Time-Dependent Potential in Subdiffusion
SSL-DG: Rethinking and Fusing Semi-supervised Learning and Domain Generalization in Medical Image Segmentation
Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion
PermutEx: Feature-Extraction-Based Permutation -- A New Diffusion Scheme for Image Encryption Algorithms
InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image
Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
Sharp error analysis for averaging Crank-Nicolson schemes with corrections for subdiffusion with nonsmooth solutions
Diffusion-based Radiotherapy Dose Prediction Guided by Inter-slice Aware Structure Encoding
Exploring the Capability of Text-to-Image Diffusion Models with Structural Edge Guidance for Multi-Spectral Satellite Image Inpainting
AnyText: Multilingual Visual Text Generation And Editing
Persistent homology for high-dimensional data based on spectral methods
LDM3D-VR: Latent Diffusion Model for 3D VR
TS-Diffusion: Generating Highly Complex Time Series with Diffusion Models
Keyword: adaptive
PILL: Plug Into LLM with Adapter Expert and Attention Gate
Joint Composite Latent Space Bayesian Optimization
Software in P2P way: a software model without central software and enabling any software to join or leave freely
EU COST Action on future generation optical wireless communication technologies, 2nd White paper
Group Testing for Accurate and Efficient Range-Based Near Neighbor Search: An Adaptive Binary Splitting Approach
RecAGT: Shard Testable Codes with Adaptive Group Testing for Malicious Nodes Identification in Sharding Permissioned Blockchain
A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery
PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation
Region of Interest (ROI) based adaptive cross-layer system for real-time video streaming over Vehicular Ad-hoc NETworks (VANETs)
Solving High Dimensional Partial Differential Equations Using Tensor Neural Network and A Posteriori Error Estimators
Run-to-Run Adaptive Nonlinear Feedforward Control of Electromechanical Switching Devices
APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation
Signal Processing Meets SGD: From Momentum to Filter
Incremental Approximate Maximum Flow on Undirected Graphs in Subpolynomial Update Time
DeepInception: Hypnotize Large Language Model to Be Jailbreaker
Navigating Scaling Laws: Accelerating Vision Transformer's Training via Adaptive Strategies
Keyword: quantization
Quantized-but-uncoded Distributed Detection (QDD) with Unreliable Reporting Channels
Yet Another Generative Model For Room Impulse Response Estimation
M4BRAM: Mixed-Precision Matrix-Matrix Multiplication in FPGA Block RAMs
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Learned layered coding for Successive Refinement in the Wyner-Ziv Problem