New submissions for Fri, 21 Jul 23

Keyword: efficient

A Lightweight Approach for Network Intrusion Detection based on Self-Knowledge Distillation

Authors: Shuo Yang, Xinran Zheng, Zhengzhuo Xu, Xingjun Wang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.10191
Pdf link: https://arxiv.org/pdf/2307.10191
Abstract Network Intrusion Detection (NID) works as a kernel technology for the security network environment, obtaining extensive research and application. Despite enormous efforts by researchers, NID still faces challenges in deploying on resource-constrained devices. To improve detection accuracy while reducing computational costs and model storage simultaneously, we propose a lightweight intrusion detection approach based on self-knowledge distillation, namely LNet-SKD, which achieves the trade-off between accuracy and efficiency. Specifically, we carefully design the DeepMax block to extract compact representation efficiently and construct the LNet by stacking DeepMax blocks. Furthermore, considering compensating for performance degradation caused by the lightweight network, we adopt batch-wise self-knowledge distillation to provide the regularization of training consistency. Experiments on benchmark datasets demonstrate the effectiveness of our proposed LNet-SKD, which outperforms existing state-of-the-art techniques with fewer parameters and lower computation loads.
Capsule network with shortcut routing
Authors: Dang Thanh Vu, Vo Hoang Trong, Yu Gwang-Hyun, Kim Jin-Young
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10212
Pdf link: https://arxiv.org/pdf/2307.10212
Abstract This study introduces "shortcut routing," a novel routing mechanism in capsule networks that addresses computational inefficiencies by directly activating global capsules from local capsules, eliminating intermediate layers. An attention-based approach with fuzzy coefficients is also explored for improved efficiency. Experimental results on the MNIST, smallNORB, and affNIST datasets show comparable classification performance, achieving accuracies of 99.52%, 93.91%, and 89.02% respectively. The proposed fuzzy-based and attention-based routing methods significantly reduce the number of calculations by 1.42 and 2.5 times compared to EM routing, highlighting their computational advantages in capsule networks. These findings contribute to the advancement of efficient and accurate hierarchical pattern representation models.
Exploring Link Prediction over Hyper-Relational Temporal Knowledge Graphs Enhanced with Time-Invariant Relational Knowledge
Authors: Zifeng Ding, Jingcheng Wu, Jingpei Wu, Yan Xia, Volker Tresp
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10219
Pdf link: https://arxiv.org/pdf/2307.10219
Abstract Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. In the meantime, due to the ever-evolving nature of world knowledge, extensive parallel works have been focusing on reasoning over temporal KGs (TKGs), where each TKG fact can be viewed as a KG fact coupled with a timestamp (or time period) specifying its time validity. The existing HKG reasoning approaches do not consider temporal information because it is not explicitly specified in previous benchmark datasets. Besides, all the previous TKG reasoning methods only lay emphasis on temporal reasoning and have no way to learn from qualifiers. To this end, we aim to fill the gap between TKG reasoning and HKG reasoning. We develop two new benchmark hyper-relational TKG (HTKG) datasets, i.e., Wiki-hy and YAGO-hy, and propose a HTKG reasoning model that efficiently models both temporal facts and qualifiers. We further exploit additional time-invariant relational knowledge from the Wikidata knowledge base and study its effectiveness in HTKG reasoning. Time-invariant relational knowledge serves as the knowledge that remains unchanged in time (e.g., Sasha Obama is the child of Barack Obama), and it has never been fully explored in previous TKG reasoning benchmarks and approaches. Experimental results show that our model substantially outperforms previous related methods on HTKG link prediction and can be enhanced by jointly leveraging both temporal and time-invariant relational knowledge.
Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors
Authors: Dongning Ma, Xun Jiao, Fred Lin, Mengshi Zhang, Alban Desmaison, Thomas Sellinger, Daniel Moore, Sriram Sankar
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10244
Pdf link: https://arxiv.org/pdf/2307.10244
Abstract Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware errors. We develop Terrorch, a user-friendly, efficient and flexible error injection framework on top of the widely-used PyTorch. We evaluate a wide range of models and datasets and observe that the DRS robustness against hardware errors is influenced by various factors from model parameters to input characteristics. We also explore 3 error mitigation methods including algorithm based fault tolerance (ABFT), activation clipping and selective bit protection (SBP). We find that applying activation clipping can recover up to 30% of the degraded AUC-ROC score, making it a promising mitigation method.
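The effect of activation clipping on a hardware error can be illustrated with a minimal sketch. This is not Terrorch's API and not the paper's implementation: it flips a single bit of a float32 value and then clamps the result to a bound. The helper names `flip_bit` and `clip_activation` and the bound of 6.0 are assumptions chosen for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as float32, with one bit flipped (0 = least significant)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def clip_activation(value: float, bound: float) -> float:
    """Clamp an activation into the range [-bound, bound]."""
    return max(-bound, min(bound, value))


clean = 0.5
corrupted = flip_bit(clean, 30)             # flip the most significant exponent bit
repaired = clip_activation(corrupted, 6.0)  # hypothetical clipping bound
print(clean, corrupted, repaired)
```

A single flipped exponent bit turns a small activation into an enormous one; clipping does not restore the original value, but it bounds how far the error can propagate.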
Efficient selective attention LSTM for well log curve synthesis
Authors: Yuankai Zhou, Huanyu Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10253
Pdf link: https://arxiv.org/pdf/2307.10253
Abstract Non-core drilling has gradually become the primary exploration method in geological engineering, and well logging curves have increasingly gained importance as the main carriers of geological information. However, factors such as geological environment, logging equipment, borehole quality, and unexpected events can all impact the quality of well logging curves. Previous methods of re-logging or manual corrections have been associated with high costs and low efficiency. This paper proposes a machine learning method that utilizes existing data to predict missing well logging curves, and its effectiveness and feasibility have been validated through experiments. The proposed method builds upon the traditional Long Short-Term Memory (LSTM) neural network by incorporating a self-attention mechanism to analyze the spatial dependencies of the data. It selectively includes the dominant computational results in the LSTM, reducing the computational complexity from O(n^2) to O(nlogn) and improving model efficiency. Experimental results demonstrate that the proposed method achieves higher accuracy compared to traditional curve synthesis methods based on Fully Connected Neural Networks (FCNN) and LSTM. This accurate, efficient, and cost-effective prediction method holds practical value in engineering applications.
Hidden Markov Models with Random Restarts vs Boosting for Malware Detection
Authors: Aditya Raghavan, Fabio Di Troia, Mark Stamp
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10256
Pdf link: https://arxiv.org/pdf/2307.10256
Abstract Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general, and malware detection in particular, is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult "cold start" cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
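The idea of improving a hill-climbed model through random restarts can be sketched on a toy problem. The code below does not train an HMM; it runs a greedy hill climb on a hypothetical one-dimensional score with several local maxima (a stand-in for a model's likelihood) and keeps the best result over several random starting points. All names and constants are assumptions for illustration.

```python
import math
import random


def objective(x: float) -> float:
    """Toy score with several local maxima (stand-in for a training objective)."""
    return math.sin(3 * x) - 0.1 * (x - 2) ** 2


def hill_climb(start: float, step: float = 0.01, iters: int = 5000) -> float:
    """Greedy local search: move to a neighbor only while it improves the score."""
    x = start
    for _ in range(iters):
        best = max((x, x - step, x + step), key=objective)
        if best == x:  # no neighbor is better: local maximum reached
            break
        x = best
    return x


def random_restarts(n_restarts: int, seed: int = 0) -> float:
    """Run hill_climb from several random starts and keep the best end point."""
    rng = random.Random(seed)
    starts = [rng.uniform(-5.0, 5.0) for _ in range(n_restarts)]
    return max((hill_climb(s) for s in starts), key=objective)


single = random_restarts(1)
multi = random_restarts(20)
print(objective(single), objective(multi))
```

Because the same seed makes the single start the first of the twenty, the multi-restart score can never be worse than the single-start score; whether it is better depends on where the first start happens to land.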
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild
Authors: Raiyan Rahman, Christopher Indris, Tianxiao Zhang, Kaidong Li, Brian McCornack, Daniel Flippo, Ajay Sharda, Guanghui Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10267
Pdf link: https://arxiv.org/pdf/2307.10267
Abstract Aphid infestations can cause extensive damage to wheat and sorghum fields and spread plant viruses, resulting in significant yield losses in agriculture. To address this issue, farmers often rely on chemical pesticides, which are inefficiently applied over large areas of fields. As a result, a considerable amount of pesticide is wasted on areas without pests, while inadequate amounts are applied to areas with severe infestations. The paper focuses on the urgent need for an intelligent autonomous system that can locate and spray infestations within complex crop canopies, reducing pesticide use and environmental impact. We have collected and labeled a large aphid image dataset in the field, and propose the use of real-time semantic segmentation models to segment clusters of aphids. A multiscale dataset is generated to allow for learning the clusters at different scales. We compare the segmentation speeds and accuracy of four state-of-the-art real-time semantic segmentation models on the aphid cluster dataset, benchmarking them against nonreal-time models. The study results show the effectiveness of a real-time solution, which can reduce inefficient pesticide use and increase crop yields, paving the way towards an autonomous pest detection system.
The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination
Authors: Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, Shyam Narayanan
Subjects: Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2307.10273
Pdf link: https://arxiv.org/pdf/2307.10273
Abstract We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing $\mathcal{N}(0,I)$ against $\mathcal{N}(\alpha v, I)$, where $\|v\|_2 = 1$, with an $\varepsilon$-fraction of adversarial corruptions, to be \[ \tilde{\Theta}\!\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^3}{\alpha^4},\min\left(\frac{d^{2/3}\varepsilon^{2/3}}{\alpha^{8/3}}, \frac{d \varepsilon}{\alpha^2}\right)\right) \right) \,, \] while the complexity against adaptive adversaries is \[ \tilde{\Theta}\!\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4} \right)\right) \,, \] which is strictly worse for a large range of vanishing $\varepsilon,\alpha$. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity $\tilde{\Theta}(\max({\sqrt{d}}/{\alpha^2}, {d\varepsilon^2}/{\alpha^4} ))$, and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting.
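To get a feel for the separation, the two expressions can be evaluated numerically. The sketch below ignores the logarithmic factors hidden by the tilde and reads the first expression as the oblivious-adversary bound, as the contrast in the abstract suggests. The function names and the sample values of d, alpha, and epsilon are assumptions for illustration.

```python
import math


def oblivious_bound(d: float, alpha: float, eps: float) -> float:
    """First expression in the abstract, up to constants and log factors."""
    return max(
        math.sqrt(d) / alpha**2,
        d * eps**3 / alpha**4,
        min(
            d ** (2 / 3) * eps ** (2 / 3) / alpha ** (8 / 3),
            d * eps / alpha**2,
        ),
    )


def adaptive_bound(d: float, alpha: float, eps: float) -> float:
    """Second expression in the abstract, up to constants and log factors."""
    return max(math.sqrt(d) / alpha**2, d * eps**2 / alpha**4)


d, alpha, eps = 1e6, 0.1, 0.01  # hypothetical parameter choice
obl = oblivious_bound(d, alpha, eps)
ada = adaptive_bound(d, alpha, eps)
print(f"oblivious ~ {obl:.3g}, adaptive ~ {ada:.3g}, ratio ~ {ada / obl:.3g}")
```

For this parameter choice the adaptive expression is several times larger than the oblivious one, which is the kind of gap the abstract describes.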
Distributed Sensing, Computing, Communication, and Control Fabric: A Unified Service-Level Architecture for 6G
Authors: Dejan Vukobratović, Nikolaos Bartzoudis, Mona Ghassemian, Firooz Saghezchi, Peizheng Li, Adnan Aijaz, Ricardo Martinez, Xueli An, Ranga Rao Venkatesha Prasad, Helge Lüders, Shahid Mumtaz
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.10286
Pdf link: https://arxiv.org/pdf/2307.10286
Abstract With the advent of the multimodal immersive communication system, people can interact with each other using multiple devices for sensing, communication and/or control either onsite or remotely. As a breakthrough concept, a distributed sensing, computing, communications, and control (DS3C) fabric is introduced in this paper for provisioning 6G services in multi-tenant environments in a unified manner. The DS3C fabric can be further enhanced by natively incorporating intelligent algorithms for network automation and managing networking, computing, and sensing resources efficiently to serve vertical use cases with extreme and/or conflicting requirements. As such, the paper proposes a novel end-to-end 6G system architecture with enhanced intelligence spanning across different network, computing, and business domains, identifies vertical use cases and presents an overview of the relevant standardization and pre-standardization landscape.
Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks
Authors: Siyu Li, Jin Yang, Kui Zhao
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2307.10337
Pdf link: https://arxiv.org/pdf/2307.10337
Abstract As the capabilities of Large Language Models (LLMs) emerge, they not only assist in accomplishing traditional tasks within more efficient paradigms but also stimulate the evolution of social bots. Researchers have begun exploring the implementation of LLMs as the driving core of social bots, enabling more efficient and user-friendly completion of tasks like profile completion, social behavior decision-making, and social content generation. However, there is currently a lack of systematic research on the behavioral characteristics of LLMs-driven social bots and their impact on social networks. We have curated data from Chirper, a Twitter-like social network populated by LLMs-driven social bots and embarked on an exploratory study. Our findings indicate that: (1) LLMs-driven social bots possess enhanced individual-level camouflage while exhibiting certain collective characteristics; (2) these bots have the ability to exert influence on online communities through toxic behaviors; (3) existing detection methods are applicable to the activity environment of LLMs-driven social bots but may be subject to certain limitations in effectiveness. Moreover, we have organized the data collected in our study into the Masquerade-23 dataset, which we have publicly released, thus addressing the data void in the subfield of LLMs-driven social bots behavior datasets. Our research outcomes provide primary insights for the research and governance of LLMs-driven social bots within the research community.
NFT-Based Blockchain-Oriented Security Framework for Metaverse Applications
Authors: Khadija Manzoor, Umara Noor, Zahid Rashid
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.10342
Pdf link: https://arxiv.org/pdf/2307.10342
Abstract The Metaverse is rapidly evolving, bringing us closer to its imminent reality. However, the widespread adoption of this new automated technology poses significant research challenges in terms of authenticity, integrity, interoperability, and efficiency. These challenges originate from the core technologies underlying the Metaverse and are exacerbated by its complex nature. As a solution to these challenges, this paper presents a novel framework based on Non-Fungible Tokens (NFTs). The framework employs the Proof-of-Stake (PoS) consensus algorithm, a blockchain-based technology, for data transaction, validation, and resource management. PoS consumes energy efficiently and provides a streamlined validation approach instead of resource-intensive mining. This ability makes PoS an ideal candidate for Metaverse applications. By combining NFTs for user authentication and PoS for data integrity, enhanced transaction throughput, and improved scalability, the proposed blockchain mechanism demonstrates noteworthy advantages. Through security analysis and experimental and simulation results, it is established that the NFT-based approach coupled with the PoS algorithm is secure and efficient for Metaverse applications.
Classification of Visualization Types and Perspectives in Patents
Authors: Junaid Ahmed Ghauri, Eric Müller-Budack, Ralph Ewerth
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10471
Pdf link: https://arxiv.org/pdf/2307.10471
Abstract Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?
Authors: Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10472
Pdf link: https://arxiv.org/pdf/2307.10472
Abstract As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
Blockchain-Based Federated Learning: Incentivizing Data Sharing and Penalizing Dishonest Behavior
Authors: Amir Jaberzadeh, Ajay Kumar Shrestha, Faijan Ahamad Khan, Mohammed Afaan Shaikh, Bhargav Dave, Jason Geng
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10492
Pdf link: https://arxiv.org/pdf/2307.10492
Abstract With the increasing importance of data sharing for collaboration and innovation, it is becoming more important to ensure that data is managed and shared in a secure and trustworthy manner. Data governance is a common approach to managing data, but it faces many challenges such as data silos, data consistency, privacy, security, and access control. To address these challenges, this paper proposes a comprehensive framework that integrates data trust in federated learning with InterPlanetary File System, blockchain, and smart contracts to facilitate secure and mutually beneficial data sharing while providing incentives, access control mechanisms, and penalizing any dishonest behavior. The experimental results demonstrate that the proposed model is effective in improving the accuracy of federated learning models while ensuring the security and fairness of the data-sharing process. The research paper also presents a decentralized federated learning platform that successfully trained a CNN model on the MNIST dataset using blockchain technology. The platform enables multiple workers to train the model simultaneously while maintaining data privacy and security. The decentralized architecture and use of blockchain technology allow for efficient communication and coordination between workers. This platform has the potential to facilitate decentralized machine learning and support privacy-preserving collaboration in various domains.
An Analysis of Bugs In Persistent Memory Application
Authors: Jahid Hasan
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2307.10493
Pdf link: https://arxiv.org/pdf/2307.10493
Abstract Detecting crash-consistency bugs in non-volatile persistent memory (PM), and developing new tools to identify those bugs, has remained challenging over the years because of PM's inconsistent behavior on file and storage systems. In this paper, we evaluate an open-source automatic bug detection tool (i.e., AGAMOTTO) by testing an NVM level hashing PM application to identify performance and correctness PM bugs in persistent (main) memory. Furthermore, our faithful validation tool was able to discover 65 new NVM level hashing bugs in the PMDK library, exceeding the number of bugs (i.e., 40) that the WITCHER framework was able to identify. Finally, we will propose a Deep-Q Learning search heuristic algorithm over the PM-Aware search algorithm in the state selection process to improve the search strategy efficiently.
Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets
Authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.10495
Pdf link: https://arxiv.org/pdf/2307.10495
Abstract Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data (arXiv:2204.00005). In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly the same accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.
Gaussian Partial Information Decomposition: Bias Correction and Application to High-dimensional Data
Authors: Praveen Venkatesh, Corbett Bennett, Sam Gale, Tamina K. Ramirez, Greggory Heller, Severine Durand, Shawn Olsen, Stefan Mihalas
Subjects: Information Theory (cs.IT); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2307.10515
Pdf link: https://arxiv.org/pdf/2307.10515
Abstract Recent advances in neuroscientific experimental techniques have enabled us to simultaneously record the activity of thousands of neurons across multiple brain regions. This has led to a growing need for computational tools capable of analyzing how task-relevant information is represented and communicated between several brain regions. Partial information decompositions (PIDs) have emerged as one such tool, quantifying how much unique, redundant and synergistic information two or more brain regions carry about a task-relevant message. However, computing PIDs is computationally challenging in practice, and statistical issues such as the bias and variance of estimates remain largely unexplored. In this paper, we propose a new method for efficiently computing and estimating a PID definition on multivariate Gaussian distributions. We show empirically that our method satisfies an intuitive additivity property, and recovers the ground truth in a battery of canonical examples, even at high dimensionality. We also propose and evaluate, for the first time, a method to correct the bias in PID estimates at finite sample sizes. Finally, we demonstrate that our Gaussian PID effectively characterizes inter-areal interactions in the mouse brain, revealing higher redundancy between visual areas when a stimulus is behaviorally relevant.
Probabilistic Multimodal Depth Estimation Based on Camera-LiDAR Sensor Fusion
Authors: Johan S. Obando-Ceron, Victor Romero-Cano, Sildomar Monteiro
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10519
Pdf link: https://arxiv.org/pdf/2307.10519
Abstract Multi-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There have been outstanding advances in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution, or LiDAR sensors, due to the precise geometric data they provide. However, each of these suffers from some inherent drawbacks, such as high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate for the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They process the sensor data streams independently and combine the high-level estimates obtained for each sensor. In this paper, we tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems. This work proposes a Conditional Random Field model with multiple geometry and appearance potentials. It seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset.
Fast Unsupervised Deep Outlier Model Selection with Hypernetworks
Authors: Xueying Ding, Yue Zhao, Leman Akoglu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10529
Pdf link: https://arxiv.org/pdf/2307.10529
Abstract Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior works report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts
Authors: Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10551
Pdf link: https://arxiv.org/pdf/2307.10551
Abstract Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. PPN leverages the implicit clues between semantic entities to assist extracting, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.
Efficient algorithms for enumerating maximal common subsequences of two strings
Authors: Miyuji Hirota, Yoshifumi Sakai
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2307.10552
Pdf link: https://arxiv.org/pdf/2307.10552
Abstract We propose efficient algorithms for enumerating maximal common subsequences (MCSs) of two strings. The efficiency of the algorithms is measured by their preprocessing-time, space, and delay-time complexities. One algorithm prepares a cubic-space data structure in cubic time to output each MCS in linear time. This data structure can be used to search for particular MCSs satisfying some condition without performing an explicit enumeration. Another prepares a quadratic-space data structure in quadratic time to output each MCS in linear time, and the other prepares a linear-space data structure in quadratic time to output each MCS in linearithmic time.
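To make the notion of a maximal common subsequence concrete, here is a brute-force checker. It is not one of the paper's algorithms and makes no claim to their complexity bounds; it only applies the standard definition, under which a common subsequence is maximal when inserting any single character at any position no longer yields a common subsequence. The function names are assumptions for illustration.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting zero or more characters."""
    it = iter(t)
    return all(ch in it for ch in s)  # `in` consumes the iterator left to right


def is_common_subsequence(w: str, x: str, y: str) -> bool:
    return is_subsequence(w, x) and is_subsequence(w, y)


def is_maximal_common_subsequence(w: str, x: str, y: str) -> bool:
    """Brute-force check: w is common, and no one-character insertion stays common."""
    if not is_common_subsequence(w, x, y):
        return False
    alphabet = set(x) & set(y)  # only shared characters can extend w
    for i in range(len(w) + 1):
        for ch in alphabet:
            if is_common_subsequence(w[:i] + ch + w[i:], x, y):
                return False
    return True


x, y = "abc", "bac"
for candidate in ("ac", "bc", "a", "ab"):
    print(candidate, is_maximal_common_subsequence(candidate, x, y))
```

For "abc" and "bac", both "ac" and "bc" are maximal, "a" is not (it extends to "ac"), and "ab" is not a common subsequence at all.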
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10554
Pdf link: https://arxiv.org/pdf/2307.10554
Abstract Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies exhibit weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic search of proxies framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We propose a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ outperforms state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
Lightweight Neural Path Planning
Authors: Jinsong Li, Shaochen Wang, Ziyang Chen, Zhen Kan, Jun Yu
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.10555
Pdf link: https://arxiv.org/pdf/2307.10555
Abstract Learning-based path planning is becoming a promising robot navigation methodology due to its adaptability to various environments. However, the expensive computing and storage associated with networks impose significant challenges for their deployment on low-cost robots. Motivated by this practical challenge, we develop a lightweight neural path planning architecture with a dual input network and a hybrid sampler for resource-constrained robotic systems. Our architecture is designed with efficient task feature extraction and fusion modules to translate the given planning instance into a guidance map. The hybrid sampler is then applied to restrict the planning within the prospective regions indicated by the guidance map. To enable the network training, we further construct a publicly available dataset with various successful planning instances. Numerical simulations and physical experiments demonstrate that, compared with baseline approaches, our approach has a model size nearly an order of magnitude smaller and five times lower computational cost while achieving promising performance. Besides, our approach can also accelerate the planning convergence process with fewer planning iterations compared to sample-based methods.
Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning
Authors: Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10574
Pdf link: https://arxiv.org/pdf/2307.10574
Abstract Due to the complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, and even project failure. Existing approaches in construction have failed to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introduces a model and method to adaptively control the resource flows to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and genetic algorithm, that it possesses remarkable capability in diverse projects and external environments, and that a hybrid agent of DRL and empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.
Ethosight: A Joint-Embedding Based System for Nuanced Perception Using Contextual Label Affinity Metric and Reasoning Based Iterative Learning
Authors: Hugo Latapie, Kristinn R. Thorisson, Shan Yu, Vahagn Petrosyan, Patrick Hammer, Pei Wang, Brandon Kynoch, Hanning Chen, Tangrui Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10577
Pdf link: https://arxiv.org/pdf/2307.10577
Abstract Traditional computer vision models often require extensive manual effort for data acquisition and validation, particularly when detecting subtle behavioral nuances or events. The difficulty in distinguishing routine behaviors from potential risks in real-world applications, like differentiating routine shopping from potential shoplifting, further complicates the process. We present Ethosight, a novel zero-shot computer vision algorithm. Ethosight eradicates the need for pre-existing symbolic knowledge, initiating from a clean slate based on user requirements and semantic knowledge of interest. Using localized label affinity calculations and a reasoning-guided iterative learning loop, Ethosight infers scene details and iteratively refines the label set. Reasoning mechanisms can be derived from large language models like GPT4, symbolic reasoners like OpenNARS, or hybrid systems. Ethosight further capitalizes on the capabilities of a pre-trained multi-modal model, ImageBind, generating accurate semantic knowledge of images within a few cycles. It successfully captures both explicit and nuanced elements efficiently. We also introduce the implementation of Korzybski's "time-binding" concept in machines, which allows for generational learning and knowledge sharing across deployments. Our evaluations demonstrate Ethosight's efficacy across 40 complex use cases. It has exhibited an exceptional ability to discern new areas of interest, consistently generating high-affinity scores within the top five labels from a set of a thousand. Tests conducted across diverse environments attest to Ethosight's robust performance. Detailed results and case studies within the main body of this paper and an appendix underscore a promising trajectory towards enhancing the adaptability and resilience of computer vision models in detecting and extracting subtle and nuanced behaviors.
Boundary State Generation for Testing and Improvement of Autonomous Driving Systems
Authors: Matteo Biagiola, Paolo Tonella
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10590
Pdf link: https://arxiv.org/pdf/2307.10590
Abstract Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. Such approaches have two main drawbacks: (1) modifications to the simulated environment might not be easily transferable to the in-field test setting (e.g., changing the road shape); (2) environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has up to 16% higher success rate on a separate set of evaluation tracks with respect to the original DNN model.
Individualization of atrial tachycardia models for clinical applications: Performance of fiber-independent model
Authors: Jiyue He, Arkady Pertsov, John Bullinga, Rahul Mangharam
Subjects: Medical Physics (physics.med-ph)
Arxiv link: https://arxiv.org/abs/2307.10592
Pdf link: https://arxiv.org/pdf/2307.10592
Abstract One of the challenges in the development of patient-specific models of cardiac arrhythmias for clinical applications has been accounting for myocardial fiber organization. The fiber varies significantly from heart to heart, but cannot be directly measured in live tissue. The goal of this paper is to evaluate in-silico the accuracy of left atrium activation maps produced by a fiber-independent (isotropic) model with tuned diffusion coefficients, compared to a model incorporating myocardial fibers with the same geometry. For this study we utilize publicly available DT-MRI data from 7 ex-vivo hearts. The comparison is carried out in 51 cases of focal and rotor arrhythmias located in different regions of the atria. On average, the local activation time accuracy is 96% for focal and 93% for rotor arrhythmias. Given its reasonably good performance and the availability of readily accessible data for model tuning in cardiac ablation procedures, the fiber-independent model could be a promising tool for clinical applications.
Model order reduction with novel discrete empirical interpolation methods in space-time
Authors: Nicholas Mueller, Santiago Badia
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.10605
Pdf link: https://arxiv.org/pdf/2307.10605
Abstract This work proposes novel techniques for the efficient numerical simulation of parameterized, unsteady partial differential equations. Projection-based reduced order models (ROMs) such as the reduced basis method employ a (Petrov-)Galerkin projection onto a linear low-dimensional subspace. In unsteady applications, space-time reduced basis (ST-RB) methods have been developed to achieve a dimension reduction both in space and time, eliminating the computational burden of time marching schemes. However, nonaffine parameterizations dilute any computational speedup achievable by traditional ROMs. Computational efficiency can be recovered by linearizing the nonaffine operators via hyper-reduction, such as the empirical interpolation method in matrix form. In this work, we implement new hyper-reduction techniques explicitly tailored to deal with unsteady problems and embed them in an ST-RB framework. For each of the proposed methods, we develop a posteriori error bounds. We run numerical tests to compare the performance of the proposed ROMs against high-fidelity simulations, in which we combine the finite element method for space discretization on 3D geometries and the Backward Euler time integrator. In particular, we consider a heat equation and an unsteady Stokes equation. The numerical experiments demonstrate the accuracy and computational efficiency our methods retain with respect to the high-fidelity simulations.
Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck
Authors: Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10631
Pdf link: https://arxiv.org/pdf/2307.10631
Abstract The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.
Exploring the Landscape of Natural Language Processing Research
Authors: Tim Schopf, Karim Arabi, Florian Matthes
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2307.10652
Pdf link: https://arxiv.org/pdf/2307.10652
Abstract As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing amount of research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent to this day. Contributing to closing this gap, we have systematically classified and analyzed research papers included in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields-of-study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.
Conditional expectation network for SHAP
Authors: Ronald Richman, Mario V. Wüthrich
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.10654
Pdf link: https://arxiv.org/pdf/2307.10654
Abstract A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful to provide drop1 and anova analyses in complex regression models which are similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.
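As a point of reference for the unconditional (interventional) version mentioned in the abstract, the sketch below computes exact Shapley values by averaging marginal contributions over all feature orderings. It does not implement the paper's surrogate network for the conditional version. The function names, the toy linear model, and the background rows are hypothetical and chosen only for illustration. The cost grows factorially with the number of features, so this is only practical for a handful of them.

```python
from itertools import permutations


def interventional_value(model, x, background, subset):
    """Mean model output with features in `subset` fixed to x and the
    remaining features taken from each background row (unconditional version)."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        subset = set()
        prev = interventional_value(model, x, background, subset)
        for j in order:
            subset.add(j)
            cur = interventional_value(model, x, background, subset)
            phi[j] += cur - prev
            prev = cur
    return [p / len(orders) for p in phi]


# Hypothetical toy model and data, for illustration only.
model = lambda z: 2.0 * z[0] - z[1] + 0.5 * z[2]
background = [[0.0, 0.0, 0.0], [2.0, 4.0, 2.0]]  # feature means: 1, 2, 1
x = [3.0, 1.0, 5.0]
print(shapley_values(model, x, background))  # approximately [4.0, 1.0, 2.0]
```

For a linear model the interventional Shapley value of feature j reduces to its weight times the deviation of x_j from the background mean, which is what the printed values reflect. Because background rows are mixed in feature by feature, any dependence between features is ignored, which is the limitation the conditional version addresses.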
A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency
Authors: Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Jun Zhang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.10655
Pdf link: https://arxiv.org/pdf/2307.10655
Abstract Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, prompting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.
ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum
Authors: Daniel Rosendo (ZENITH, KerData), Marta Mattoso (COPPE-UFRJ), Alexandru Costan (INSA Rennes, IRISA), Renan Souza (ORNL), Débora Pina (COPPE-UFRJ), Patrick Valduriez (ZENITH), Gabriel Antoniu (PARIS)
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2307.10658
Pdf link: https://arxiv.org/pdf/2307.10658
Abstract Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly in resource-constrained devices, such as the ones on the IoT/Edge. To address this challenge, based on a performance analysis of existing systems, we propose ProvLight, a tool to enable efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments. We validate ProvLight at a large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems like ProvLake and DfAnalyzer in resource-constrained devices. ProvLight is 26-37x faster to capture and transmit provenance data; uses 5-7x less CPU; 2x less memory; transmits 2x less data; and consumes 2-2.5x less energy. ProvLight and E2Clab are available as open-source tools.
A second order directional split exponential integrator for systems of advection-diffusion-reaction equations
Authors: Marco Caliari, Fabio Cassini
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.10684
Pdf link: https://arxiv.org/pdf/2307.10684
Abstract We propose a second order exponential scheme suitable for two-component coupled systems of stiff advection-diffusion-reaction equations in two and three space dimensions. It is based on a directional splitting of the involved matrix functions, which allows for a simple yet efficient implementation through the computation of small-sized exponential-like functions and tensor-matrix products. The procedure straightforwardly extends to the case of an arbitrary number of components and to any space dimension d. Several numerical experiments in 2D and 3D with physically relevant DIB, Schnakenberg, FitzHugh-Nagumo, and advective Brusselator models clearly show the advantage of the approach against state-of-the-art techniques.
A Constraint-based Recommender System via RDF Knowledge Graphs
Authors: Ngoc Luyen Le (Heudiasyc), Marie-Hélène Abel (Heudiasyc), Philippe Gouspillou
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2307.10702
Pdf link: https://arxiv.org/pdf/2307.10702
Abstract Knowledge graphs, represented in RDF, are able to model entities and their relations by means of ontologies. The use of knowledge graphs for information modeling has attracted interest in recent years. In recommender systems, items and users can be mapped and integrated into the knowledge graph, which can represent more links and relationships between users and items. Constraint-based recommender systems are based on the idea of explicitly exploiting deep recommendation knowledge through constraints to identify relevant recommendations. When combined with knowledge graphs, a constraint-based recommender system gains several benefits in terms of constraint sets. In this paper, we investigate and propose the construction of a constraint-based recommender system via RDF knowledge graphs applied to the vehicle purchase/sale domain. The results of our experiments show that the proposed approach is able to efficiently identify recommendations in accordance with user preferences.
TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars
Authors: Quang Huy Che, Dinh Phuc Nguyen, Minh Quan Pham, Duc Khai Lam
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10705
Pdf link: https://arxiv.org/pdf/2307.10705
Abstract Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches while requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves an mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60 FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: https://github.com/chequanghuy/TwinLiteNet.
Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV
Authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10713
Pdf link: https://arxiv.org/pdf/2307.10713
Abstract Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture. We additionally introduce a collection of best-practices to further maximize performance and zero-shot generalization. This includes 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth.
Joint Port Selection Based Channel Acquisition for FDD Cell-Free Massive MIMO
Authors: Cheng Zhang, Pengguang Du, Minjie Ding, Yindi Jing, Yongming Huang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.10730
Pdf link: https://arxiv.org/pdf/2307.10730
Abstract In frequency division duplexing (FDD) cell-free massive MIMO, the acquisition of the channel state information (CSI) is very challenging because of the large overhead required for the training and feedback of the downlink channels of multiple cooperating base stations (BSs). In this paper, for systems with partial uplink-downlink channel reciprocity, and a general spatial domain channel model with variations in the average port power and correlation among port coefficients, we propose a joint-port-selection-based CSI acquisition and feedback scheme for the downlink transmission with zero-forcing precoding. The scheme uses an eigenvalue-decomposition-based transformation to reduce the feedback overhead by exploring the port correlation. We derive the sum-rate of the system for any port selection. Based on the sum-rate result, we propose a low-complexity greedy-search-based joint port selection (GS-JPS) algorithm. Moreover, to adapt to fast time-varying scenarios, a supervised deep learning-enhanced joint port selection (DL-JPS) algorithm is proposed. Simulations verify the effectiveness of our proposed schemes and their advantage over existing port-selection channel acquisition schemes.
TransNFV: Integrating Transactional Semantics for Efficient State Management in Virtual Network Functions
Authors: Zhonghao Yang, Shuhao Zhang, Binbin Chen
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2307.10732
Pdf link: https://arxiv.org/pdf/2307.10732
Abstract Managing shared mutable states in high concurrency state access operations is a persistent challenge in Network Functions Virtualization (NFV). This is particularly true when striving to meet chain output equivalence (COE) requirements. This paper presents TransNFV, an innovative NFV framework that incorporates transactional semantics to optimize NFV state management. TransNFV integrates VNF state access operations as transactions, resolves transaction dependencies, schedules transactions dynamically, and executes transactions efficiently. Initial findings suggest that TransNFV maintains shared VNF state consistency, meets COE requirements, and skillfully handles complex cross-flow states in dynamic network conditions. TransNFV thus provides a promising solution to enhance state management and overall performance in future NFV platforms.
Urban Radiance Field Representation with Deformable Neural Mesh Primitives
Authors: Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10776
Pdf link: https://arxiv.org/pdf/2307.10776
Abstract Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive (DNMP), and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features to parameterize the geometry and radiance information of a local area. To constrain the degree of freedom for optimization and lower the storage budgets, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: (1) High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. (2) Low computational costs. Our representation enables fast rendering (2.07ms/1k pixels) and low peak memory usage (110MB/1k pixels). We also present a lightweight version that can run 33x faster than vanilla NeRFs, and comparable to the highly-optimized Instant-NGP (0.61 vs 0.71ms/1k pixels). Project page: https://dnmp.github.io/.
Communication-Efficient Split Learning via Adaptive Feature-Wise Compression
Authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10805
Pdf link: https://arxiv.org/pdf/2307.10805
Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead compared to the vanilla SL framework without compression.
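The two strategies described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact dropout rule or the closed-form quantization levels, so everything below is an assumption: keep probabilities are taken to grow linearly with the column standard deviation, and a hypothetical level budget is shared among kept columns in proportion to their ranges. All function and parameter names are invented for this illustration and are not SplitFC's.

```python
import random
import statistics


def column(matrix, j):
    """Extract column j (one feature across the batch) from a list of rows."""
    return [row[j] for row in matrix]


def keep_probabilities(features, keep_ratio):
    """ASSUMPTION: keep probability is proportional to the column std, capped at 1."""
    n_cols = len(features[0])
    stds = [statistics.pstdev(column(features, j)) for j in range(n_cols)]
    total = sum(stds)
    if total == 0.0:
        return [keep_ratio] * n_cols
    return [min(1.0, keep_ratio * n_cols * s / total) for s in stds]


def quantize_column(values, levels):
    """Uniform quantizer with `levels` (>= 2) points spanning [min, max] of the column."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / (levels - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def compress(features, keep_ratio=0.5, level_budget=16, seed=0):
    """Drop low-dispersion columns at random, then quantize the surviving ones."""
    rng = random.Random(seed)
    n_cols = len(features[0])
    probs = keep_probabilities(features, keep_ratio)
    kept = [j for j in range(n_cols) if rng.random() < probs[j]]
    ranges = {j: max(column(features, j)) - min(column(features, j)) for j in kept}
    total_range = sum(ranges.values())
    out = {}
    for j in kept:
        # ASSUMPTION: wider columns receive a larger share of the level budget.
        share = ranges[j] / total_range if total_range > 0 else 1.0 / len(kept)
        levels = max(2, round(level_budget * share))
        out[j] = quantize_column(column(features, j), levels)
    return out  # maps kept column index -> quantized column


print(quantize_column([0.0, 0.3, 1.0], 2))  # [0.0, 0.0, 1.0]
```

Under this assumed rule a constant column has zero keep probability and is always dropped, which matches the intuition that a feature with no dispersion across the batch carries little information to transmit. In a split-learning setting the gradients for dropped columns would be skipped as well, as the abstract notes.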
Sensing User's Activity, Channel, and Location with Near-Field Extra-Large-Scale MIMO
Authors: Li Qiao, Anwen Liao, Zhuoran Li, Hua Wang, Zhen Gao, Xiang Gao, Yu Su, Pei Xiao, Li You, Derrick Wing Kwan Ng
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.10837
Pdf link: https://arxiv.org/pdf/2307.10837
Abstract This paper proposes a grant-free massive access scheme based on the millimeter wave (mmWave) extra-large-scale multiple-input multiple-output (XL-MIMO) to support massive Internet-of-Things (IoT) devices with low latency, high data rate, and high localization accuracy in the upcoming sixth-generation (6G) networks. The XL-MIMO consists of multiple antenna subarrays that are widely spaced over the service area to ensure line-of-sight (LoS) transmissions. First, we establish the XL-MIMO-based massive access model considering the near-field spatial non-stationary (SNS) property. Then, by exploiting the block sparsity of subarrays and the SNS property, we propose a structured block orthogonal matching pursuit algorithm for efficient active user detection (AUD) and channel estimation (CE). Furthermore, different sensing matrices are applied in different pilot subcarriers for exploiting the diversity gains. Additionally, a multi-subarray collaborative localization algorithm is designed for localization. In particular, the angle of arrival (AoA) and time difference of arrival (TDoA) of the LoS links between active users and related subarrays are extracted from the estimated XL-MIMO channels, and then the coordinates of active users are acquired by jointly utilizing the AoAs and TDoAs. Simulation results show that the proposed algorithms outperform existing algorithms in terms of AUD and CE performance and can achieve centimeter-level localization accuracy.
Shortest Dominating Set Reconfiguration under Token Sliding
Authors: Jan Matyáš Křišťan, Jakub Svoboda
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2307.10847
Pdf link: https://arxiv.org/pdf/2307.10847
Abstract In this paper, we present novel algorithms that efficiently compute a shortest reconfiguration sequence between two given dominating sets in trees and interval graphs under the Token Sliding model. In this problem, a graph is provided along with its two dominating sets, which can be imagined as tokens placed on vertices. The objective is to find a shortest sequence of dominating sets that transforms one set into the other, with each set in the sequence resulting from sliding a single token in the previous set. While identifying any sequence has been well studied, our work presents the first polynomial algorithms for this optimization variant in the context of dominating sets.
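To make the problem statement concrete, the following sketch finds a shortest reconfiguration sequence by breadth-first search over all dominating sets. This is an exponential brute-force baseline for tiny graphs, not the polynomial algorithms the abstract refers to. It assumes that tokens occupy distinct vertices and that the graph is given as an adjacency dictionary; both the representation and the function names are hypothetical.

```python
from collections import deque

def is_dominating(adj, tokens):
    """True if every vertex holds a token or has a neighbor that does."""
    return all(v in tokens or any(u in tokens for u in adj[v]) for v in adj)

def shortest_reconfiguration(adj, start, target):
    """BFS over dominating sets; each step slides one token along an edge."""
    start, target = frozenset(start), frozenset(target)
    if not (is_dominating(adj, start) and is_dominating(adj, target)):
        raise ValueError("both endpoints must be dominating sets")
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk parent pointers back to the start to rebuild the sequence.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for u in current:
            for v in adj[u]:
                if v in current:
                    continue  # assumption: at most one token per vertex
                nxt = (current - {u}) | {v}
                if nxt not in parent and is_dominating(adj, nxt):
                    parent[nxt] = current
                    queue.append(nxt)
    return None  # target is not reachable by token slides
```

On the path graph 0-1-2-3-4, moving from {0, 3} to {1, 4} takes two slides through the intermediate dominating set {1, 3}.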
Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios
Authors: Ken Mori, Kai Storms, Steven Peters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10873
Pdf link: https://arxiv.org/pdf/2307.10873
Abstract Having efficient testing strategies is a core challenge that needs to be overcome for the release of automated driving. This necessitates clear requirements as well as suitable methods for testing. In this work, the requirements for perception modules are considered with respect to relevance. The concept of relevance currently remains insufficiently defined and specified. In this paper, we propose a novel methodology to overcome this challenge by exemplary application to collision safety in the highway domain. Using this general system and use case specification, a corresponding concept for relevance is derived. Irrelevant objects are thus defined as objects which do not limit the set of safe actions available to the ego vehicle under consideration of all uncertainties. As an initial step, the use case is decomposed into functional scenarios with respect to collision relevance. For each functional scenario, possible actions of both the ego vehicle and any other dynamic object are formalized as equations. This set of possible actions is constrained by traffic rules, yielding relevance criteria. As a result, we present a conservative estimation which dynamic objects are relevant for perception and need to be considered for a complete evaluation. The estimation provides requirements which are applicable for offline testing and validation of perception components. A visualization is presented for examples from the highD dataset, showing the plausibility of the results. Finally, a possibility for a future validation of the presented relevance concept is outlined.
A Circular Restricted n-body Problem
Authors: Rodolfo Batista Negri, Antonio Fernando Bertachini de Almeida Prado
Subjects: Systems and Control (eess.SY); Earth and Planetary Astrophysics (astro-ph.EP)
Arxiv link: https://arxiv.org/abs/2307.10881
Pdf link: https://arxiv.org/pdf/2307.10881
Abstract This paper introduces the Circular Restricted n-Body Problem (CRNBP), an extension of the bicircular restricted four-body problem (BCR4BP) designed to describe the dynamics of an n-body system. In the CRNBP, each massive body in the system is constrained to follow a Keplerian motion, similar to the BCR4BP's artificial constraint. The CRNBP is an efficient alternative for trajectory design in multiple-body systems, particularly for outer planetary systems, as it requires integrating only six first-order ordinary differential equations compared to the 6N equations in an ephemerides model. By reproducing complex dynamical behaviors observed in ephemerides n-body problems, we demonstrate the structural stability of the CRNBP. Additionally, we propose a straightforward approach to relate the CRNBP with ephemerides, enabling the exploration of trajectory design possibilities before committing to a dedicated ephemerides analysis. This allows for the identification of general dynamical behaviors and provides valuable insights into the dynamics of multiple body systems. Finally, illustrative examples highlight the richness of trajectories and potential advantages of using the CRNBP for designing complex trajectories in outer planetary systems. The CRNBP proves to be a valuable tool for preliminary trajectory design, facilitating the identification of low-energy trajectories and providing a foundation for further exploration in future dedicated studies.
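The "six equations instead of 6N" point can be illustrated with a simplified sketch. It is not the paper's formulation: it assumes every massive body moves on a prescribed circular, coplanar orbit about a fixed origin, works in an inertial frame, and uses hypothetical parameter tuples. Because the body positions are closed-form functions of time, only the six spacecraft states are integrated, however many bodies are present.

```python
import numpy as np

def body_position(t, radius, rate, phase):
    """Position of a body on a prescribed circular orbit (assumed coplanar)."""
    angle = rate * t + phase
    return np.array([radius * np.cos(angle), radius * np.sin(angle), 0.0])

def rhs(t, state, bodies):
    """Six first-order ODEs for the spacecraft.

    state  = [x, y, z, vx, vy, vz]
    bodies = list of (mu, radius, rate, phase); mu is the gravitational parameter.
    """
    r, v = state[:3], state[3:]
    acc = np.zeros(3)
    for mu, radius, rate, phase in bodies:
        d = body_position(t, radius, rate, phase) - r
        acc += mu * d / np.linalg.norm(d) ** 3
    return np.concatenate([v, acc])
```

A function with this signature can be passed to any standard ODE integrator; adding bodies lengthens the loop but leaves the state dimension at six.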
Software Product Line Engineering via Software Transplantation
Authors: Leandro O. Souza, Earl T. Barr, Justyna Petke, Eduardo S. Almeida, Paulo Anselmo M. S. Neto
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.10896
Pdf link: https://arxiv.org/pdf/2307.10896
Abstract For companies producing related products, a Software Product Line (SPL) is a software reuse method that improves time-to-market and software quality, achieving substantial cost reductions. These benefits do not come for free. It often takes years to re-architect and re-engineer a codebase to support SPL and, once adopted, it must be maintained. Current SPL practice relies on a collection of tools, tailored for different reengineering phases, whose output developers must coordinate and integrate. We present Foundry, a general automated approach for leveraging software transplantation to speed conversion to and maintenance of SPL. Foundry facilitates feature extraction and migration. It can efficiently, repeatedly, transplant a sequence of features, implemented in multiple files. We used Foundry to create two valid product lines that integrate features from three real-world systems in an automated way. Moreover, we conducted an experiment comparing Foundry's feature migration with manual effort. We show that Foundry automatically migrated features across codebases 4.8 times faster, on average, than the average time a group of SPL experts took to accomplish the task.
The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning
Authors: Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10907
Pdf link: https://arxiv.org/pdf/2307.10907
Abstract The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. Github repo: https://github.com/apple/ml-entropy-reconstruction.
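A standard way to obtain a lower bound of the kind this abstract describes, with one entropy term and one reconstruction term, is sketched below. The notation is an assumption made for illustration: $Z_1$ and $Z_2$ denote the representations of two views and $q$ is any variational conditional density. The paper's exact statement may differ.

```latex
\begin{align}
I(Z_1; Z_2) &= H(Z_2) - H(Z_2 \mid Z_1) \\
  &\ge \underbrace{H(Z_2)}_{\text{entropy}}
     + \underbrace{\mathbb{E}_{p(z_1, z_2)}\big[\log q(z_2 \mid z_1)\big]}_{\text{reconstruction}}
\end{align}
```

The inequality holds because the cross-entropy upper-bounds the conditional entropy: $-\mathbb{E}_{p(z_1,z_2)}[\log q(z_2 \mid z_1)] = H(Z_2 \mid Z_1) + \mathbb{E}_{p(z_1)}\big[\mathrm{KL}\big(p(\cdot \mid z_1)\,\|\,q(\cdot \mid z_1)\big)\big] \ge H(Z_2 \mid Z_1)$.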
MediaGPT : A Large Language Model Target Chinese Media
Authors: Zhonghao Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10930
Pdf link: https://arxiv.org/pdf/2307.10930
Abstract The development of large language models (LLMs) has seen rapid progress in recent years. One of the most widely used LLMs is the Generative Pre-trained Transformer (GPT) series, which has been applied in various fields, including the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially for Chinese. As a result, there is a growing need to develop LLMs that are specifically tailored to the unique requirements of the media domain. In this paper, we present MediaGPT, a large language model trained on a variety of media data and addressing the practical needs of Chinese media. We have designed a diverse set of task instruction types to cater to the specific requirements of the domain. To further validate the effectiveness of our proposed LLM, we have constructed unique datasets that are tailored to the media domain and have also developed verification methods that are specifically designed for generative-type tasks. By doing so, we aim to bridge the gap between general-purpose LLMs and the requirements of the media domain, and to pave the way for more effective and efficient use of LLMs in this field. This paper aims to explore the challenges and opportunities of developing LLMs for media applications and to propose potential solutions for addressing these challenges.
OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios
Authors: Aditya Nalgunda Ganesh, Dhruval Pobbathi Badrinath, Harshith Mohan Kumar, Priya SS, Surabhi Narayan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10934
Pdf link: https://arxiv.org/pdf/2307.10934
Abstract Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth by substituting it with pseudo-ground truth labels obtained from boosted monocular depth estimation.
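The quadratic growth of depth error mentioned in this abstract follows from the usual pinhole relation between depth and disparity, Z = f·B/d. Differentiating gives |dZ| ≈ Z²/(f·B)·|dd|, so a fixed disparity error costs four times as much depth accuracy at twice the range. The sketch below assumes that relation; the function names and the focal length and baseline values in the usage note are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole relation Z = f * B / d (assumed camera model)."""
    return focal_px * baseline_m / disparity_px

def depth_error(depth_m, disparity_error_px, focal_px, baseline_m):
    """First-order depth error |dZ| ~= Z^2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * abs(disparity_error_px)
```

For example, with a focal length of 700 px and a baseline of 0.5 m, a half-pixel disparity error at 20 m produces four times the depth error it produces at 10 m.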
PASTA: Pretrained Action-State Transformer Agents
Authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10936
Pdf link: https://arxiv.org/pdf/2307.10936
Abstract Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
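Application of IoT Technologies to the Control and Tracking of Land Freight Transport (original title: Aplicación de tecnologías IoT en el control y seguimiento de trasporte de carga terrestre)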
Authors: Omar Otoniel Flores-Cortez, Bruno Gonzales Crespin
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2307.10945
Pdf link: https://arxiv.org/pdf/2307.10945
Abstract Freight transport of goods and raw materials is a central part of the supply chain in the commercial exchange in Latin America. Control and monitoring of this activity are vital for an efficient economic flow and, more importantly, for avoiding financial losses. Most of the problems that generate financial losses occur in cargo freight by land. These include losses due to changes in the weight of the payload to be transported, and fuel or time losses due to capricious changes by the driver to the scheduled route. This work aims to demonstrate the use of Internet of Things (IoT) techniques by proposing a prototype of a telemetry system that monitors in real time the payload weight and location of a cargo truck. The system is intended as a technological tool that supports the monitoring and control of cargo truck use and, together with other logistics measures, helps minimize economic losses. The development of this project was based on the IoT architecture reference model: an ATmega32u4 microcontroller was used together with a SIM808 GSM and GPS module as the main component of the IoT Node. In addition, Amazon Web Services (AWS) tools were used as an IoT web platform and cloud data storage. The main result was a prototype of a telemetry system to track a cargo truck via the web; the weight and position data are accessible from any device with internet access through a website. Preliminary field tests have shown the proposed system to be an efficient and low-cost option.
ESASCF: Expertise Extraction, Generalization and Reply Framework for an Optimized Automation of Network Security Compliance
Authors: Mohamed C. Ghanem, Thomas M. Chen, Mohamed A. Ferrag, Mohyi E. Kettouche
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.10967
Pdf link: https://arxiv.org/pdf/2307.10967
Abstract Exposure to cyber threats has created worldwide pressure on organizations to comply with cyber security standards and policies for protecting their digital assets. Vulnerability Assessment (VA) and Penetration Testing (PT) are widely adopted Security Compliance (SC) methods to identify security gaps and anticipate security breaches. In the context of computer networks, and despite the use of autonomous tools and systems, security compliance remains highly repetitive and resource-consuming. In this paper, we propose a novel method to tackle the ever-growing problem of efficiency and effectiveness in network infrastructure security auditing by formally introducing, designing, and developing an Expert-System Automated Security Compliance Framework (ESASCF) that enables industrial and open-source VA and PT tools and systems to extract, process, store, and re-use expertise in a human-expert way, allowing direct application in similar scenarios or during periodic re-testing. The implemented model was then integrated within ESASCF and tested on networks of different sizes. It proved efficient in terms of time and testing effectiveness, allowing ESASCF to take over SC autonomously in re-testing and to offload experts by automating repeated SC segments, thus enabling experts to prioritize important tasks in ad hoc compliance tests. The obtained results validate the performance enhancement, notably by cutting the time required for an expert to 50% in the context of a typical corporate network's first SC and 20% in re-testing, representing a significant cost reduction. In addition, the framework has a long-term impact through knowledge extraction, generalization, and re-utilization, which enables better SC confidence independent of the human expert's skills, coverage, and wrong decisions resulting in impactful false negatives.
Deep Spiking-UNet for Image Processing
Authors: Hebei Li, Yueyi Zhang, Zhiwei Xiong, Zheng-jun Zha, Xiaoyan Sun
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.10974
Pdf link: https://arxiv.org/pdf/2307.10974
Abstract U-Net, known for its simple yet efficient architecture, is widely utilized for image processing tasks and is particularly suitable for deployment on neuromorphic chips. This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture. To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy. To address the issue of information loss, we introduce multi-threshold spiking neurons, which improve the efficiency of information transmission within the Spiking-UNet. For the training strategy, we adopt a conversion and fine-tuning pipeline that leverages pre-trained U-Net models. During the conversion process, significant variability in data distribution across different parts is observed when utilizing skip connections. Therefore, we propose a connection-wise normalization method to prevent inaccurate firing rates. Furthermore, we adopt a flow-based training method to fine-tune the converted models, reducing time steps while preserving performance. Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart, surpassing existing SNN methods. Compared with the converted Spiking-UNet without fine-tuning, our Spiking-UNet reduces inference time by approximately 90%. This research broadens the application scope of SNNs in image processing and is expected to inspire further exploration in the field of neuromorphic engineering. The code for our Spiking-UNet implementation is available at https://github.com/SNNresearch/Spiking-UNet.
PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks
Authors: Shiwei Ding, Lan Zhang, Miao Pan, Xiaoyong Yuan
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.10981
Pdf link: https://arxiv.org/pdf/2307.10981
Abstract Collaborative inference has been a promising solution to enable resource-constrained edge devices to perform inference using state-of-the-art deep neural networks (DNNs). In collaborative inference, the edge device first feeds the input to a partial DNN locally and then uploads the intermediate result to the cloud to complete the inference. However, recent research indicates model inversion attacks (MIAs) can reconstruct input data from intermediate results, posing serious privacy concerns for collaborative inference. Existing perturbation and cryptography techniques are inefficient and unreliable in defending against MIAs while performing accurate inference. This paper provides a viable solution, named PATROL, which develops privacy-oriented pruning to balance privacy, efficiency, and utility of collaborative inference. PATROL takes advantage of the fact that later layers in a DNN can extract more task-specific features. Given limited local resources for collaborative inference, PATROL intends to deploy more layers at the edge based on pruning techniques to enforce task-specific features for inference and reduce task-irrelevant but sensitive features for privacy preservation. To achieve privacy-oriented pruning, PATROL introduces two key components: Lipschitz regularization and adversarial reconstruction training, which increase the reconstruction errors by reducing the stability of MIAs and enhance the target inference model by adversarial training, respectively.
Unsupervised Learning in Complex Systems
Authors: Hugo Cisneros
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10993
Pdf link: https://arxiv.org/pdf/2307.10993
Abstract In this thesis, we explore the use of complex systems to study learning and adaptation in natural and artificial systems. The goal is to develop autonomous systems that can learn without supervision, develop on their own, and become increasingly complex over time. Complex systems are identified as a suitable framework for understanding these phenomena due to their ability to exhibit growth of complexity. Being able to build learning algorithms that require limited to no supervision would enable greater flexibility and adaptability in various applications. By understanding the fundamental principles of learning in complex systems, we hope to advance our ability to design and implement practical learning algorithms in the future. This thesis makes the following key contributions: the development of a general complexity metric that we apply to search for complex systems that exhibit growth of complexity, the introduction of a coarse-graining method to study computations in large-scale complex systems, and the development of a metric for learning efficiency as well as a benchmark dataset for evaluating the speed of learning algorithms. Our findings add substantially to our understanding of learning and adaptation in natural and artificial systems. Moreover, our approach contributes to a promising new direction for research in this area. We hope these findings will inspire the development of more effective and efficient learning algorithms in the future.
Efficient and Joint Hyperparameter and Architecture Search for Collaborative Filtering
Authors: Yan Wen, Chen Gao, Lingling Yi, Liwei Qiu, Yaqing Wang, Yong Li
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11004
Pdf link: https://arxiv.org/pdf/2307.11004
Abstract Automated Machine Learning (AutoML) techniques have recently been introduced to design Collaborative Filtering (CF) models in a data-specific manner. However, existing works either search architectures or hyperparameters while ignoring the fact they are intrinsically related and should be considered together. This motivates us to consider a joint hyperparameter and architecture search method to design CF models. However, this is not easy because of the large search space and high evaluation cost. To solve these challenges, we reduce the space by screening out useless hyperparameter choices through a comprehensive understanding of individual hyperparameters. Next, we propose a two-stage search algorithm to find proper configurations from the reduced space. In the first stage, we leverage knowledge from subsampled datasets to reduce evaluation costs; in the second stage, we efficiently fine-tune top candidate models on the whole dataset. Extensive experiments on real-world datasets show better performance can be achieved compared with both hand-designed and previous searched models. Besides, ablation and case studies demonstrate the effectiveness of our search framework.
Multi-objective point cloud autoencoders for explainable myocardial infarction prediction
Authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2307.11017
Pdf link: https://arxiv.org/pdf/2307.11017
Abstract Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.
Keyword: faster

SSD Forensic: Evidence Generation And Forensic Research On Solid State Drives Using Trim Analysis
Authors: Hassan Jalil Hadi, Irshad ullah, Sheetal Harris
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2307.10192
Pdf link: https://arxiv.org/pdf/2307.10192
Abstract Traditional hard drives consisting of spinning magnetic media platters are becoming things of the past: with the emergence of the latest digital technologies and electronic equipment, the demand for faster, lighter, and more reliable alternative storage solutions is imperative. To meet these requirements, flash storage technologies like the Solid State Drive (SSD) have overtaken traditional hard disk drives. In forensic analysis of flash storage devices, investigators face severe challenges because the sovereign behavior of solid-state storage media is less favorable than that of traditional storage media devices. Wear Leveling, a fundamental mechanism in Solid State Drives (SSDs), poses a severe challenge because it often destroys forensic evidence, which makes it complicated for forensic investigators to recover the necessary evidence. Persistence of deleted data in flash storage media depends on various factors such as the Garbage Collection process, TRIM command, flash media type, manufacturer, capacity, file system, type of file saved, and the Operating System. In view of this, extensive experiments were conducted to identify the probability of data recovery and carving. The effects of the Wear Leveling and Garbage Collection processes were analyzed in SSDs from different manufacturers, with the same storage capacity and with different types of files. In conclusion, experimental findings established that Wear Leveling in solid-state media can obfuscate digital evidence, and that conventional assumptions regarding the behavior of storage media are no longer valid. Moreover, data persistence also depends on the manufacturer, the time elapsed between data deletion and forensic analysis, and the type and size of files stored in SSDs.
IncDSI: Incrementally Updatable Document Retrieval
Authors: Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, Kilian Q. Weinberger
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10323
Pdf link: https://arxiv.org/pdf/2307.10323
Abstract Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with re-training the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real-time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
Asymptotically minimal contractors based on the centered form; Application to the stability analysis of linear systems
Authors: Luc Jaulin
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.10502
Pdf link: https://arxiv.org/pdf/2307.10502
Abstract This paper proposes a new interval-based contractor for nonlinear equations which is minimal when dealing with narrow boxes. The method is based on the centered form classically used by interval algorithms combined with a Gauss Jordan band diagonalization preconditioning. As an illustration in stability analysis, we propose to compute the set of all parameters of a characteristic function of a linear dynamical system which have at least one zero in the imaginary axis. Our approach is able to compute a guaranteed and accurate enclosure of the solution set faster than existing approaches.
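The centered form mentioned in this abstract encloses the range of f over an interval [x] as f(c) + f'([x])·([x] − c), where c is the midpoint. The sketch below compares it with the natural interval extension for the hypothetical example f(x) = x² − x. It only illustrates why the centered form is tight on narrow boxes; it is not the paper's contractor, it omits the Gauss Jordan preconditioning, and the toy `Interval` class does not perform the outward rounding a guaranteed implementation would need.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    @property
    def width(self):
        return self.hi - self.lo

def point(value):
    """Degenerate interval holding a single number."""
    return Interval(value, value)

def natural_form(x):
    """Natural interval extension of f(x) = x^2 - x."""
    return x * x - x

def centered_form(x):
    """Centered form f(c) + f'([x]) * ([x] - c) with f'(x) = 2x - 1."""
    c = point((x.lo + x.hi) / 2)
    f_c = natural_form(c)
    derivative = point(2.0) * x - point(1.0)
    return f_c + derivative * (x - c)
```

On [0.9, 1.1] the true range of f is [-0.09, 0.11]. The natural extension returns roughly [-0.29, 0.31], while the centered form returns roughly [-0.12, 0.12], which is a much tighter enclosure.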
Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter
Authors: Adam W. Hall, Melissa Greeff, Angela P. Schoellig
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10541
Pdf link: https://arxiv.org/pdf/2307.10541
Abstract Learning-based optimal control algorithms control unknown systems using past trajectory data and a learned model of the system dynamics. These controllers use either a linear approximation of the learned dynamics, trading performance for faster computation, or nonlinear optimization methods, which typically perform better but can limit real-time applicability. In this work, we present a novel nonlinear controller that exploits differential flatness to achieve similar performance to state-of-the-art learning-based controllers but with significantly less computational effort. Differential flatness is a property of dynamical systems whereby nonlinear systems can be exactly linearized through a nonlinear input mapping. Here, the nonlinear transformation is learned as a Gaussian process and is used in a safety filter that guarantees, with high probability, stability as well as input and flat state constraint satisfaction. This safety filter is then used to refine inputs from a flat model predictive controller to perform constrained nonlinear learning-based optimal control through two successive convex optimizations. We compare our method to state-of-the-art learning-based control strategies and achieve similar performance, but with significantly better computational efficiency, while also respecting flat state and input constraints, and guaranteeing stability.
PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts
Authors: Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10551
Pdf link: https://arxiv.org/pdf/2307.10551
Abstract Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. PPN leverages the implicit clues between semantic entities to assist extracting, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Authors: Qi Zhang, Sipeng Zheng, Qin Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10567
Pdf link: https://arxiv.org/pdf/2307.10567
Abstract Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR. Prior works have addressed this challenge using sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to only aggregate visual contexts from its neighbor, enabling the extraction of the most distinguishing information with multi-scale feature hierarchies from high-ratio noises. The zoom-in boundary detection then focuses on local-wise discrimination of the selected top candidates for fine-grained grounding adjustment. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks, while also having the advantage of faster inference speed and lighter model parameters, thanks to its lightweight architecture.
Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck
Authors: Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10631
Pdf link: https://arxiv.org/pdf/2307.10631
Abstract The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.
ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum
Authors: Daniel Rosendo (ZENITH, KerData), Marta Mattoso (COPPE-UFRJ), Alexandru Costan (INSA Rennes, IRISA), Renan Souza (ORNL), Débora Pina (COPPE-UFRJ), Patrick Valduriez (ZENITH), Gabriel Antoniu (PARIS)
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2307.10658
Pdf link: https://arxiv.org/pdf/2307.10658
Abstract Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly in resource-constrained devices, such as the ones on the IoT/Edge. To address this challenge, based on a performance analysis of existing systems, we propose ProvLight, a tool to enable efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments. We validate ProvLight at a large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems like ProvLake and DfAnalyzer in resource-constrained devices. ProvLight is 26-37x faster to capture and transmit provenance data; uses 5-7x less CPU; 2x less memory; transmits 2x less data; and consumes 2-2.5x less energy. ProvLight and E2Clab are available as open-source tools.
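The "data compression and grouping" idea in this abstract can be sketched generically: batching several small records into one payload before compressing usually costs far fewer bytes than compressing each record on its own. This is only an illustration using Python's standard `json` and `zlib` modules. The record fields, `pack_records`, and `unpack_payload` are hypothetical and are not ProvLight's data model or API.

```python
import json
import zlib


def pack_records(records, group_size):
    """Group records into batches and compress each batch as one payload."""
    payloads = []
    for start in range(0, len(records), group_size):
        batch = records[start:start + group_size]
        # Compact separators keep the serialized form small before compression.
        raw = json.dumps(batch, separators=(",", ":")).encode("utf-8")
        payloads.append(zlib.compress(raw))
    return payloads


def unpack_payload(payload):
    """Inverse of one packed batch: decompress and parse the JSON list."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical provenance-like records, invented for this example.
records = [{"task": i, "cpu": 0.5, "status": "done"} for i in range(100)]

grouped = pack_records(records, group_size=25)
ungrouped = pack_records(records, group_size=1)

print("payloads (grouped):  ", len(grouped), "bytes:", sum(map(len, grouped)))
print("payloads (ungrouped):", len(ungrouped), "bytes:", sum(map(len, ungrouped)))
```

The trade-off is latency: a larger `group_size` means fewer and smaller transmissions, but records wait longer on the device before they are sent.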
Predicting human motion intention for pHRI assistive control
Authors: Paolo Franceschi, Fabio Bertini, Francesco Braghin, Loris Roveda, Nicola Pedrocchi, Manuel Beschi
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10743
Pdf link: https://arxiv.org/pdf/2307.10743
Abstract This work addresses human intention identification during physical Human-Robot Interaction (pHRI) tasks to include this information in an assistive controller. To this purpose, human intention is defined as the desired trajectory that the human wants to follow over a finite rolling prediction horizon so that the robot can assist in pursuing it. This work investigates a Recurrent Neural Network (RNN), specifically, Long-Short Term Memory (LSTM) cascaded with a Fully Connected layer. In particular, we propose an iterative training procedure to adapt the model. Such an iterative procedure is powerful in reducing the prediction error. Still, it has the drawback that it is time-consuming and does not generalize to different users or different co-manipulated objects. To overcome this issue, Transfer Learning (TL) adapts the pre-trained model to new trajectories, users, and co-manipulated objects by freezing the LSTM layer and fine-tuning the last FC layer, which makes the procedure faster. Experiments show that the iterative procedure adapts the model and reduces prediction error. Experiments also show that TL adapts to different users and to the co-manipulation of a large object. Finally, to check the utility of adopting the proposed method, we compare the proposed controller enhanced by the intention prediction with the other two standard controllers of pHRI.
Urban Radiance Field Representation with Deformable Neural Mesh Primitives
Authors: Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10776
Pdf link: https://arxiv.org/pdf/2307.10776
Abstract Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive (DNMP), and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features to parameterize the geometry and radiance information of a local area. To constrain the degree of freedom for optimization and lower the storage budgets, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: (1) High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. (2) Low computational costs. Our representation enables fast rendering (2.07ms/1k pixels) and low peak memory usage (110MB/1k pixels). We also present a lightweight version that can run 33x faster than vanilla NeRFs, and comparable to the highly-optimized Instant-NGP (0.61 vs 0.71ms/1k pixels). Project page: https://dnmp.github.io/.
Learned Thresholds Token Merging and Pruning for Vision Transformers
Authors: Maxim Bonnaerens, Joni Dambre
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10780
Pdf link: https://arxiv.org/pdf/2307.10780
Abstract Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp .
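To make the combination of threshold-based pruning and merging concrete, here is a deliberately simplified sketch. It is not the LTMP method itself: the thresholds here are fixed constants rather than learned, the importance scores are given as inputs, and the greedy cosine-similarity merge is an assumption chosen for brevity. All function and parameter names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reduce_tokens(tokens, importance, prune_threshold, merge_threshold):
    # Pruning: drop tokens whose importance score is below the threshold.
    kept = [tok for tok, score in zip(tokens, importance) if score >= prune_threshold]

    # Merging: fold a token into an existing group when it is similar enough
    # to that group's running mean; otherwise start a new group.
    groups = []  # each entry is (sum_vector, member_count)
    for tok in kept:
        for i, (total, count) in enumerate(groups):
            mean = [v / count for v in total]
            if cosine(tok, mean) > merge_threshold:
                groups[i] = ([a + b for a, b in zip(total, tok)], count + 1)
                break
        else:
            groups.append((list(tok), 1))

    # Each surviving token is the mean of its group.
    return [[v / count for v in total] for total, count in groups]


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
importance = [0.9, 0.8, 0.7, 0.1]
reduced = reduce_tokens(tokens, importance, prune_threshold=0.5, merge_threshold=0.95)
print(len(tokens), "tokens reduced to", len(reduced))
```

In this toy input the last token is pruned for low importance and the first two are merged because they point in nearly the same direction, so four tokens become two.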
Parallel Shooting Sequential Quadratic Programming for Nonlinear MPC Problems
Authors: P. C. N. Verheijen, M. Haghi, M. Lazar, D. Goswami
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.10868
Pdf link: https://arxiv.org/pdf/2307.10868
Abstract In this paper, we propose a parallel shooting algorithm for solving nonlinear model predictive control problems using sequential quadratic programming. This algorithm is built on a two-phase approach where we first test and assess sequential convergence over many initial trajectories in parallel. However, if none converge, the algorithm starts varying the Newton step size in parallel instead. Through this parallel shooting approach, it is expected that the number of iterations to converge to an optimal solution can be decreased. Furthermore, the algorithm can be further expanded and accelerated by implementing it on GPUs. We illustrate the effectiveness of the proposed Parallel Shooting Sequential Quadratic Programming (PS-SQP) method in some benchmark examples for nonlinear model predictive control. The developed PS-SQP parallel solver converges faster on average and especially when significant nonlinear behaviour is excited in the NMPC horizon.
Software Product Line Engineering via Software Transplantation
Authors: Leandro O. Souza, Earl T. Barr, Justyna Petke, Eduardo S. Almeida, Paulo Anselmo M. S. Neto
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2307.10896
Pdf link: https://arxiv.org/pdf/2307.10896
Abstract For companies producing related products, a Software Product Line (SPL) is a software reuse method that improves time-to-market and software quality, achieving substantial cost reductions. These benefits do not come for free. It often takes years to re-architect and re-engineer a codebase to support SPL and, once adopted, it must be maintained. Current SPL practice relies on a collection of tools, tailored for different reengineering phases, whose output developers must coordinate and integrate. We present Foundry, a general automated approach for leveraging software transplantation to speed conversion to and maintenance of SPL. Foundry facilitates feature extraction and migration. It can efficiently, repeatedly, transplant a sequence of features, implemented in multiple files. We used Foundry to create two valid product lines that integrate features from three real-world systems in an automated way. Moreover, we conducted an experiment comparing Foundry's feature migration with manual effort. We show that Foundry automatically migrated features across codebases 4.8 times faster, on average, than the average time a group of SPL experts took to accomplish the task.
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection
Authors: Ming Li, Jie Wu, Xionghui Wang, Chen Chen, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.11077
Pdf link: https://arxiv.org/pdf/2307.11077
Abstract The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedure in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, and box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts out of the backbone. By incorporating the self-supervised pre-trained backbones, we can pre-train all modules for various detectors in an unsupervised paradigm. As depicted in Figure 1, extensive experiments demonstrate that AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP under fewer epochs.
Keyword: mobile

Contextual Beamforming: Exploiting Location and AI for Enhanced Wireless Telecommunication Performance
Authors: Jaspreet Kaur, Satyam Bhatti, Olaoluwa R Popoola, Muhammad Ali Imran, Rami Ghannam, Qammer H Abbasi, Hasan T Abbas
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.10183
Pdf link: https://arxiv.org/pdf/2307.10183
Abstract The pervasive nature of wireless telecommunication has made it the foundation for mainstream technologies like automation, smart vehicles, virtual reality, and unmanned aerial vehicles. As these technologies experience widespread adoption in our daily lives, ensuring the reliable performance of cellular networks in mobile scenarios has become a paramount challenge. Beamforming, an integral component of modern mobile networks, enables spatial selectivity and improves network quality. However, many beamforming techniques are iterative, introducing unwanted latency to the system. In recent times, there has been a growing interest in leveraging mobile users' location information to expedite beamforming processes. This paper explores the concept of contextual beamforming, discussing its advantages, disadvantages and implications. Notably, the study presents an impressive 53% improvement in signal-to-noise ratio (SNR) by implementing the adaptive beamforming (MRT) algorithm compared to scenarios without beamforming. It further elucidates how MRT contributes to contextual beamforming. The importance of localization in implementing contextual beamforming is also examined. Additionally, the paper delves into the use of artificial intelligence schemes, including machine learning and deep learning, in implementing contextual beamforming techniques that leverage user location information. Based on the comprehensive review, the results suggest that the combination of MRT and Zero forcing (ZF) techniques, alongside deep neural networks (DNN) employing Bayesian Optimization (BO), represents the most promising approach for contextual beamforming. Furthermore, the study discusses the future potential of programmable switches, such as Tofino, in enabling location-aware beamforming.
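For readers unfamiliar with MRT (maximum ratio transmission), the textbook single-user form sets the transmit weights to the normalised complex conjugate of the channel vector, so that all antenna contributions add in phase at the receiver. The sketch below illustrates only that textbook definition. The channel values are invented, noise is ignored, and nothing here reproduces the paper's 53% SNR figure or its experimental setup.

```python
import math


def mrt_weights(channel):
    """Maximum ratio transmission: normalised complex conjugate of the channel."""
    norm = math.sqrt(sum(abs(h) ** 2 for h in channel))
    return [h.conjugate() / norm for h in channel]


def received_power(channel, weights):
    """Power of the combined signal at a single-antenna receiver."""
    return abs(sum(h * w for h, w in zip(channel, weights))) ** 2


# Hypothetical 4-antenna channel coefficients, chosen only for illustration.
channel = [0.8 + 0.3j, -0.2 + 0.9j, 0.5 - 0.4j, 0.1 + 0.6j]

# Baseline with the same total transmit power but no channel knowledge.
uniform = [1 / math.sqrt(len(channel))] * len(channel)

print("uniform weights:", received_power(channel, uniform))
print("MRT weights:    ", received_power(channel, mrt_weights(channel)))
```

With unit-norm weights, MRT attains the largest possible received power for a given channel, equal to the squared norm of the channel vector.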
CAPTCHA Types and Breaking Techniques: Design Issues, Challenges, and Future Research Directions
Authors: N.Tariq, F.A.Khan, S.A.Moqurrab, G.Srivastava
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.10239
Pdf link: https://arxiv.org/pdf/2307.10239
Abstract The proliferation of the Internet and mobile devices has resulted in malicious bots gaining access to genuine resources and data. Bots may instigate phishing, unauthorized access, denial-of-service, and spoofing attacks, to mention a few. Authentication and testing mechanisms to verify the end-users and prohibit malicious programs from infiltrating the services and data are strong defense systems against malicious bots. Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is an authentication process to confirm that the user is a human, after which access is granted. This paper provides an in-depth survey on CAPTCHAs and focuses on two main things: (1) a detailed discussion on various CAPTCHA types along with their advantages, disadvantages, and design recommendations, and (2) an in-depth analysis of different CAPTCHA breaking techniques. The survey is based on over two hundred studies on the subject matter conducted since 2003 to date. The analysis reinforces the need to design more attack-resistant CAPTCHAs while keeping their usability intact. The paper also highlights the design challenges and open issues related to CAPTCHAs. Furthermore, it also provides useful recommendations for breaking CAPTCHAs.
Post-pandemic mobility patterns in London
Authors: Roberto Murcio, Nilufer Sari Aslam, Joana Barros
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2307.10344
Pdf link: https://arxiv.org/pdf/2307.10344
Abstract Understanding human mobility is crucial for urban and transport studies in cities. People's daily activities provide valuable insight, such as where people live, work, shop, leisure or eat during midday or after-work hours. However, such activities are changed due to travel behaviours after COVID-19 in cities. This study examines the mobility patterns captured from mobile phone apps to explore the behavioural patterns established since the COVID-19 lockdowns triggered a series of changes in urban environments.
Technology in Association With Mental Health: Meta-ethnography
Authors: Hamza Mohammed
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2307.10513
Pdf link: https://arxiv.org/pdf/2307.10513
Abstract This research paper presents a meta-analysis of the multifaceted role of technology in mental health. The pervasive influence of technology on daily lives necessitates a deep understanding of its impact on mental health services. This study synthesizes literature covering Behavioral Intervention Technologies (BITs), digital mental health interventions during COVID-19, young men's attitudes toward mental health technologies, technology-based interventions for university students, and the applicability of mobile health technologies for individuals with serious mental illnesses. BITs are recognized for their potential to provide evidence-based interventions for mental health conditions, especially anxiety disorders. The COVID-19 pandemic acted as a catalyst for the adoption of digital mental health services, underscoring their crucial role in providing accessible and quality care; however, their efficacy needs to be reinforced by workforce training, high-quality evidence, and digital equity. A nuanced understanding of young men's attitudes toward mental health is imperative for devising effective online services. Technology-based interventions for university students are promising, although variable in effectiveness; their deployment must be evidence-based and tailored to individual needs. Mobile health technologies, particularly activity tracking, hold promise for individuals with serious mental illnesses. Collectively, technology has immense potential to revolutionize mental health care. However, the implementation must be evidence-based, ethical, and equitable, with continued research focusing on experiences across diverse populations, ensuring accessibility and efficacy for all.
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10554
Pdf link: https://arxiv.org/pdf/2307.10554
Abstract Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies perform weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic search of proxies framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We proposed a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance than state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
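A "per-layer bit-width configuration" simply assigns each layer its own number of bits. The sketch below shows what one such configuration looks like with plain symmetric uniform quantization. It does not implement the EMQ proxy search: the layer names, weights, and bit assignments are invented for illustration, and `quantize` is a generic textbook quantizer that expects at least 2 bits.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 integer steps for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs else 1.0
    # Round to the nearest integer step, then map back to the float range.
    return [round(w / scale) * scale for w in weights]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical layers and a hypothetical mixed-precision configuration.
layers = {
    "conv1": [0.31, -0.72, 0.05, 0.44],
    "fc": [0.012, -0.4, 0.25, -0.09],
}
bit_config = {"conv1": 8, "fc": 4}

for name, weights in layers.items():
    quantized = quantize(weights, bit_config[name])
    print(name, "bits:", bit_config[name],
          "mse:", mean_squared_error(weights, quantized))
```

A search method, whether training-based or proxy-based, is then a way of choosing `bit_config` so that accuracy loss stays small under a size or compute budget.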
Bridging Intelligence and Instinct: A New Control Paradigm for Autonomous Robots
Authors: Shimian Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10690
Pdf link: https://arxiv.org/pdf/2307.10690
Abstract As the advent of artificial general intelligence (AGI) progresses at a breathtaking pace, the application of large language models (LLMs) as AI Agents in robotics remains in its nascent stage. A significant concern that hampers the seamless integration of these AI Agents into robotics is the unpredictability of the content they generate, a phenomenon known as "hallucination". Drawing inspiration from biological neural systems, we propose a novel, layered architecture for autonomous robotics, bridging AI agent intelligence and robot instinct. In this context, we define Robot Instinct as the innate or learned set of responses and priorities in an autonomous robotic system that ensures survival-essential tasks, such as safety assurance and obstacle avoidance, are carried out in a timely and effective manner. This paradigm harmoniously combines the intelligence of LLMs with the instinct of robotic behaviors, contributing to a safer and more versatile autonomous robotic system. As a case study, we illustrate this paradigm within the context of a mobile robot, demonstrating its potential to significantly enhance autonomous robotics and enabling a future where robots can operate independently and safely across diverse environments.
SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning
Authors: Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10697
Pdf link: https://arxiv.org/pdf/2307.10697
Abstract The widespread use of mobile devices for various digital services has created a need for reliable and real-time person authentication. In this context, facial recognition technologies have emerged as a dependable method for verifying users due to the prevalence of cameras in mobile devices and their integration into everyday applications. The rapid advancement of deep Convolutional Neural Networks (CNNs) has led to numerous face verification architectures. However, these models are often large and impractical for mobile applications, reaching sizes of hundreds of megabytes with millions of parameters. We address this issue by developing SqueezerFaceNet, a light face recognition network with fewer than 1M parameters. This is achieved by applying a network pruning method based on Taylor scores, where filters with small importance scores are removed iteratively. Starting from an already small network (of 1.24M) based on SqueezeNet, we show that it can be further reduced (up to 40%) without an appreciable loss in performance. To the best of our knowledge, we are the first to evaluate network pruning methods for the task of face recognition.
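The abstract does not spell out which Taylor criterion is used, so the following is only a generic sketch of the idea: a first-order Taylor expansion estimates how much the loss would change if a filter were zeroed, and the filters with the smallest estimates are removed. The score formula (absolute value of the weight-gradient inner product), the filter names, and all numbers are assumptions for illustration. A real pipeline would recompute gradients and fine-tune between pruning rounds.

```python
def taylor_score(weights, grads):
    """First-order Taylor estimate of the loss change if this filter is removed."""
    return abs(sum(w * g for w, g in zip(weights, grads)))


def prune_lowest(filters, grads, num_to_remove):
    """Return a copy of `filters` without the `num_to_remove` lowest-scoring ones."""
    scores = {name: taylor_score(filters[name], grads[name]) for name in filters}
    ranked = sorted(scores, key=scores.get)          # least important first
    removed = set(ranked[:num_to_remove])
    return {name: w for name, w in filters.items() if name not in removed}


# Hypothetical flattened filter weights and their loss gradients.
filters = {"f0": [0.5, -0.2], "f1": [0.01, 0.02], "f2": [-0.3, 0.4]}
grads = {"f0": [0.1, 0.3], "f1": [0.5, -0.1], "f2": [0.2, 0.2]}

pruned = prune_lowest(filters, grads, num_to_remove=1)
print("remaining filters:", sorted(pruned))
```

Calling `prune_lowest` repeatedly, with fresh gradients each time, gives the iterative removal the abstract describes.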
5G Non-Public Network for Industrial IoT: Operation Models
Authors: Ahmad Rostami, Dhruvin Patel, Madhusudan Giyyarpuram, Finn Pedersen
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2307.10781
Pdf link: https://arxiv.org/pdf/2307.10781
Abstract 5G non-public networks (NPNs) play a key role in enabling critical Industrial Internet of Things (IoT) applications in various vertical industries. Among other features, 5G NPNs enable novel operation models, where the roles and responsibilities for setting up and operating the network can be distributed among several stakeholders, i.e., among the public mobile network operators (MNOs), the industrial party who uses the 5G NPN services and 3rd parties. This results in many theoretically feasible operation models for 5G NPN, each with its own advantages and disadvantages. We investigate the resulting operation models and identify a set of nine prime models taking into account today's practical considerations. Additionally, we define a framework to qualitatively analyze the operation models and use it to evaluate and compare the identified operation models.
Control Input Inference of Mobile Agents under Unknown Objective
Authors: Chendi Qu, Jianping He, Xiaoming Duan, Shukun Wu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10883
Pdf link: https://arxiv.org/pdf/2307.10883
Abstract Trajectory and control secrecy is an important issue in robotics security. This paper proposes a novel algorithm for the control input inference of a mobile agent without knowing its control objective. Specifically, the algorithm first estimates the target state by applying external perturbations. Then we identify the objective function based on the inverse optimal control, providing the well-posedness proof and the identifiability analysis. Next, we obtain the optimal estimate of the control horizon using binary search. Finally, the agent's control optimization problem is reconstructed and solved to predict its input. Simulation illustrates the efficiency and the performance of the algorithm.
Keyword: pruning

SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning
Authors: Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10697
Pdf link: https://arxiv.org/pdf/2307.10697
Abstract The widespread use of mobile devices for various digital services has created a need for reliable and real-time person authentication. In this context, facial recognition technologies have emerged as a dependable method for verifying users due to the prevalence of cameras in mobile devices and their integration into everyday applications. The rapid advancement of deep Convolutional Neural Networks (CNNs) has led to numerous face verification architectures. However, these models are often large and impractical for mobile applications, reaching sizes of hundreds of megabytes with millions of parameters. We address this issue by developing SqueezerFaceNet, a light face recognition network with fewer than 1M parameters. This is achieved by applying a network pruning method based on Taylor scores, where filters with small importance scores are removed iteratively. Starting from an already small network (of 1.24M) based on SqueezeNet, we show that it can be further reduced (up to 40%) without an appreciable loss in performance. To the best of our knowledge, we are the first to evaluate network pruning methods for the task of face recognition.
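The pruning step described in this abstract can be illustrated with a minimal sketch. It assumes the common first-order Taylor criterion, where a filter's importance is the absolute value of the mean of activation times gradient; the abstract does not state the exact scoring formula. The filter names, activations, and gradients below are made-up toy values, and `taylor_scores` and `prune_lowest` are hypothetical helper names, not the authors' code.

```python
from statistics import fmean


def taylor_scores(activations, gradients):
    """One importance score per filter: |mean(activation * gradient)|.

    Assumption: first-order Taylor criterion; the paper may differ in detail.
    """
    scores = []
    for act, grad in zip(activations, gradients):
        products = [a * g for a, g in zip(act, grad)]
        scores.append(abs(fmean(products)))
    return scores


def prune_lowest(filters, scores, n_remove=1):
    """Drop the n_remove filters with the smallest scores, keeping order."""
    ranked = sorted(range(len(filters)), key=lambda i: scores[i])
    drop = set(ranked[:n_remove])
    return [f for i, f in enumerate(filters) if i not in drop]


# Toy data: four filters, two activation/gradient samples each (hypothetical).
filters = ["f0", "f1", "f2", "f3"]
activations = [[1.0, 2.0], [0.1, 0.1], [3.0, 1.0], [0.5, 0.5]]
gradients = [[0.5, -0.5], [0.1, 0.1], [1.0, 1.0], [-0.2, 0.2]]

scores = taylor_scores(activations, gradients)
kept = prune_lowest(filters, scores, n_remove=1)
print(kept)
```

In an iterative scheme, this single step would be repeated, with scores recomputed (and typically the network fine-tuned) between removals.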
Learned Thresholds Token Merging and Pruning for Vision Transformers
Authors: Maxim Bonnaerens, Joni Dambre
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10780
Pdf link: https://arxiv.org/pdf/2307.10780
Abstract Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp .
PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks
Authors: Shiwei Ding, Lan Zhang, Miao Pan, Xiaoyong Yuan
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.10981
Pdf link: https://arxiv.org/pdf/2307.10981
Abstract Collaborative inference has been a promising solution to enable resource-constrained edge devices to perform inference using state-of-the-art deep neural networks (DNNs). In collaborative inference, the edge device first feeds the input to a partial DNN locally and then uploads the intermediate result to the cloud to complete the inference. However, recent research indicates model inversion attacks (MIAs) can reconstruct input data from intermediate results, posing serious privacy concerns for collaborative inference. Existing perturbation and cryptography techniques are inefficient and unreliable in defending against MIAs while performing accurate inference. This paper provides a viable solution, named PATROL, which develops privacy-oriented pruning to balance privacy, efficiency, and utility of collaborative inference. PATROL takes advantage of the fact that later layers in a DNN can extract more task-specific features. Given limited local resources for collaborative inference, PATROL intends to deploy more layers at the edge based on pruning techniques to enforce task-specific features for inference and reduce task-irrelevant but sensitive features for privacy preservation. To achieve privacy-oriented pruning, PATROL introduces two key components: Lipschitz regularization and adversarial reconstruction training, which increase the reconstruction errors by reducing the stability of MIAs and enhance the target inference model by adversarial training, respectively.
Keyword: diffusion

Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls
Authors: Lejun Min, Junyan Jiang, Gus Xia, Jingwei Zhao
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2307.10304
Pdf link: https://arxiv.org/pdf/2307.10304
Abstract We propose Polyffusion, a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations. The model is capable of controllable music generation with two paradigms: internal control and external control. Internal control refers to the process in which users pre-define a part of the music and then let the model infill the rest, similar to the task of masked music generation (or music inpainting). External control conditions the model with external yet related information, such as chord, texture, or other features, via the cross-attention mechanism. We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks, including melody generation given accompaniment, accompaniment generation given melody, arbitrary music segment inpainting, and music arrangement given chords or textures. Experimental results show that our model significantly outperforms existing Transformer and sampling-based baselines, and using pre-trained disentangled representations as external conditions yields more effective controls.
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
Authors: Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10373
Pdf link: https://arxiv.org/pdf/2307.10373
Abstract The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io/
Generative Visual Question Answering
Authors: Ethan Shen, Scotty Singh, Bhavesh Kumar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10405
Pdf link: https://arxiv.org/pdf/2307.10405
Abstract Multi-modal tasks involving vision and language in deep learning continue to rise in popularity and are leading to the development of newer models that can generalize beyond the extent of their training data. The current models lack temporal generalization which enables models to adapt to changes in future data. This paper discusses a viable approach to creating an advanced Visual Question Answering (VQA) model which can produce successful results on temporal generalization. We propose a new data set, GenVQA, utilizing images and captions from the VQAv2 and MS-COCO dataset to generate new images through stable diffusion. This augmented dataset is then used to test a combination of seven baseline and cutting edge VQA models. Performance evaluation focuses on questions mirroring the original VQAv2 dataset, with the answers having been adjusted to the new images. This paper's purpose is to investigate the robustness of several successful VQA models to assess their performance on future data distributions. Model architectures are analyzed to identify common stylistic choices that improve generalization under temporal distribution shifts. This research highlights the importance of creating a large-scale future shifted dataset. This data can enhance the robustness of VQA models, allowing their future peers to have improved ability to adapt to temporal distribution shifts.
PreDiff: Precipitation Nowcasting with Latent Diffusion Models
Authors: Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10422
Pdf link: https://arxiv.org/pdf/2307.10422
Abstract Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap
Authors: Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10584
Pdf link: https://arxiv.org/pdf/2307.10584
Abstract Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's "Water Lilies, Evening Effect"? We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based inpainting, they are not designed for large domain discrepancies between the target and the reference, such as inpainting an artistic image using a photorealistic reference. This paper proposes a novel diffusion framework, dubbed RefPaint, to "inpaint more wildly" by taking such references with large domain gaps. Built with an image-conditioned diffusion model, we introduce a ladder-side branch and a masked fusion mechanism to work with the inpainting mask. By decomposing the CLIP image embeddings at inference time, one can manipulate the strength of semantic and style information with ease. Experiments demonstrate that our proposed RefPaint framework produces significantly better results than existing methods. Our method enables creative painterly image inpainting with reference objects that would otherwise be difficult to achieve. Project page: https://vita-group.github.io/RefPaint/
Individualization of atrial tachycardia models for clinical applications: Performance of fiber-independent model
Authors: Jiyue He, Arkady Pertsov, John Bullinga, Rahul Mangharam
Subjects: Medical Physics (physics.med-ph)
Arxiv link: https://arxiv.org/abs/2307.10592
Pdf link: https://arxiv.org/pdf/2307.10592
Abstract One of the challenges in the development of patient-specific models of cardiac arrhythmias for clinical applications has been accounting for myocardial fiber organization. The fiber varies significantly from heart to heart, but cannot be directly measured in live tissue. The goal of this paper is to evaluate in-silico the accuracy of left atrium activation maps produced by a fiber-independent (isotropic) model with tuned diffusion coefficients, compared to a model incorporating myocardial fibers with the same geometry. For this study we utilize publicly available DT-MRI data from 7 ex-vivo hearts. The comparison is carried out in 51 cases of focal and rotor arrhythmias located in different regions of the atria. On average, the local activation time accuracy is 96% for focal and 93% for rotor arrhythmias. Given its reasonably good performance and the availability of readily accessible data for model tuning in cardiac ablation procedures, the fiber-independent model could be a promising tool for clinical applications.
Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions
Authors: Frank Nielsen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2307.10644
Pdf link: https://arxiv.org/pdf/2307.10644
Abstract Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance, which however is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold and we pull back that cone distance with its associated straight line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires computing only the extreme (minimal and maximal) eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
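To make the "only extreme eigenvalues" remark concrete, here is a minimal sketch of the Hilbert projective distance on the SPD cone, using its standard form log(λ_max / λ_min) over the eigenvalues of P⁻¹Q. It is restricted to 2x2 matrices so that it needs no linear-algebra library, which is a simplification for illustration. The embedding of normal distributions into the cone is not shown, and `hilbert_cone_distance_2x2` is a hypothetical name, not the author's code.

```python
import math


def hilbert_cone_distance_2x2(P, Q):
    """Hilbert projective distance between 2x2 SPD matrices P and Q.

    Computes log(lam_max / lam_min) for the eigenvalues of P^{-1} Q.
    Assumption: inputs are symmetric positive-definite 2x2 nested lists.
    """
    (a, b), (c, d) = P
    det_p = a * d - b * c
    if a <= 0 or det_p <= 0:
        raise ValueError("P must be symmetric positive-definite")
    inv = [[d / det_p, -b / det_p], [-c / det_p, a / det_p]]
    # M = P^{-1} Q has real positive eigenvalues (it is similar to an SPD matrix).
    M = [[sum(inv[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)  # clamp tiny negative round-off
    lam_max = (tr + math.sqrt(disc)) / 2.0
    lam_min = det / lam_max  # avoids cancellation in (tr - sqrt(disc)) / 2
    return math.log(lam_max / lam_min)


P = [[2.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
print(hilbert_cone_distance_2x2(P, Q))
```

Note that the distance is zero between a matrix and any positive multiple of it, which is why it is a projective distance on the full cone and only becomes a metric on a suitable submanifold, as the abstract states.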
A second order directional split exponential integrator for systems of advection--diffusion--reaction equations
Authors: Marco Caliari, Fabio Cassini
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.10684
Pdf link: https://arxiv.org/pdf/2307.10684
Abstract We propose a second order exponential scheme suitable for two-component coupled systems of stiff advection--diffusion--reaction equations in two and three space dimensions. It is based on a directional splitting of the involved matrix functions, which allows for a simple yet efficient implementation through the computation of small-sized exponential-like functions and tensor-matrix products. The procedure straightforwardly extends to the case of an arbitrary number of components and to any space dimension $d$. Several numerical experiments in 2D and 3D with physically relevant DIB, Schnakenberg, FitzHugh--Nagumo, and advective Brusselator models clearly show the advantage of the approach against state-of-the-art techniques.
AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models
Authors: Jiachun Pan, Hanshu Yan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10711
Pdf link: https://arxiv.org/pdf/2307.10711
Abstract Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, naive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Authors: Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, Mike Zheng Shou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10816
Pdf link: https://arxiv.org/pdf/2307.10816
Abstract Recent text-to-image diffusion models have demonstrated an astonishing capacity to generate high-quality images. However, researchers mainly studied the way of synthesizing images with only text prompts. While some works have explored using other modalities as conditions, considerable paired data, e.g., box/mask-image pairs, and fine-tuning time are required for nurturing models. As such paired data is time-consuming and labor-intensive to acquire and restricted to a closed set, this potentially becomes the bottleneck for applications in an open world. This paper focuses on the simplest form of user-provided conditions, e.g., box or scribble. To mitigate the aforementioned problem, we propose a training-free method to control objects and contexts in the synthesized images adhering to the given spatial conditions. Specifically, three spatial constraints, i.e., Inner-Box, Outer-Box, and Corner Constraints, are designed and seamlessly integrated into the denoising step of diffusion models, requiring no additional training and massive annotated layout data. Extensive results show that the proposed constraints can control what and where to present in the images while retaining the ability of the Stable Diffusion model to synthesize with high fidelity and diverse concept coverage. The code is publicly available at https://github.com/Sierkinhane/BoxDiff.
Exact Diffusion Inversion via Bi-directional Integration Approximation
Authors: Guoqiang Zhang, J. P. Lewis, W. Bastiaan Kleijn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10829
Pdf link: https://arxiv.org/pdf/2307.10829
Abstract Recently, different methods have been proposed to address the inconsistency issue of DDIM inversion to enable image editing, such as EDICT [36] and Null-text inversion [22]. However, the above methods introduce considerable computational overhead. In this paper, we propose a new technique, named bi-directional integration approximation (BDIA), to perform exact diffusion inversion with negligible computational overhead. Suppose we would like to estimate the next diffusion state $\boldsymbol{z}_{i-1}$ at timestep $t_i$ with the historical information $(i,\boldsymbol{z}_i)$ and $(i+1,\boldsymbol{z}_{i+1})$. We first obtain the estimated Gaussian noise $\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i)$, and then apply the DDIM update procedure twice for approximating the ODE integration over the next time-slot $[t_i, t_{i-1}]$ in the forward manner and the previous time-slot $[t_i, t_{i+1}]$ in the backward manner. The DDIM step for the previous time-slot is used to refine the integration approximation made earlier when computing $\boldsymbol{z}_i$. One nice property with BDIA-DDIM is that the update expression for $\boldsymbol{z}_{i-1}$ is a linear combination of $(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$. This allows for exact backward computation of $\boldsymbol{z}_{i+1}$ given $(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$, thus leading to exact diffusion inversion. Experiments on both image reconstruction and image editing were conducted, confirming our statement. BDIA can also be applied to improve the performance of other ODE solvers in addition to DDIM. In our work, it is found that applying BDIA to the EDM sampling procedure produces a slightly better FID score over CIFAR10.
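The exactness claim in this abstract follows from linearity alone, which a short worked step makes explicit. The scalar coefficients $a_i$, $b_i$, $c_i$ below are placeholders introduced here for illustration (the abstract does not give their values), and the step assumes $a_i \neq 0$:

```latex
\boldsymbol{z}_{i-1}
  = a_i\,\boldsymbol{z}_{i+1} + b_i\,\boldsymbol{z}_i
    + c_i\,\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i, i)
\quad\Longrightarrow\quad
\boldsymbol{z}_{i+1}
  = \frac{1}{a_i}\Bigl(\boldsymbol{z}_{i-1} - b_i\,\boldsymbol{z}_i
    - c_i\,\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i, i)\Bigr)
```

Because the noise estimate is evaluated at $\boldsymbol{z}_i$, which is already known during inversion, the right-hand side can be computed without any approximation.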
Divide & Bind Your Attention for Improved Generative Semantic Nursing
Authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10864
Pdf link: https://arxiv.org/pdf/2307.10864
Abstract Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page https://sites.google.com/view/divide-and-bind .
Structure-preserving schemes for drift-diffusion systems on general meshes: DDFV vs HFV
Authors: Stella Krell, Julien Moatti
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2307.10911
Pdf link: https://arxiv.org/pdf/2307.10911
Abstract We made a comparison between a Discrete Duality Finite Volume (DDFV) scheme and a Hybrid Finite Volume (HFV) scheme for a drift-diffusion model with mixed boundary conditions on general meshes. Both schemes are based on a nonlinear discretisation of the convection-diffusion fluxes, which ensures the positivity of the discrete densities. We investigate the behaviours of the schemes on various numerical test cases.
Progressive distillation diffusion for raw music generation
Authors: Svetlana Pavlova
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2307.10994
Pdf link: https://arxiv.org/pdf/2307.10994
Abstract This paper aims to apply a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This new type of method has recently shown outstanding results with image generation. A lot of focus has been given to those models by the computer vision community. On the other hand, far less attention has been given to other types of applications such as music generation in the waveform domain. In this paper, a model for unconditional generation applied to music is implemented: progressive distillation diffusion with a 1D U-Net. Then, a comparison of different parameters of diffusion and their value in a full result is presented. One big advantage of the methods implemented through this work is the fact that the model is able to deal with progressive audio processing and generation, using transformation from 1-channel 128 x 384 to 3-channel 128 x 128 mel-spectrograms and looped generation. The empirical comparisons are realized across different self-collected datasets.
Hypergraph Diffusions and Resolvents for Norm-Based Hypergraph Laplacians
Authors: Konstantinos Ameranis, Antares Chen, Adela DePavia, Lorenzo Orecchia, Erasmo Tani
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2307.11042
Pdf link: https://arxiv.org/pdf/2307.11042
Abstract The development of simple and fast hypergraph spectral methods has been hindered by the lack of numerical algorithms for simulating heat diffusions and computing fundamental objects, such as Personalized PageRank vectors, over hypergraphs. In this paper, we overcome this challenge by designing two novel algorithmic primitives. The first is a simple, easy-to-compute discrete-time heat diffusion that enjoys the same favorable properties as the discrete-time heat diffusion over graphs. This diffusion can be directly applied to speed up existing hypergraph partitioning algorithms. Our second contribution is the novel application of mirror descent to compute resolvents of non-differentiable squared norms, which we believe to be of independent interest beyond hypergraph problems. Based on this new primitive, we derive the first nearly-linear-time algorithm that simulates the discrete-time heat diffusion to approximately compute resolvents of the hypergraph Laplacian operator, which include Personalized PageRank vectors and solutions to the hypergraph analogue of Laplacian systems. Our algorithm runs in time that is linear in the size of the hypergraph and inversely proportional to the hypergraph spectral gap $\lambda_G$, matching the complexity of analogous diffusion-based algorithms for the graph version of the problem.
3D-IDS: Doubly Disentangled Dynamic Intrusion Detection
Authors: Chenyang Qiu, Yingsheng Geng, Junrui Lu, Kaida Chen, Shitong Zhu, Ya Su, Guoshun Nan, Can Zhang, Junsong Fu, Qimei Cui, Xiaofeng Tao
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2307.11079
Pdf link: https://arxiv.org/pdf/2307.11079
Abstract Network-based intrusion detection systems (NIDS) monitor network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SVM-based method) or detecting diverse known attacks (e.g., 31% F1 for the Backdoor and 93% F1 for DDoS by a GCN-based state-of-the-art method), and reveals that the underlying cause is entangled distributions of flow features. This motivates us to propose 3D-IDS, a novel method that aims to tackle the above issues through two-step feature disentanglements and a dynamic graph diffusion scheme. Specifically, we first disentangle traffic features by a non-parameterized optimization based on mutual information, automatically differentiating tens and hundreds of complex features of various attacks. Such differentiated features will be fed into a memory model to generate representations, which are further disentangled to highlight the attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. By doing so, we can effectively identify various attacks in encrypted traffic, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglements benefit the explainability of NIDS.
Keyword: adaptive

Contextual Beamforming: Exploiting Location and AI for Enhanced Wireless Telecommunication Performance
Authors: Jaspreet Kaur, Satyam Bhatti, Olaoluwa R Popoola, Muhammad Ali Imran, Rami Ghannam, Qammer H Abbasi, Hasan T Abbas
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2307.10183
Pdf link: https://arxiv.org/pdf/2307.10183
Abstract The pervasive nature of wireless telecommunication has made it the foundation for mainstream technologies like automation, smart vehicles, virtual reality, and unmanned aerial vehicles. As these technologies experience widespread adoption in our daily lives, ensuring the reliable performance of cellular networks in mobile scenarios has become a paramount challenge. Beamforming, an integral component of modern mobile networks, enables spatial selectivity and improves network quality. However, many beamforming techniques are iterative, introducing unwanted latency to the system. In recent times, there has been a growing interest in leveraging mobile users' location information to expedite beamforming processes. This paper explores the concept of contextual beamforming, discussing its advantages, disadvantages and implications. Notably, the study presents an impressive 53% improvement in signal-to-noise ratio (SNR) by implementing the adaptive beamforming (MRT) algorithm compared to scenarios without beamforming. It further elucidates how MRT contributes to contextual beamforming. The importance of localization in implementing contextual beamforming is also examined. Additionally, the paper delves into the use of artificial intelligence schemes, including machine learning and deep learning, in implementing contextual beamforming techniques that leverage user location information. Based on the comprehensive review, the results suggest that the combination of MRT and Zero forcing (ZF) techniques, alongside deep neural networks (DNN) employing Bayesian Optimization (BO), represents the most promising approach for contextual beamforming. Furthermore, the study discusses the future potential of programmable switches, such as Tofino, in enabling location-aware beamforming.
RayMVSNet++: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
Authors: Yifei Shi, Junhua Xi, Dewen Hu, Zhiping Cai, Kai Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10233
Pdf link: https://arxiv.org/pdf/2307.10233
Abstract Learning-based multi-view stereo (MVS) has so far centered around 3D convolution on cost volumes. Due to the high computation and memory consumption of 3D CNNs, the resolution of the output depth is often considerably limited. Different from most existing works dedicated to adaptive refinement of cost volumes, we opt to directly optimize the depth value along each camera ray, mimicking the range finding of a laser scanner. This reduces the MVS problem to ray-based depth optimization, which is much more lightweight than full cost volume optimization. In particular, we propose RayMVSNet, which learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth. This sequential modeling, conducted based on transformer features, essentially learns the epipolar line search in traditional multi-view stereo. We devise a multi-task learning scheme for better optimization convergence and depth accuracy. We found the monotonicity property of the SDFs along each ray greatly benefits the depth estimation. Our method ranks top on both the DTU and the Tanks & Temples datasets over all previous learning-based methods, achieving an overall reconstruction score of 0.33mm on DTU and an F-score of 59.48% on Tanks & Temples. It is able to produce high-quality depth estimation and point cloud reconstruction in challenging scenarios such as objects/scenes with non-textured surfaces, severe occlusion, and highly varying depth range. Further, we propose RayMVSNet++ to enhance contextual feature aggregation for each ray by designing an attentional gating unit to select semantically relevant neighboring rays within the local frustum around that ray. RayMVSNet++ achieves state-of-the-art performance on the ScanNet dataset. In particular, it attains an AbsRel of 0.058m and produces accurate results on the two subsets of textureless regions and large depth variation.
RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection
Authors: Jisong Kim, Minjae Seong, Geonho Bang, Dongsuk Kum, Jun Won Choi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10249
Pdf link: https://arxiv.org/pdf/2307.10249
Abstract While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models have not been able to fully utilize radar information, in that initial 3D proposals were generated based on the camera features only and the instance-level fusion was conducted subsequently. In this paper, we propose radar-camera multi-level fusion (RCM-Fusion), which fuses radar and camera modalities at both the feature-level and instance-level to fully utilize radar information. At the feature-level, we propose a Radar Guided BEV Encoder which utilizes radar Bird's-Eye-View (BEV) features to transform image features into precise BEV representations and then adaptively combines the radar and camera BEV features. At the instance-level, we propose a Radar Grid Point Refinement module that reduces localization error by considering the characteristics of the radar point clouds. The experiments conducted on the public nuScenes dataset demonstrate that our proposed RCM-Fusion offers an 11.8% performance gain in nuScenes detection score (NDS) over the camera-only baseline model and achieves state-of-the-art performance among radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code will be made publicly available.
Hyperparameter Tuning Cookbook: A guide for scikit-learn, PyTorch, river, and spotPython
Authors: Thomas Bartz-Beielstein
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10262
Pdf link: https://arxiv.org/pdf/2307.10262
Abstract This document provides a comprehensive guide to hyperparameter tuning using spotPython for scikit-learn, PyTorch, and river. The first part introduces spotPython's surrogate model-based optimization process, while the second part focuses on hyperparameter tuning. Several case studies are presented, including hyperparameter tuning for sklearn models such as Support Vector Classification, Random Forests, Gradient Boosting (XGB), and K-nearest neighbors (KNN), as well as a Hoeffding Adaptive Tree Regressor from river. The integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed. With a hands-on approach and step-by-step explanations, this cookbook serves as a practical starting point for anyone interested in hyperparameter tuning with Python. Highlights include the interplay between Tensorboard, PyTorch Lightning, spotPython, and river. This publication is under development, with updates available on the corresponding webpage.
The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination
Authors: Clément L. Canonne, Samuel B. Hopkins, Jerry Li, Allen Liu, Shyam Narayanan
Subjects: Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2307.10273
Pdf link: https://arxiv.org/pdf/2307.10273
Abstract We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing $\mathcal{N}(0,I)$ against $\mathcal{N}(\alpha v, I)$, where $\|v\|_2 = 1$, with an $\varepsilon$-fraction of adversarial corruptions, to be \[ \tilde{\Theta}\!\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^3}{\alpha^4},\min\left(\frac{d^{2/3}\varepsilon^{2/3}}{\alpha^{8/3}}, \frac{d \varepsilon}{\alpha^2}\right)\right) \right) \,, \] while the complexity against adaptive adversaries is \[ \tilde{\Theta}\!\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4} \right)\right) \,, \] which is strictly worse for a large range of vanishing $\varepsilon,\alpha$. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity $\tilde{\Theta}(\max({\sqrt{d}}/{\alpha^2}, {d\varepsilon^2}/{\alpha^4} ))$, and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting.
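To get a feel for the separation, the two bounds stated in the abstract can be evaluated numerically. The sketch below drops the constants and logarithmic factors hidden by the tilde-Theta notation, so only the ratio between the two values is meaningful. The function names and the example parameters are assumptions chosen for illustration.

```python
import math

def oblivious_bound(d, eps, alpha):
    """Bound for oblivious (weak) contamination, constants and log factors dropped."""
    clean_term = math.sqrt(d) / alpha**2
    cubic_term = d * eps**3 / alpha**4
    mixed_term = min(
        d ** (2 / 3) * eps ** (2 / 3) / alpha ** (8 / 3),
        d * eps / alpha**2,
    )
    return max(clean_term, cubic_term, mixed_term)

def adaptive_bound(d, eps, alpha):
    """Bound for adaptive (strong) contamination, constants and log factors dropped."""
    return max(math.sqrt(d) / alpha**2, d * eps**2 / alpha**4)

# Hypothetical parameters: dimension, corruption fraction, mean shift.
d, eps, alpha = 10**6, 0.01, 0.1
weak = oblivious_bound(d, eps, alpha)
strong = adaptive_bound(d, eps, alpha)
print(f"oblivious: {weak:.3g}, adaptive: {strong:.3g}, ratio: {strong / weak:.2f}")
```

For these parameters the adaptive bound is larger than the oblivious one, which is the kind of gap the abstract describes for vanishing $\varepsilon$ and $\alpha$.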
FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning
Authors: Chia-Hsiang Kao, Yu-Chiang Frank Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10317
Pdf link: https://arxiv.org/pdf/2307.10317
Abstract Federated Learning (FL) offers a collaborative training framework, allowing multiple clients to contribute to a shared model without compromising data privacy. Due to the heterogeneous nature of local datasets, updated client models may overfit and diverge from one another, commonly known as the problem of client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift. FedBug adaptively leverages the client model parameters, distributed by the server at each global round, as the reference points for cross-client alignment. Specifically, on the client side, FedBug begins by freezing the entire model, then gradually unfreezes the layers, from the input layer to the output layer. This bottom-up approach allows models to train the newly thawed layers to project data into a latent space, wherein the separating hyperplanes remain consistent across all clients. We theoretically analyze FedBug in a novel over-parameterization FL setup, revealing its superior convergence rate compared to FedAvg. Through comprehensive experiments, spanning various datasets, training conditions, and network architectures, we validate the efficacy of FedBug. Our contributions encompass a novel FL framework, theoretical analysis, and empirical validation, demonstrating the wide potential and applicability of FedBug.
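The bottom-up unfreezing order described above can be sketched as a simple schedule. This is a minimal illustration, not the paper's implementation: the fixed unfreezing interval `steps_per_stage`, the choice to thaw the first layer at step 0, and the function name are all assumptions.

```python
def trainable_layers(step, num_layers, steps_per_stage):
    """Indices of layers that are unfrozen at a given local training step.

    Layers are indexed from 0 (input side) to num_layers - 1 (output side).
    Assumption: one more layer is thawed every `steps_per_stage` steps,
    starting with layer 0 at step 0.
    """
    if num_layers <= 0 or steps_per_stage <= 0:
        raise ValueError("num_layers and steps_per_stage must be positive")
    unfrozen = min(num_layers, step // steps_per_stage + 1)
    return list(range(unfrozen))

# Schedule for a hypothetical 3-layer client model over 8 local steps.
schedule = [trainable_layers(s, num_layers=3, steps_per_stage=2) for s in range(8)]
for step, layers in enumerate(schedule):
    print(step, layers)
```

In a real training loop, the returned indices would decide which parameter groups receive gradient updates at each step, while the remaining layers keep the values distributed by the server.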
Thrust: Adaptively Propels Large Language Models with External Knowledge
Authors: Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Jianshu Chen
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2307.10442
Pdf link: https://arxiv.org/pdf/2307.10442
Abstract Although large-scale pre-trained language models (PTLMs) are shown to encode rich knowledge in their model parameters, the inherent knowledge in PTLMs can be opaque or static, making external knowledge necessary. However, the existing information retrieval techniques could be costly and may even introduce noisy and sometimes misleading knowledge. To address these challenges, we propose the instance-level adaptive propulsion of external knowledge (IAPEK), where we only conduct the retrieval when necessary. To achieve this goal, we propose measuring whether a PTLM contains enough knowledge to solve an instance with a novel metric, Thrust, which leverages the representation distribution of a small number of seen instances. Extensive experiments demonstrate that Thrust is a good measurement of PTLMs' instance-level knowledgeability. Moreover, we can achieve significantly higher cost-efficiency with the Thrust score as the retrieval indicator than the naive usage of external knowledge on 88% of the evaluated tasks with 26% average performance improvement. Such findings shed light on the real-world practice of knowledge-enhanced LMs with a limited knowledge-seeking budget due to computation latency or costs.
General Debiasing for Multimodal Sentiment Analysis
Authors: Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2307.10511
Pdf link: https://arxiv.org/pdf/2307.10511
Abstract Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction yet unavoidably suffers from fitting the spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background have positive labels in a dataset, the model will rely on such correlations for prediction, while "blue background" is not a sentiment-related feature. To address this problem, we define a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models by reducing their reliance on spurious correlations. To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to the samples with larger bias (i.e., more severe spurious correlations). The key to this debiasing framework is to estimate the bias of each sample, which is achieved in two steps: 1) disentangling the robust features and biased features in each modality, and 2) utilizing the biased features to estimate the bias. Finally, we employ IPW to reduce the effects of large-biased samples, facilitating robust feature learning for sentiment prediction. To examine the model's generalization ability, we keep the original testing sets on two benchmarks and additionally construct multiple unimodal and multimodal OOD testing sets. The empirical results demonstrate the superior generalization ability of our proposed framework. We have released the code and data to facilitate reproduction.
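The reweighting idea can be sketched in a few lines. This is a generic inverse probability weighting example under stated assumptions, not the paper's procedure: it assumes each sample already comes with a bias score in (0, 1] (how the paper derives that estimate from the biased features is not given in the abstract), and the names `ipw_weights` and `weighted_loss` are hypothetical.

```python
def ipw_weights(bias_scores, floor=1e-6):
    """Inverse-probability weights, rescaled so that they average to 1.

    Assumption: each bias score lies in (0, 1], and a larger score means
    the sample is more strongly explained by spurious features.
    """
    inverse = [1.0 / max(score, floor) for score in bias_scores]
    mean_inverse = sum(inverse) / len(inverse)
    return [value / mean_inverse for value in inverse]

def weighted_loss(losses, weights):
    """Mean of per-sample losses after reweighting."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)

# Hypothetical bias scores for three samples, from most to least biased.
bias_scores = [0.9, 0.5, 0.1]
weights = ipw_weights(bias_scores)
print([round(w, 3) for w in weights])
print(weighted_loss([0.2, 0.4, 1.0], weights))
```

Rescaling the weights to an average of 1 keeps the overall loss magnitude comparable to unweighted training, so only the relative emphasis between samples changes.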
Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning
Authors: Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10574
Pdf link: https://arxiv.org/pdf/2307.10574
Abstract Due to the complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, and even project failure. Existing approaches in construction have failed to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introduces a model and method to adaptively control the resource flows to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as the uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and genetic algorithm and possesses remarkable capability in diverse projects and external environments, and that a hybrid agent of DRL and the empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.
Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits
Authors: Sharyal Zafar (ENS Rennes, SATIE), Raphaël Feraud, Anne Blavette (ENS Rennes, SATIE), Guy Camilleri (UT3, IRIT), Hamid Ben (SATIE, ENS Rennes)
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10704
Pdf link: https://arxiv.org/pdf/2307.10704
Abstract The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles, i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. However, such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation.
Communication-Efficient Split Learning via Adaptive Feature-Wise Compression
Authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10805
Pdf link: https://arxiv.org/pdf/2307.10805
Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead compared to the vanilla SL framework without compression.
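The two compression strategies can be illustrated with stand-in rules. The sketch below is not SplitFC's actual method: the abstract does not give the mapping from standard deviation to dropout probability or the closed-form quantization levels, so it assumes a keep probability proportional to each column's standard deviation and a plain uniform quantizer over each column's own range. All function names are hypothetical.

```python
import statistics

def dropout_probabilities(columns, mean_dropout=0.5):
    """Assumed rule: columns with a smaller standard deviation are dropped more often."""
    stds = [statistics.pstdev(col) for col in columns]
    total = sum(stds)
    if total == 0:
        return [mean_dropout] * len(columns)
    # Keep probability proportional to the column's share of the total spread.
    keep = [min(1.0, (1 - mean_dropout) * len(columns) * s / total) for s in stds]
    return [1.0 - k for k in keep]

def quantize_column(col, levels):
    """Uniform quantization of one column over its own [min, max] range."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    lo, hi = min(col), max(col)
    if hi == lo:
        return [lo] * len(col)
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in col]

# Two hypothetical feature columns: one nearly constant, one widely spread.
columns = [[0.0, 0.1, 0.2, 0.1], [-2.0, 1.0, 3.0, -1.0]]
probs = dropout_probabilities(columns)
quantized = quantize_column(columns[1], levels=4)
print(probs)
print(quantized)
```

The design intuition is that a nearly constant column carries little information per transmitted value, so dropping it costs less than dropping a column with a wide spread.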
A Hybrid Adaptive Controller for Soft Robot Interchangeability
Authors: Zixi Chen, Xuyang Ren, Matteo Bernabei, Vanessa Mainardi, Gastone Ciuti, Cesare Stefanini
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2307.10838
Pdf link: https://arxiv.org/pdf/2307.10838
Abstract Soft robots have been leveraged in many areas like surgery, rehabilitation, and bionics due to their softness, flexibility, and safety. However, it is challenging to produce two identical soft robots even with the same mold and manufacturing process owing to the complexity of soft materials. Meanwhile, widespread usage of a system requires the ability to fabricate replaceable components, that is, interchangeability. Due to the necessity of this property, a hybrid adaptive controller is introduced to achieve interchangeability from the perspective of control approaches. This method utilizes an offline trained recurrent neural network controller to cope with the nonlinear and delayed response from soft robots. Furthermore, an online optimizing kinematics controller is applied to decrease the error caused by the above neural network controller. Soft pneumatic robots with different deformation properties but the same mold have been included for validation experiments. In the experiments, the systems with different actuation configurations and the different robots follow the desired trajectory with errors of 0.040 and 0.030 compared with the working space length, respectively. Such an adaptive controller also shows good performance on different control frequencies and desired velocities. This controller endows soft robots with the potential for wide application, and future work may include different offline and online controllers. A weight parameter adjusting strategy may also be proposed in the future.
VoteLab: A Modular and Adaptive Experimentation Platform for Online Collective Decision Making
Authors: Renato Kunz, Fatemeh Banaie, Abhinav Sharma, Carina I. Hausladen, Dirk Helbing, Evangelos Pournaras
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2307.10903
Pdf link: https://arxiv.org/pdf/2307.10903
Abstract Digital democracy and new forms of direct digital participation in policy making are gaining unprecedented momentum. This is particularly the case for preferential voting methods and decision-support systems designed to promote fairer, more inclusive and legitimate collective decision-making processes in citizens' assemblies, participatory budgeting and elections. However, systematic human experimentation with different voting methods is cumbersome and costly. This paper introduces VoteLab, an open-source and thoroughly-documented platform for modular and adaptive design of voting experiments. It supports visually and interactively building reusable campaigns with a choice of different voting methods, while voters can easily respond to subscribed voting questions on a smartphone. A proof-of-concept with four voting methods and questions on COVID-19 in an online lab experiment has been used to study the consistency of voting outcomes. It demonstrates the capability of VoteLab to support rigorous experimentation of complex voting scenarios.
To What Extent Are Honeypots and Honeynets Autonomic Computing Systems?
Authors: Jason M. Pittman, Shaho Alaee
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2307.11038
Pdf link: https://arxiv.org/pdf/2307.11038
Abstract Cyber threats, such as advanced persistent threats (APTs), ransomware, and zero-day exploits, are rapidly evolving and demand improved security measures. Honeypots and honeynets, as deceptive systems, offer valuable insights into attacker behavior, helping researchers and practitioners develop innovative defense strategies and enhance detection mechanisms. However, their deployment involves significant maintenance and overhead expenses. At the same time, the complexity of modern computing has prompted the rise of autonomic computing, aiming for systems that can operate without human intervention. Recent honeypot and honeynet research claims to incorporate autonomic computing principles, often using terms like adaptive, dynamic, intelligent, and learning. This study investigates such claims by measuring the extent to which autonomic principles are expressed in honeypot and honeynet literature. The findings reveal that autonomic computing keywords are present in the literature sample, suggesting an evolution from self-adaptation to autonomic computing implementations. Yet, despite these findings, the analysis also shows low frequencies of self-configuration, self-healing, and self-protection keywords. Interestingly, self-optimization appeared prominently in the literature. While this study presents a foundation for the convergence of autonomic computing and deceptive systems, future research could explore technical implementations in sample articles and test them for autonomic behavior. Additionally, investigations into the design and implementation of individual autonomic computing principles in honeypots and determining the necessary ratio of these principles for a system to exhibit autonomic behavior could provide valuable insights for both researchers and practitioners.
Keyword: quantization

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2307.10554
Pdf link: https://arxiv.org/pdf/2307.10554
Abstract Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies exhibit weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic proxy-search framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We propose a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance to state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
Quantized Feature Distillation for Network Quantization
Authors: Ke Zhu, Yin-Yin He, Jianxin Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2307.10638
Pdf link: https://arxiv.org/pdf/2307.10638
Abstract Neural network quantization aims to accelerate and trim full-precision neural network models by using low bit approximations. Methods adopting the quantization aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, albeit being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.
Communication-Efficient Split Learning via Adaptive Feature-Wise Compression
Authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2307.10805
Pdf link: https://arxiv.org/pdf/2307.10805
Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead compared to the vanilla SL framework without compression.

New submissions for Fri, 21 Jul 23 #108

Keyword: efficient

A Lightweight Approach for Network Intrusion Detection based on Self-Knowledge Distillation

Capsule network with shortcut routing

Exploring Link Prediction over Hyper-Relational Temporal Knowledge Graphs Enhanced with Time-Invariant Relational Knowledge

Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors

Efficient selective attention LSTM for well log curve synthesis

Hidden Markov Models with Random Restarts vs Boosting for Malware Detection

On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild

The Full Landscape of Robust Mean Testing: Sharp Separations between Oblivious and Adaptive Contamination

Distributed Sensing, Computing, Communication, and Control Fabric: A Unified Service-Level Architecture for 6G

Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks

NFT-Based Blockchain-Oriented Security Framework for Metaverse Applications

Classification of Visualization Types and Perspectives in Patents

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

Blockchain-Based Federated Learning: Incentivizing Data Sharing and Penalizing Dishonest Behavior

An Analysis of Bugs In Persistent Memory Application

Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

Gaussian Partial Information Decomposition: Bias Correction and Application to High-dimensional Data

Probabilistic Multimodal Depth Estimation Based on Camera-LiDAR Sensor Fusion

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

Efficient algorithms for enumerating maximal common subsequences of two strings

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

Lightweight Neural Path Planning

Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning

Ethosight: A Joint-Embedding Based System for Nuanced Perception Using Contextual Label Affinity Metric and Reasoning Based Iterative Learning

Boundary State Generation for Testing and Improvement of Autonomous Driving Systems

Individualization of atrial tachycardia models for clinical applications: Performance of fiber-independent model

Model order reduction with novel discrete empirical interpolation methods in space-time

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

Exploring the Landscape of Natural Language Processing Research

Conditional expectation network for SHAP

A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency

ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

A second order directional split exponential integrator for systems of advection--diffusion--reaction equations

A Constraint-based Recommender System via RDF Knowledge Graphs

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV

Joint Port Selection Based Channel Acquisition for FDD Cell-Free Massive MIMO

TransNFV: Integrating Transactional Semantics for Efficient State Management in Virtual Network Functions

Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

Sensing User's Activity, Channel, and Location with Near-Field Extra-Large-Scale MIMO

Shortest Dominating Set Reconfiguration under Token Sliding

Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios

A Circular Restricted n-body Problem

Software Product Line Engineering via Software Transplantation

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

MediaGPT : A Large Language Model Target Chinese Media

OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios

PASTA: Pretrained Action-State Transformer Agents

Aplicación de tecnologías IoT en el control y seguimiento de trasporte de carga terrestre (Application of IoT technologies in the control and tracking of land freight transport)

ESASCF: Expertise Extraction, Generalization and Reply Framework for an Optimized Automation of Network Security Compliance

Deep Spiking-UNet for Image Processing

PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks

Unsupervised Learning in Complex Systems

Efficient and Joint Hyperparameter and Architecture Search for Collaborative Filtering

Multi-objective point cloud autoencoders for explainable myocardial infarction prediction

Keyword: faster

SSD Forensic: Evidence Generation And Forensic Research On Solid State Drives Using Trim Analysis

IncDSI: Incrementally Updatable Document Retrieval

Asymptotically minimal contractors based on the centered form; Application to the stability analysis of linear systems

Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum

Predicting human motion intention for pHRI assistive control

Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Learned Thresholds Token Merging and Pruning for Vision Transformers

Parallel Shooting Sequential Quadratic Programming for Nonlinear MPC Problems

Software Product Line Engineering via Software Transplantation

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

Keyword: mobile

Contextual Beamforming: Exploiting Location and AI for Enhanced Wireless Telecommunication Performance

CAPTCHA Types and Breaking Techniques: Design Issues, Challenges, and Future Research Directions

Post-pandemic mobility patterns in London

Technology in Association With Mental Health: Meta-ethnography