Abstract
Artificial intelligence keeps bringing new functionalities to mobile devices that are now considered essential (e.g., camera and voice assistants, recommender systems). Yet operating artificial intelligence takes up a substantial amount of energy. At the same time, artificial intelligence is also being used to enable more energy-efficient solutions for mobile systems. Hence, artificial intelligence has two faces in this regard: it is both a key enabler of desired (efficient) mobile functionalities and a major power draw on these devices, playing a part in both the solution and the problem. In this paper, we present a review of the literature of the past decade on the use of artificial intelligence in green mobile computing. From the analysis of 34 papers, we highlight the emerging patterns and map the field into 13 main topics that are summarized in detail. Our results show that the field has been growing slowly in recent years, particularly since 2019. Regarding the double impact AI has on mobile energy consumption, the energy consumption of AI-based mobile systems is under-studied compared with the use of AI for energy-efficient mobile computing, and we argue for more exploratory studies in that direction. We observe that although most studies are framed as solution papers (94%), the large majority do not make those solutions publicly available to the community. Moreover, we show that most contributions are purely academic (28 out of 34 papers) and that the involvement of the mobile software industry in this field needs to be promoted.
Simulative Performance Analysis of an AD Function with Road Network Variation
Authors: Daniel Becker, Guido Küppers, Lutz Eckstein
Abstract
Automated driving functions (ADFs) have become increasingly popular in recent years. However, their safety must be assured. Thus, the verification and validation of these functions remain an important open issue in research and development. To achieve this efficiently, scenario-based testing has been established as a valuable methodology among researchers, industry, and authorities. Simulations are a powerful way to test those scenarios reproducibly. In this paper, we propose a method to automatically test a set of scenarios in many variations. In contrast to related approaches, these variations are applied not to the traffic participants around the ADF but to the road network, to show that parameters of the road topology also influence the performance of such an ADF. We present a continuous tool chain to set up scenarios, vary them, run simulations, and finally evaluate the performance with a set of key performance indicators (KPIs).
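The variation step can be pictured as a full-factorial sweep over road-network parameters. The sketch below is a minimal illustration, not the authors' tool chain: the parameter names (`lane_width_m`, `curve_radius_m`, `num_lanes`), the scenario labels, and the KPI formula inside `run_simulation` are hypothetical placeholders for a real simulator run.

```python
from itertools import product

# Hypothetical road-network parameters to vary; values are illustrative only.
ROAD_PARAMETERS = {
    "lane_width_m": [3.0, 3.5, 3.75],
    "curve_radius_m": [150.0, 300.0, 600.0],
    "num_lanes": [2, 3],
}


def road_variants(parameters):
    """Yield one dict per combination of road-network parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


def run_simulation(scenario, road):
    """Placeholder for a simulator run; returns a hypothetical KPI row."""
    # A real tool chain would build the road network, run the ADF in the
    # simulator, and compute KPIs from the recorded trajectories.
    min_gap_s = 1.0 + 0.001 * road["curve_radius_m"] - 0.1 * (road["num_lanes"] - 2)
    return {"scenario": scenario, **road, "min_time_gap_s": min_gap_s}


def evaluate(scenarios, parameters):
    """Run every scenario on every road variant and collect the KPI rows."""
    return [run_simulation(s, road) for s in scenarios for road in road_variants(parameters)]


results = evaluate(["cut_in", "lead_vehicle_braking"], ROAD_PARAMETERS)
print(len(results))  # 2 scenarios x 18 road variants = 36 runs
```

A full-factorial grid grows multiplicatively with every added parameter, so a larger campaign might sample the grid instead; which strategy the paper uses is not stated in the abstract.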
Sustainable development-oriented campus bike-sharing site evaluation model: A case study of Henan Polytechnic University
Abstract
Promoting sustainable transportation options is increasingly crucial for environmentally friendly and efficient campus mobility systems. Among these options, bike-sharing programs have garnered substantial attention for their capacity to mitigate traffic congestion, decrease carbon emissions, and enhance overall campus sustainability. However, improper selection of bike-sharing sites has led to growing unsustainable practices on campus, including disorderly parking and indiscriminate placement of shared bikes. To this end, this paper proposes a novel sustainable development-oriented campus bike-sharing site evaluation model that integrates the improved Delphi and fuzzy comprehensive evaluation approaches. First, fourteen evaluation metrics are selected from four dimensions (user features, implementation and usage characteristics of parking spots, environmental sustainability, and social sustainability) by combining expert experience with the improved Delphi method. Then, the analytic hierarchy process and the entropy weight method are employed to determine the weights of the evaluation indices, ensuring a robust and objective assessment framework. Finally, the fuzzy comprehensive evaluation method is applied to evaluate the quality of the site selection. The South Campus of Henan Polytechnic University is used as a case study for the proposed evaluation system. This work contributes to the existing body of knowledge by presenting a comprehensive site selection evaluation system for campus bike-sharing, informed by the principles of sustainable development.
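The weighting and evaluation steps can be sketched in a few lines. This is a generic textbook formulation, assumed rather than taken from the paper: `entropy_weights` implements the standard entropy weight method, `combine_weights` uses a multiplicative AHP/entropy combination (one common choice; the abstract does not state the paper's rule), and `fuzzy_evaluate` applies the weighted-average operator B = W · R. All numbers are toy values.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method.

    matrix[i][j] is the normalized, positive score of site i on indicator j.
    Indicators whose scores vary more across sites receive larger weights.
    """
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        column = [matrix[i][j] for i in range(m)]
        total = sum(column)
        p = [x / total for x in column]
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]


def combine_weights(w_ahp, w_entropy):
    """Multiplicative combination of subjective (AHP) and objective weights (assumed rule)."""
    products = [a * b for a, b in zip(w_ahp, w_entropy)]
    s = sum(products)
    return [p / s for p in products]


def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy operator B = W . R.

    membership[j][g] is the degree to which indicator j belongs to grade g.
    """
    grades = len(membership[0])
    return [sum(w * row[g] for w, row in zip(weights, membership)) for g in range(grades)]


# Toy usage: two indicators, three grades (e.g. good / fair / poor).
grades = fuzzy_evaluate([0.6, 0.4], [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]])
best_grade = max(range(len(grades)), key=grades.__getitem__)  # maximum membership principle
```

An indicator that takes the same value at every candidate site carries no discriminating information, so its entropy weight is zero.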
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
Authors: Pierre Champion
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection enables the development of efficient tools powering most speech services, it also poses serious privacy issues for users, as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants such as Amazon's Alexa, Google Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice cloning and speaker/gender/pathological/etc. recognition has increased. This thesis proposes solutions for anonymizing speech and for evaluating the degree of anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols must address to properly evaluate the degree of privacy protection. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some of its limitations. We isolate each component of the anonymization system to evaluate the degree of speaker PPI associated with it. Then, we propose several transformation methods for each component to reduce speaker PPI as much as possible while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the widely used noise-based approach. Finally, we propose a new attack method to invert anonymization.
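The contrast between quantization-based and noise-based transformations can be made concrete with a toy sketch. It assumes that a speaker representation is a fixed-length vector and that a codebook of shared centroids has been learned elsewhere; the codebook, vectors, and noise scale are hypothetical and do not reproduce the thesis's methods.

```python
import math
import random


def quantize(vector, codebook):
    """Quantization-based transform: replace a vector by its nearest shared centroid."""
    return min(codebook, key=lambda c: math.dist(vector, c))


def add_noise(vector, scale, rng):
    """Noise-based transform: perturb each dimension with Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in vector]


# Hypothetical 2-D speaker vectors and a 2-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0)]
speakers = {"spk_a": (0.1, -0.2), "spk_b": (0.2, 0.1), "spk_c": (0.9, 1.2)}

quantized = {name: quantize(v, codebook) for name, v in speakers.items()}
noisy = add_noise(speakers["spk_a"], 0.05, random.Random(0))
```

Here `spk_a` and `spk_b` collapse onto the same centroid and become indistinguishable, whereas the noisy vector stays close to the original `spk_a` and remains speaker-specific.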
Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures
Abstract
The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.
YUDO: YOLO for Uniform Directed Object Detection
Authors: Đorđe Nedeljković
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents an efficient way of detecting directed objects by predicting their center coordinates and direction angle. Since the objects are of uniform size, the proposed model works without predicting the object's width and height. The dataset used for this problem comes from the Honeybee Segmentation and Tracking Datasets project. One of the contributions of this work is an examination of how well a standard real-time object detection architecture such as YoloV7 can be customized for position and direction detection. A very efficient, tiny version of the architecture is used in this approach. Moreover, only one of the three detection heads, without anchors, is sufficient for this task. We also introduce directed IoU (DirIoU), an extension of the Skew Intersection over Union (SkewIoU) calculation for rotated boxes that includes the absolute angle difference. DirIoU is used both in matching target and predicted bounding boxes for mAP calculation and in the NMS filtering procedure. The code and models are available at https://github.com/djordjened92/yudo.
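The matching step needs an IoU for rotated boxes. The sketch below computes SkewIoU exactly by Sutherland-Hodgman polygon clipping. The `dir_iou` function is a hypothetical stand-in: the abstract only says that DirIoU includes an absolute angle difference, so the linear penalty 1 - |Δθ|/π used here is an assumption. It was chosen because SkewIoU alone cannot distinguish a box from the same box rotated by π.

```python
import math


def box_corners(cx, cy, w, h, theta):
    """Corners of a rotated box in counter-clockwise order."""
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]


def _side(a, b, p):
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersection(s, e, a, b):
    """Intersection of segment s -> e with the infinite line through a and b."""
    dx1, dy1 = e[0] - s[0], e[1] - s[1]
    dx2, dy2 = b[0] - a[0], b[1] - a[1]
    t = ((a[0] - s[0]) * dy2 - (a[1] - s[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
    return (s[0] + t * dx1, s[1] + t * dy1)


def clip(subject, clipper, eps=1e-9):
    """Sutherland-Hodgman clipping of a polygon by a convex CCW polygon."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        for j, cur in enumerate(points):
            prev = points[j - 1]
            cur_in, prev_in = _side(a, b, cur) >= -eps, _side(a, b, prev) >= -eps
            if cur_in:
                if not prev_in:
                    output.append(_intersection(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_intersection(prev, cur, a, b))
    return output


def area(polygon):
    """Shoelace formula (0 for an empty polygon)."""
    n = len(polygon)
    return abs(sum(polygon[i][0] * polygon[(i + 1) % n][1]
                   - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))) / 2


def skew_iou(b1, b2):
    """IoU of two rotated boxes given as (cx, cy, w, h, theta)."""
    p1, p2 = box_corners(*b1), box_corners(*b2)
    inter = area(clip(p1, p2))
    union = area(p1) + area(p2) - inter
    return inter / union if union > 0 else 0.0


def dir_iou(b1, b2):
    """Hypothetical DirIoU: SkewIoU scaled by a linear absolute-angle penalty."""
    d = abs(b1[4] - b2[4]) % (2 * math.pi)
    d = min(d, 2 * math.pi - d)  # wrap the angle difference to [0, pi]
    return skew_iou(b1, b2) * (1 - d / math.pi)
```

For example, two 2 × 2 boxes offset by one unit give a SkewIoU of 1/3, and a box compared with itself rotated by π gives a SkewIoU of about 1 but a `dir_iou` of about 0.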
FocalFormer3D : Focusing on Hard Instance for 3D Object Detection
Abstract
False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. Although potentially fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from a massive number of object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. This advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark and 72.1 AMOTA on the nuScenes tracking benchmark, both ranking first on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.
Optimizing Algorithms From Pairwise User Preferences
Authors: Leonid Keselman, Katherine Shih, Martial Hebert, Aaron Steinfeld
Abstract
Typical black-box optimization approaches in robotics focus on learning from metric scores. However, this is not always possible, as not all developers have ground truth available. Learning appropriate robot behavior in human-centric contexts often requires querying users, who typically cannot provide precise metric scores. Existing approaches leverage human feedback in an attempt to model an implicit reward function; however, this reward may be difficult or impossible to capture effectively. In this work, we introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences. SortCMA efficiently and robustly leverages user input to find parameter sets without directly modeling a reward. We apply this method to tuning a commercial depth sensor without ground truth and to robot social navigation, which involves highly complex preferences over robot behavior. We show that our method succeeds in optimizing for the user's goals, and we perform a user study to evaluate social navigation results.
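The core idea of optimizing without a numeric reward can be sketched by ranking candidates through pairwise comparisons only. This is not SortCMA: it is a simplified (μ, λ) evolution strategy with a fixed step-size schedule instead of covariance matrix adaptation. `functools.cmp_to_key` turns a `prefer(a, b)` callback into a sort key, and a simulated user with a hidden target (hypothetical) stands in for real preference queries.

```python
import functools
import math
import random


def optimize_from_preferences(prefer, x0, sigma=1.0, generations=30, lam=8, mu=4,
                              decay=0.9, seed=0):
    """Simplified (mu, lambda) evolution strategy driven only by pairwise preferences.

    prefer(a, b) returns a negative number if a is preferred over b,
    a positive number if b is preferred, and 0 if the user is indifferent.
    """
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(lam)]
        # Ranking with a comparator means no numeric reward is ever needed.
        ranked = sorted(candidates, key=functools.cmp_to_key(prefer))
        elite = ranked[:mu]
        mean = [sum(c[d] for c in elite) / mu for d in range(len(mean))]
        sigma *= decay  # fixed schedule instead of CMA-ES step-size adaptation
    return mean


# Simulated user: prefers whichever parameter set is closer to a hidden target.
TARGET = (1.0, -2.0)


def simulated_user(a, b):
    da, db = math.dist(a, TARGET), math.dist(b, TARGET)
    return (da > db) - (da < db)


best = optimize_from_preferences(simulated_user, x0=(0.0, 0.0))
```

Sorting λ candidates costs O(λ log λ) comparisons per generation, which corresponds to the number of preference queries a user would answer in this simplified setting.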
An Approach for Optimizing Acceleration in Connected and Automated Vehicles
Authors: Filippos N. Tzortzoglou, Dionysios Theodosis, Andreas Malikopoulos
Abstract
Vehicle automation technology has made significant progress, laying the groundwork for a future of fully automated vehicles. This paper focuses on the operation of connected and automated vehicles (CAVs). In prior work, we developed a controller with a tunable gain whose value significantly influences CAV performance, in particular its acceleration. Varying this gain yields different acceleration values, and the effect depends on the initial conditions. Thus, our goal in this paper is to identify the optimal value of this gain in terms of acceleration for any group of initial conditions. To this end, we formulate an optimization problem in which the decision variable is the gain value and the objective function includes the acceleration of the vehicles. The complexity of this problem prohibits real-time solutions. To address this challenge, we train a neural network to efficiently map different initial conditions to the optimal gain values. We demonstrate the proposed approach for deriving the optimal gains in a merging scenario with an on-ramp.
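The offline/online split can be illustrated with a toy problem: solve the optimization offline for sampled initial conditions, then fit a cheap model that maps initial conditions to optimal gains for real-time use. The cost function, the use of initial speed as the only initial condition, and the least-squares line standing in for the paper's neural network are all hypothetical simplifications.

```python
def cost(gain, v0):
    """Hypothetical stand-in for the acceleration-based objective."""
    return (gain - 0.5 * v0) ** 2 + 0.1 * gain


def optimal_gain(v0, grid):
    """Offline step: brute-force search over candidate gains (too slow for real time)."""
    return min(grid, key=lambda k: cost(k, v0))


def fit_line(xs, ys):
    """Least-squares fit y = a * x + b (stand-in for the neural network)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


grid = [i / 1000 for i in range(5001)]              # candidate gains 0.000 ... 5.000
initial_speeds = [1.0 + 0.5 * i for i in range(9)]  # sampled initial conditions
targets = [optimal_gain(v0, grid) for v0 in initial_speeds]
slope, intercept = fit_line(initial_speeds, targets)


def predict_gain(v0):
    """Online step: a cheap evaluation of the learned mapping."""
    return slope * v0 + intercept
```

In this toy the optimum is k* = 0.5 v0 - 0.05, so the fitted line recovers it; a real controller cost would be nonlinear in the initial conditions, which is where a neural network earns its place.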
Long-Distance Gesture Recognition using Dynamic Neural Networks
Authors: Shubhang Bhatnagar, Sharath Gopal, Narendra Ahuja, Liu Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Robotics (cs.RO); Image and Video Processing (eess.IV)
Abstract
Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to scenarios in which humans and machines are located very close to each other. This short-distance assumption does not hold for several types of interactions, for example gesture-based interactions with a floor-cleaning robot or a drone. Methods designed for short-distance recognition perform poorly at long distances because gestures occupy only a small portion of the input data. Their performance degrades further in resource-constrained settings, where they cannot effectively focus their limited compute on the gesturing subject. We propose a novel, accurate, and efficient method for recognizing gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, making it more compute-efficient than other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset, where it outperforms previous state-of-the-art methods in recognition accuracy and compute efficiency.
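Spending compute only on gesture-containing regions can be sketched as top-k patch selection. This is an illustrative assumption, not the authors' network: a learned gating module would produce the region scores, whereas here the mean activation of each patch serves as a hypothetical saliency score, and only the returned patches would be passed to later layers.

```python
def select_regions(feature_map, patch, k):
    """Keep the k non-overlapping patches with the highest mean activation.

    feature_map: 2-D list of activations (H x W).
    Returns the top-left (row, col) of each kept patch, best first.
    """
    h, w = len(feature_map), len(feature_map[0])
    scores = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            cells = [feature_map[r + i][c + j] for i in range(patch) for j in range(patch)]
            scores.append((sum(cells) / len(cells), (r, c)))
    scores.sort(key=lambda item: item[0], reverse=True)
    return [pos for _, pos in scores[:k]]


# Toy map: the "gesturing subject" occupies the bottom-right corner.
feature_map = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
kept = select_regions(feature_map, patch=2, k=1)
```

Because the background patches are dropped before the expensive layers, the cost of later processing scales with k rather than with the full input size.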
Communication-Efficient Search under Fully Homomorphic Encryption for Federated Machine Learning
Authors: Dongfang Zhao
Subjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
Abstract
Homomorphic encryption (HE) has found extensive use in federated learning (FL) systems, capitalizing on its dual advantages: (i) ensuring the confidentiality of shared models contributed by participating entities, and (ii) enabling algebraic operations directly on ciphertexts representing encrypted models. In particular, CKKS, an approximate fully homomorphic encryption (FHE) scheme, has emerged as the de facto encryption scheme, notably because it supports decimal numbers. While recent research predominantly focuses on improving CKKS's encryption rate and evaluation speed in the context of FL, the search operation has received relatively little attention because some applications tend to discard intermediate encrypted models. Yet emerging studies emphasize the importance of managing and searching intermediate models for specific applications such as large-scale scientific computing, which require robust data provenance and auditing support. To address this, our paper introduces an approach that efficiently searches for a target encrypted value while incurring only a logarithmic number of network interactions. The proposed method exploits CKKS's additive and multiplicative properties on encrypted models, propagating equality comparisons between values through a balanced binary tree structure to ultimately reach a single aggregate. A comprehensive analysis of the proposed algorithm underscores its potential to significantly broaden FL's applicability and impact.
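The logarithmic-round search can be illustrated in plaintext. The sketch replaces every homomorphic operation with its plaintext counterpart and assumes that each tree level costs one network round trip; under CKKS the equality test would itself be a polynomial approximation, which is not shown. Combining 0/1 indicators with a + b - ab computes a logical OR using only the addition and multiplication that CKKS supports.

```python
def tree_search(values, target, equal=lambda a, b: 1.0 if a == b else 0.0):
    """Plaintext simulation of a logarithmic-round search over encrypted values.

    Returns (aggregate, rounds): aggregate is 1.0 if target is present, else 0.0;
    rounds is the number of tree levels (assumed one network interaction each).
    """
    level = [equal(v, target) for v in values]
    rounds = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0.0)  # pad with (an encryption of) zero
        # Pairwise OR of 0/1 indicators using only addition and multiplication.
        level = [a + b - a * b for a, b in zip(level[0::2], level[1::2])]
        rounds += 1
    return level[0], rounds
```

For eight stored values the aggregate is reached after three levels, matching ⌈log2 8⌉.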
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack an understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model; the reduction patterns of pruning-based methods differ significantly from fixed radial patterns; and the reduction patterns of pruning-based methods are correlated across classification datasets. Finally, we report that the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page at https://vap.aau.dk/tokens.
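Top-K pruning is simple enough to state in a few lines. A common variant, assumed here rather than taken from the paper, scores each patch token by the attention it receives from the [CLS] token and keeps the K highest-scoring tokens; the tokens and scores below are toy data.

```python
def top_k_prune(tokens, cls_attention, keep_ratio):
    """Top-K token pruning: keep the tokens that receive the most [CLS] attention.

    tokens: list of patch-token embeddings ([CLS] itself excluded).
    cls_attention: one score per token, e.g. averaged over attention heads.
    Returns the kept tokens and their original indices.
    """
    k = max(1, round(keep_ratio * len(tokens)))
    order = sorted(range(len(tokens)), key=lambda i: cls_attention[i], reverse=True)
    kept = sorted(order[:k])  # restore the original spatial order
    return [tokens[i] for i in kept], kept


kept_tokens, kept_idx = top_k_prune(["t0", "t1", "t2", "t3"], [0.1, 0.4, 0.2, 0.3], keep_ratio=0.5)
```

The returned indices are what a reduction-pattern analysis would record per image, so patterns from different methods can be compared directly.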
TRTM: Template-based Reconstruction and Target-oriented Manipulation of Crumpled Cloths
Authors: Wenbo Wang, Gen Li, Miguel Zamora, Stelian Coros
Abstract
Precisely reconstructing and manipulating crumpled cloths is challenging due to the high dimensionality of the cloth model and the limited observation of self-occluded regions. We leverage recent progress in single-view human body reconstruction to perform template-based reconstruction of crumpled cloths from their top-view depth observations alone, using our proposed sim-real registration protocols. In contrast to previous implicit cloth representations, our reconstruction mesh explicitly indicates the positions and visibilities of all cloth mesh vertices, enabling more efficient dual-arm and single-arm target-oriented manipulations. Experiments demonstrate that our template-based reconstruction and target-oriented manipulation (TRTM) system can be applied to everyday cloths that have topologies similar to our template mesh but different shapes, sizes, patterns, and physical properties. Videos, datasets, pre-trained models, and code can be downloaded from our project website: https://wenbwa.github.io/TRTM/.
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks
Authors: Jue Chen, Huan Yuan, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang
Abstract
Brain-inspired Spiking Neural Networks (SNNs) are event-driven and highly energy-efficient, which distinguishes them from traditional Artificial Neural Networks (ANNs) when deployed on edge devices such as neuromorphic chips. Most previous work focuses on SNN training strategies to improve model performance, which leads to larger and deeper network architectures. These complex networks are difficult to deploy directly on resource-limited edge devices. To meet this demand, SNNs must be compressed carefully to balance performance and computational efficiency. Existing compression methods either iteratively prune SNNs using weight-norm magnitude or formulate the problem as a sparse learning optimization. We propose an improved end-to-end Minimax optimization method for this sparse learning problem to better balance model performance and computational efficiency. We also demonstrate that jointly applying compression and finetuning to SNNs is better than applying them sequentially, especially for extreme compression ratios. The compressed SNN models achieve state-of-the-art (SOTA) performance on various benchmark datasets and architectures. Our code is available at https://github.com/chenjallen/Resource-Constrained-Compression-on-SNN.
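A resource-constrained sparse learning problem can be written as a minimax (Lagrangian) objective: minimize over w and maximize over λ ≥ 0 the quantity L(w) + λ(R(w) - B). The sketch below is a generic gradient descent-ascent on a toy instance, not the paper's formulation: the task loss is the quadratic ||w - w0||², the resource R(w) is the L1 norm as a proxy for the number of active weights, and the step sizes are hypothetical.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (shrinks x toward zero by tau)."""
    return max(abs(x) - tau, 0.0) * (1 if x > 0 else -1)


def minimax_compress(w0, budget, steps=2000, lr_w=0.1, lr_lam=0.05):
    """Gradient descent-ascent on L(w, lam) = ||w - w0||^2 + lam * (||w||_1 - budget)."""
    w, lam = list(w0), 0.0
    for _ in range(steps):
        # Descent on w: gradient step on the task loss, then the prox of lam * ||w||_1.
        w = [soft_threshold(wi - lr_w * 2 * (wi - w0i), lr_w * lam) for wi, w0i in zip(w, w0)]
        # Projected ascent on lam: the penalty grows while the budget is violated.
        lam = max(0.0, lam + lr_lam * (sum(abs(wi) for wi in w) - budget))
    return w, lam


w, lam = minimax_compress([3.0, 0.2, -0.1, 2.0, 0.05], budget=2.0)
```

For w0 = (3, 0.2, -0.1, 2, 0.05) and a budget of 2, the iterates settle near w = (1.5, 0, 0, 0.5, 0) with λ ≈ 3, the soft-thresholded solution: the three small weights are pruned exactly, and compression and "finetuning" of the surviving weights happen jointly in one loop.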
Maximizing Network Connectivity for UAV Communications via Reconfigurable Intelligent Surfaces
Authors: Mohammed S. Al-Abiad, Mohammad Javad-Kalbasi, Shahrokh Valaee
Abstract
Integrating unmanned aerial vehicles (UAVs) with reconfigurable intelligent surfaces (RISs), resulting in RIS-assisted UAV networks, is anticipated to improve network connectivity against node failures in beyond-5G networks. In this context, we use an RIS to provide path diversity and alternative connectivity options for information flow from user equipment (UE) to UAVs by adding more links to the network, thereby maximizing its connectivity. This paper employs the algebraic connectivity metric, adjusted by the reflected links of the RIS, to formulate the problem of maximizing network connectivity in two cases. First, we formulate the problem for a single UE, which is solved optimally using a linear search. Then, we consider the more general case of multiple UEs, which has high computational complexity. To tackle it, we formulate the maximization of network connectivity as a semi-definite programming (SDP) optimization problem that can be solved efficiently in polynomial time. In both cases, our proposed solutions find the best combination of UE(s) and UAVs through the RIS and accordingly tune the phase shifts of the RIS to direct the signals of the UEs to the appropriate UAVs, thus maximizing network connectivity. Simulations are conducted to assess the performance of the proposed solutions compared with existing solutions.
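Algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian L = D - A, and the single-UE case can be sketched as a linear search over candidate UE-UAV links. The sketch assumes an unweighted graph in which an RIS-reflected path simply adds one edge; the paper's RIS-adjusted metric and phase-shift model are not reproduced. A cyclic Jacobi eigenvalue routine keeps the example dependency-free.

```python
import math


def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
        lap[u][u] += 1.0
        lap[v][v] += 1.0
    return lap


def symmetric_eigenvalues(matrix, sweeps=50):
    """Cyclic Jacobi eigenvalue algorithm for small symmetric matrices."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-12:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def algebraic_connectivity(n, edges):
    """Second-smallest Laplacian eigenvalue (Fiedler value)."""
    return symmetric_eigenvalues(laplacian(n, edges))[1]


def best_ris_link(n, edges, ue, uavs):
    """Linear search: try each UE-UAV link through the RIS and keep the best one."""
    return max(uavs, key=lambda uav: algebraic_connectivity(n, edges + [(ue, uav)]))
```

For a UE (node 0) attached to one end of a three-UAV chain, linking it to the far UAV closes a 4-cycle with algebraic connectivity 2, which beats linking it to the middle UAV (algebraic connectivity 1).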
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Authors: Yuhan Ma, Haiqi Jiang, Chenyou Fan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks. This competency is attributed to their substantial parameter size and pre-training on extensive corpora. Moreover, LLMs have exhibited enhanced reasoning capabilities in tackling complex reasoning tasks, owing to a method known as Chain-of-Thought (CoT) prompting. This method is designed to generate intermediate reasoning steps that guide the inference of the final answer. However, these advanced reasoning abilities appear to emerge only in models with at least 10 billion parameters, which limits their efficacy in situations where computational resources are constrained. In this paper, we investigate the possibility of transferring the reasoning capabilities of LLMs to smaller models via knowledge distillation. Specifically, we propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. This method enables a more efficient use of rationales during the answer inference stage, leading to improved performance on scientific question-answering tasks. Using Sci-CoT, our 80-million-parameter model is able to exceed the performance of BLOOM-176B on the ARC-Easy dataset in the few-shot setting.
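The two-stage separation can be illustrated by how one teacher-annotated item is split into training pairs for a small student model. The prompt templates, field names, and four-option format are hypothetical; the abstract does not specify them. At inference time, stage 2 would receive the rationale generated by the stage-1 model instead of the teacher's.

```python
def build_two_stage_examples(question, choices, rationale, answer):
    """Split one teacher-annotated item into training pairs for the two stages.

    Stage 1 learns to generate a rationale from the question alone;
    stage 2 learns to infer the answer given the question and a rationale.
    """
    options = " ".join(f"({label}) {text}" for label, text in zip("ABCD", choices))
    stage1 = {"input": f"Question: {question} Options: {options}\nRationale:", "target": rationale}
    stage2 = {
        "input": f"Question: {question} Options: {options}\nRationale: {rationale}\nAnswer:",
        "target": answer,
    }
    return stage1, stage2


s1, s2 = build_two_stage_examples(
    "Which gas do plants absorb for photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"],
    "Plants take in carbon dioxide and release oxygen during photosynthesis.",
    "B",
)
```

Keeping the rationale out of the stage-1 input and inside the stage-2 input is what lets each small model specialize in one sub-task.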
Finite Element Operator Network for Solving Parametric PDEs
Authors: Jae Yong Lee, Seungchan Ko, Youngjoon Hong
Abstract
Partial differential equations (PDEs) underlie our understanding and prediction of natural phenomena across numerous fields, including physics, engineering, and finance. However, solving parametric PDEs is a complex task that necessitates efficient numerical methods. In this paper, we propose a novel approach for solving parametric PDEs using a Finite Element Operator Network (FEONet). Our proposed method leverages the power of deep learning in conjunction with traditional numerical methods, specifically the finite element method, to solve parametric PDEs in the absence of any paired input-output training data. We demonstrate the effectiveness of our approach on several benchmark problems and show that it outperforms existing state-of-the-art methods in terms of accuracy, generalization, and computational flexibility. Our FEONet framework shows potential for application in various fields where PDEs play a crucial role in modeling complex domains with diverse boundary conditions and singular behavior. Furthermore, we provide theoretical convergence analysis to support our approach, utilizing finite element approximation in numerical analysis.
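FEONet pairs a learned model with the finite element method. The sketch below shows only the finite element side for the simplest case, -u'' = f on (0, 1) with u(0) = u(1) = 0 and piecewise-linear elements. Treating the discrete residual K c - F as the training signal for a network that outputs the coefficients c is an assumption about how such a model can be trained without paired data; the Thomas solver is included only to produce a reference solution.

```python
def assemble_poisson_1d(n_elements, f=1.0):
    """P1 finite elements for -u'' = f on (0, 1) with u(0) = u(1) = 0.

    Returns the tridiagonal stiffness matrix (as three diagonals) and the
    load vector for the interior nodes, assuming a constant source f.
    """
    h = 1.0 / n_elements
    m = n_elements - 1  # number of interior unknowns
    lower = [-1.0 / h] * (m - 1)
    diag = [2.0 / h] * m
    upper = [-1.0 / h] * (m - 1)
    load = [f * h] * m
    return lower, diag, upper, load


def residual(coeffs, system):
    """Residual K c - F of the discrete system (assumed training signal)."""
    lower, diag, upper, load = system
    m = len(diag)
    r = []
    for i in range(m):
        v = diag[i] * coeffs[i] - load[i]
        if i > 0:
            v += lower[i - 1] * coeffs[i - 1]
        if i < m - 1:
            v += upper[i] * coeffs[i + 1]
        r.append(v)
    return r


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm, used here only to produce a reference solution."""
    m = len(diag)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = (upper[0] / diag[0] if m > 1 else 0.0), rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < m - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


system = assemble_poisson_1d(8)
u = solve_tridiagonal(*system)
```

With f = 1 and 8 elements, the nodal value at x = 0.5 equals the exact solution x(1 - x)/2 = 0.125, since linear elements are nodally exact for this 1D problem.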
A High-efficient Battery Charging System for Electric Vehicle
Abstract
Automobiles are increasingly being electrified, and lithium-ion batteries are widely used as their power source. Because lithium-ion batteries have complex characteristics, they need optimal charging strategies to ensure that they are charged safely and efficiently. This paper focuses on the development of a highly efficient charging method for lithium-ion batteries. To test different charging strategies, an electric vehicle charging system consisting of a dual active bridge DC-DC converter and a Thevenin battery model is implemented. Multistage constant current (MSCC) charging and multistage constant current charging with reflex charging (MSCC with reflex charging) are proposed. Compared with the traditional constant current-constant voltage (CC-CV) charging method, MSCC reduces charging time by 12% and battery loss by 1.1%, while MSCC with reflex charging reduces charging time by 10.45% and battery loss by 1.54%.
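The multistage constant current idea can be simulated on a first-order Thevenin model (open-circuit voltage, series resistance R0, and one RC branch). Every numeric parameter below, including the linear open-circuit voltage curve and the stage currents, is a hypothetical placeholder. The dual active bridge converter and the reflex (discharge-pulse) variant are not modeled.

```python
def simulate_mscc(stage_currents, v_max=4.2, capacity_ah=2.0, r0=0.05, r1=0.02, c1=2000.0,
                  dt=1.0, soc0=0.1, soc_target=0.95):
    """Multistage constant-current charging on a first-order Thevenin model.

    Each stage charges at a constant current until the terminal voltage reaches
    v_max, then switches to the next (lower) current. Returns the total time in
    seconds, the final state of charge, and (current, end_time, end_soc) per stage.
    """
    q_as = capacity_ah * 3600.0  # capacity in ampere-seconds
    soc, v1, t, log = soc0, 0.0, 0.0, []
    for current in stage_currents:
        while soc < soc_target:
            ocv = 3.0 + 1.2 * soc                # hypothetical linear open-circuit voltage
            v_term = ocv + current * r0 + v1     # charging raises the terminal voltage
            if v_term >= v_max:
                break                            # voltage limit hit: move to the next stage
            soc += current * dt / q_as
            v1 += dt * (current / c1 - v1 / (r1 * c1))  # RC branch, forward Euler
            t += dt
        log.append((current, t, soc))
        if soc >= soc_target:
            break
    return t, soc, log


total_time_s, final_soc, stages = simulate_mscc([4.0, 2.0, 1.0, 0.5])
```

Stepping the current down whenever the voltage limit is reached keeps the battery below v_max without a separate constant-voltage phase, which is the mechanism MSCC uses to shorten charging compared with CC-CV.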
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
Abstract
Children can learn multiple cognitive tasks sequentially, and replicating this ability is a major challenge on the path toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack exploration of more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, DSD-SNN dynamically assigns and grows new neurons for new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly apply all acquired knowledge to new tasks, enabling a single network to support multiple incremental tasks (without a separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class-incremental and task-incremental learning benchmarks. Extensive experiments demonstrate that our model significantly improves performance, learning speed, and memory capacity while reducing computational overhead. Moreover, our DSD-SNN model achieves performance comparable to DNN-based methods and significantly outperforms state-of-the-art (SOTA) SNN-based continual learning methods.
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks (DNNs) demonstrate outstanding performance across most computer vision tasks. Some critical applications, such as autonomous driving or medical imaging, also require investigation into their behavior and the reasons behind the decisions they make. In this vein, DNN attribution is the study of the relationship between the predictions of a DNN and its inputs. Attribution methods have been adapted to highlight the most relevant weights or neurons in a DNN, making it possible to select more efficiently which weights or neurons can be pruned. However, a limitation of these approaches is that weights are typically compared within each layer separately, while some layers might be more critical than others. In this work, we propose to investigate DNN layer importance, i.e., to estimate the sensitivity of the accuracy with respect to perturbations applied at the layer level. To do so, we propose a novel dataset to evaluate our method as well as future works. We benchmark a number of criteria and draw conclusions regarding how to assess DNN layer importance and, consequently, how to allocate budgets across layers for increased DNN efficiency (with applications to DNN pruning and quantization) and for robustness to hardware failure (e.g., bit swaps).
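Layer-level sensitivity can be sketched as a loop that perturbs one layer at a time and records the accuracy drop, followed by a toy rule that keeps more bits for the most sensitive layers. The `evaluate` callback, the layer names, the accuracies, and the 8/4-bit rule are hypothetical; the criteria benchmarked in the paper are not reproduced here.

```python
def layer_sensitivity(layers, evaluate):
    """Rank layers by the accuracy drop caused by perturbing each one alone.

    evaluate(layer) must return the accuracy with only `layer` perturbed
    (e.g. quantized or with injected bit errors); evaluate(None) is the baseline.
    """
    baseline = evaluate(None)
    drops = {layer: baseline - evaluate(layer) for layer in layers}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


def allocate_bits(ranking, high=8, low=4, n_high=1):
    """Toy budget rule: the n_high most sensitive layers keep higher precision."""
    return {layer: (high if i < n_high else low) for i, (layer, _) in enumerate(ranking)}


# Hypothetical accuracies with a single layer perturbed (None = unperturbed model).
accuracies = {None: 0.90, "conv1": 0.85, "conv2": 0.89, "fc": 0.60}
ranking = layer_sensitivity(["conv1", "conv2", "fc"], accuracies.get)
bits = allocate_bits(ranking)
```

Ranking layers against each other, rather than weights within a layer, is what allows such a global budget decision.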
Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data
Authors: Lionel Fontan, Typhanie Prince (Praxiling, LNPL), Aleksandra Nowakowska (Praxiling), Halima Sahraoui (LNPL), Silvia Martinez-Ferreiro
Abstract
Background: Speech and language pathologists (SLPs) often rely on judgements of speech fluency for diagnosing or monitoring patients with aphasia. However, such subjective methods have been criticised for their lack of reliability and their clinical cost in terms of time. Aims: This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency in people with aphasia (PWA). Methods & Procedures: Twenty-nine PWA and five control participants were recruited via non-profit organizations and SLP networks. All participants were recorded while reading aloud a set of sentences taken from the French version of the Boston Diagnostic Aphasia Examination. Three trained SLPs assessed the fluency of each sentence on a five-point qualitative scale. A forward-backward divergence segmentation and a clustering algorithm were used to compute, for each sentence, four automatic predictors of speech fluency: pseudo-syllable rate, speech ratio, rate of silent breaks, and standard deviation of pseudo-syllable length. The four predictors were finally combined into multivariate regression models (a multiple linear regression [MLR] and two non-linear models) to predict the average SLP ratings of speech fluency, using a leave-one-speaker-out validation scheme. Outcomes & Results: All models achieved accurate predictions of speech fluency ratings, with average root-mean-square errors as low as 0.5. The MLR yielded a correlation coefficient of 0.87 with reference ratings at the sentence level, and of 0.93 when aggregating the data for each participant. The inclusion of an additional predictor sensitive to repetitions further improved the predictions, with a correlation coefficient of 0.91 at the sentence level and of 0.96 at the participant level.
Conclusions: The algorithms used in this study can constitute a cost-effective and reliable tool for the assessment of the speech fluency of patients with aphasia in read-aloud tasks. Perspectives for the assessment of spontaneous speech are discussed.
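The validation scheme matters because sentences from the same speaker must never appear in both the training and test folds. The sketch below fits a multiple linear regression by ordinary least squares and evaluates it leave-one-speaker-out. The two predictors and the ratings in the usage example are synthetic placeholders, not the study's four fluency predictors or data.

```python
import math


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(x, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def loso_rmse(samples):
    """Leave-one-speaker-out RMSE.

    samples: list of (speaker_id, predictor_vector, mean_rating).
    Each fold trains on all other speakers and predicts the held-out one.
    """
    errors = []
    for speaker in sorted({s for s, _, _ in samples}):
        train = [(x, y) for s, x, y in samples if s != speaker]
        coef = fit_mlr([x for x, _ in train], [y for _, y in train])
        for s, x, y in samples:
            if s == speaker:
                pred = coef[0] + sum(c * v for c, v in zip(coef[1:], x))
                errors.append((pred - y) ** 2)
    return math.sqrt(sum(errors) / len(errors))


# Synthetic example: three speakers, three sentences each, ratings exactly linear.
points = [("s1", (0.1, 0.2)), ("s1", (0.4, 0.1)), ("s1", (0.3, 0.5)),
          ("s2", (0.6, 0.3)), ("s2", (0.2, 0.9)), ("s2", (0.8, 0.7)),
          ("s3", (0.9, 0.1)), ("s3", (0.3, 0.3)), ("s3", (0.7, 0.95))]
samples = [(s, x, 1 + 2 * x[0] - x[1]) for s, x in points]
```

Because the synthetic ratings are an exact linear function of the predictors, the leave-one-speaker-out RMSE is essentially zero here; with real ratings it measures generalization to unseen speakers.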
E3-UAV: An Edge-based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles
Abstract
Motivated by advances in deep learning techniques, the application of Unmanned Aerial Vehicle (UAV)-based object detection has proliferated across a range of fields, including vehicle counting, fire detection, and city monitoring. Most existing research addresses only a subset of the challenges inherent to UAV-based object detection, and few studies balance these aspects to design a practical system that reduces energy consumption. In response, we present E3-UAV, an edge-based energy-efficient object detection system for UAVs. The system is designed to dynamically support various UAV devices, edge devices, and detection algorithms, with the aim of minimizing energy consumption by selecting the most energy-efficient flight parameters (including flight altitude, flight speed, detection algorithm, and sampling rate) required to fulfill the detection requirements of the task. We first present an effective evaluation metric for actual tasks and construct a transparent energy consumption model, based on data from hundreds of actual flights, that formalizes the relationship between energy consumption and flight parameters. Then we present a lightweight energy-efficient priority decision algorithm, based on a large quantity of actual flight data, to help the system decide flight parameters. Finally, we evaluate the performance of the system, and our experimental results demonstrate that it can significantly decrease energy consumption in real-world scenarios. Additionally, we provide four insights that can assist researchers and engineers in further studying UAV-based object detection.
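Choosing flight parameters can be framed as a constrained search: among the combinations of altitude, speed, detection algorithm, and sampling rate that meet an accuracy requirement, pick the one with the lowest predicted energy. The sketch uses exhaustive search rather than the paper's priority decision algorithm, and both the energy and accuracy models are made-up placeholders for models fitted to real flight data.

```python
from itertools import product

# Hypothetical candidate settings and base accuracies (illustrative only).
ALTITUDES_M = [30, 60, 90]
SPEEDS_MS = [3, 6, 9]
DETECTORS = {"tiny": 0.70, "large": 0.85}
SAMPLING_HZ = [1, 2, 5]


def energy_j(altitude, speed, detector, rate, route_m=3000.0):
    """Made-up energy model: propulsion energy plus per-frame inference energy."""
    flight_time_s = route_m / speed
    flight_power_w = 150.0 + 2.0 * speed ** 2 + 0.3 * altitude
    frame_energy_j = 0.5 if detector == "tiny" else 2.0
    return flight_time_s * (flight_power_w + rate * frame_energy_j)


def accuracy(altitude, speed, detector, rate):
    """Made-up accuracy model: altitude and speed hurt, more frames help."""
    return DETECTORS[detector] - 0.002 * altitude - 0.01 * speed + 0.01 * rate


def candidates():
    """All (altitude, speed, detector, rate) combinations."""
    return product(ALTITUDES_M, SPEEDS_MS, DETECTORS, SAMPLING_HZ)


def choose_parameters(min_accuracy):
    """Lowest-energy configuration that meets the accuracy requirement, or None."""
    feasible = [cfg for cfg in candidates() if accuracy(*cfg) >= min_accuracy]
    return min(feasible, key=lambda cfg: energy_j(*cfg)) if feasible else None
```

Exhaustive search is affordable for a few dozen combinations; a lightweight priority rule becomes useful when the parameter space or the per-candidate model evaluation grows.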
A Novel Approach for Establishing Connectivity in Partitioned Mobile Sensor Networks Using Beamforming Techniques
Authors: Abbas Mirzaei, Shahram Zandiyan
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Network connectivity is one of the major design issues in mobile sensor networks. Due to diverse communication patterns, some nodes lying in high-traffic zones may consume more energy and eventually die, resulting in network partitioning. This can prevent a large number of alive nodes from sending their important time-critical data to the sink. Data caching is increasingly used in mobile sensor networks as a high-speed data storage layer. This paper presents a deep learning-based beamforming approach to find the optimal transmission strategies for cache-enabled backhaul networks. In the proposed scheme, the sensor nodes in isolated partitions work together to form a directional beam, which significantly increases their overall communication range so that they can reach a distant relay node connected to the main part of the network. The proposed cooperative beamforming-based partition connectivity method works efficiently when an isolated partition retains a sufficiently large number of nodes. We also present a new cross-layer link-cost method that balances the energy used by the relays. By directly adding the accessible auxiliary nodes to the set of routing links, the algorithm chooses paths that maximize dynamic beamforming usage for the intermediate nodes. The proposed approach is evaluated through simulations, which show that the proposed mechanism reduces energy consumption by up to 30% through beamforming-based partition healing while guaranteeing user throughput.
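Why a larger partition helps can be seen from an idealized link budget. Assuming perfect phase synchronization, identical per-node transmit power, no fading, and path loss proportional to d^(-alpha), N coherently combined transmitters deliver N^2 times the received power of one node, so the reachable distance grows by a factor N^(2/alpha). The sketch below encodes only this textbook relation (function names are hypothetical); it is not the paper's deep learning-based scheme.

```python
def beamforming_range(single_node_range, n_nodes, path_loss_exponent=2.0):
    """Idealised range of n_nodes cooperating via coherent beamforming.

    Received power scales as n**2 * P * d**(-alpha); keeping the receiver
    sensitivity fixed gives d_n = d_1 * n**(2 / alpha).
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return single_node_range * n_nodes ** (2.0 / path_loss_exponent)

def min_nodes_to_reach(single_node_range, distance, path_loss_exponent=2.0):
    """Smallest partition size whose beam reaches a relay at `distance`."""
    n = 1
    while beamforming_range(single_node_range, n, path_loss_exponent) < distance:
        n += 1
    return n
```

For example, with a 50 m single-node range and alpha = 2, four nodes reach 200 m, while alpha = 4 limits them to 100 m.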
Neuro-Symbolic RDF and Description Logic Reasoners: The State-Of-The-Art and Challenges
Abstract
Ontologies are used in various domains, with RDF and OWL being prominent standards for ontology development. RDF is favored for its simplicity and flexibility, while OWL enables detailed domain knowledge representation. However, as ontologies grow larger and more expressive, reasoning complexity increases, and traditional reasoners struggle to perform efficiently. Despite optimization efforts, scalability remains an issue. Additionally, advancements in automated knowledge base construction have created large and expressive ontologies that are often noisy and inconsistent, posing further challenges for conventional reasoners. To address these challenges, researchers have explored neuro-symbolic approaches that combine neural networks' learning capabilities with symbolic systems' reasoning abilities. In this chapter, we provide an overview of the existing literature in the field of neuro-symbolic deductive reasoning supported by RDF(S), the description logics EL and ALC, and OWL 2 RL, discussing the techniques employed, the tasks they address, and other relevant efforts in this area.
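For readers new to the symbolic side, the kind of deductive closure such reasoners compute (and neuro-symbolic reasoners learn to approximate) can be sketched with two RDFS entailment rules, rdfs9 and rdfs11, applied by naive forward chaining to a fixpoint. The `ex:` terms are hypothetical examples; real reasoners implement many more rules far more efficiently.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Naive forward chaining of two RDFS rules until no new triple appears.

    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       -> (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = set()
        for a, b in sub:                       # rdfs11: transitivity
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in facts:                  # rdfs9: type propagation
            if p == RDF_TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= facts:                       # fixpoint reached
            return facts
        facts |= new
```

Each pass rescans every triple, which illustrates the scalability concern raised above for large ontologies.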
Strategic Interactions in Multi-modal Mobility Systems: A Game-Theoretic Perspective
Authors: Gioele Zardini, Nicolas Lanzetti, Giuseppe Belgioioso, Christian Hartnik, Saverio Bolognani, Florian Dörfler, Emilio Frazzoli
Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
The evolution of existing transportation systems, mainly driven by urbanization and increased availability of mobility options, such as private, profit-maximizing ride-hailing companies, calls for tools to reason about their design and regulation. To study this complex socio-technical problem, one needs to account for the strategic interactions of the heterogeneous stakeholders involved in the mobility ecosystem and analyze how they influence the system. In this paper, we focus on the interactions between citizens who compete for the limited resources of a mobility system to complete their desired trip. Specifically, we present a game-theoretic framework for multi-modal mobility systems, where citizens, characterized by heterogeneous preferences, have access to various mobility options and seek individually optimal decisions. We study the arising game and prove the existence of an equilibrium, which can be efficiently computed via a convex optimization problem. Through both an analytical and a numerical case study for the classic scenario of Sioux Falls, USA, we illustrate the capabilities of our model and perform sensitivity analyses. Importantly, we show how to embed our framework into a "larger" game among stakeholders of the mobility ecosystem (e.g., municipality, Mobility Service Providers, and citizens), effectively giving rise to tools to inform strategic interventions and policy-making in the mobility ecosystem.
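The link between equilibrium and convex optimization can be shown on the simplest congestion game: homogeneous citizens choosing between two options with affine, flow-dependent costs. This is a textbook Wardrop equilibrium, assumed here for illustration and much simpler than the paper's heterogeneous multi-modal model. The equilibrium split minimizes a convex potential, so its root can be found by bisection on the potential's derivative.

```python
def equilibrium_split(a1, b1, a2, b2, demand=1.0, tol=1e-12):
    """Flow x on option 1 at a Wardrop equilibrium with costs
    c1(x) = a1 + b1*x and c2(y) = a2 + b2*y (b1, b2 > 0, hypothetical).

    The equilibrium minimises the convex potential
        Phi(x) = int_0^x c1 + int_0^(demand - x) c2,
    whose derivative c1(x) - c2(demand - x) is increasing in x.
    """
    grad = lambda x: (a1 + b1 * x) - (a2 + b2 * (demand - x))
    lo, hi = 0.0, demand
    if grad(lo) >= 0:          # option 1 never cheaper: everyone uses option 2
        return lo
    if grad(hi) <= 0:          # option 1 always cheaper: everyone uses option 1
        return hi
    while hi - lo > tol:       # bisection on the monotone derivative
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

At an interior solution both options have equal cost; for example, with c1 = 10 + 20x and c2 = 15 + 10y and unit demand, half of the citizens use each option and both costs equal 20.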
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
Authors: Qiang Wang, Junlong Du, Ke Yan, Shouhong Ding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Contrastive Language-Image Pre-training (CLIP) has recently shown remarkable generalization in "zero-shot" settings and has been applied to many downstream tasks. We explore the adaptation of CLIP to achieve a more efficient and generalized action recognition method. We propose that the key lies in explicitly modeling the motion cues flowing in video frames. To that end, we design a two-stream motion modeling block to capture motion and spatial information at the same time. The obtained motion cues are then used to drive a dynamic prompt learner that generates motion-aware prompts containing rich semantic information about human actions. In addition, we propose a multimodal communication block to enable collaborative learning and further improve performance. We conduct extensive experiments on the HMDB-51, UCF-101, and Kinetics-400 datasets. Our method outperforms most existing state-of-the-art methods by a significant margin in "few-shot" and "zero-shot" training. We also achieve competitive performance in "closed-set" training with very few trainable parameters and little additional computational cost.
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
Authors: Ye Tian, Mengyu Yang, Lanshan Zhang, Zhizhen Zhang, Yang Liu, Xiaohui Xie, Xirong Que, Wendong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent adaptive methods for efficient video recognition mostly follow the two-stage "preview-then-recognition" paradigm and have achieved great success on multiple video benchmarks. However, this two-stage paradigm visits the raw frames twice during inference, first coarse-grained and then fine-grained (which cannot be parallelized), and the spatiotemporal features captured in the first stage cannot be reused in the second (due to their different granularity), which hinders efficiency and computational optimization. To this end, inspired by human cognition, we propose a novel "View while Moving" recognition paradigm for efficient recognition of long, untrimmed videos. In contrast to the two-stage paradigm, our paradigm needs to access the raw frames only once. The two phases of coarse-grained sampling and fine-grained recognition are combined into unified spatiotemporal modeling, which yields strong performance. Moreover, we investigate the properties of semantic units in video and propose a hierarchical mechanism to efficiently capture and reason about the unit-level and video-level temporal semantics in long untrimmed videos, respectively. Extensive experiments on both long untrimmed and short trimmed videos demonstrate that our approach outperforms state-of-the-art methods in both accuracy and efficiency, yielding new efficiency-accuracy trade-offs for video spatiotemporal modeling.
Intrinsic Motivation via Surprise Memory
Authors: Hung Le, Kien Do, Dung Nguyen, Svetha Venkatesh
Abstract
We present a new computing model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven exploration. The reward is the novelty of the surprise rather than its norm. We estimate surprise novelty as the retrieval error of a memory network in which the memory stores and reconstructs surprises. Our surprise memory (SM) augments the capability of surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that SM combined with various surprise predictors exhibits efficient exploration behavior and significantly boosts the final performance in sparse-reward environments, including Noisy-TV, navigation, and challenging Atari games.
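The distinction between surprise novelty and surprise norm can be sketched with a much simpler memory than a learned memory network. In this hypothetical stand-in, a surprise is a vector (for example, observed minus predicted next-state features), the memory stores past surprises, and "reconstruction" is the mean of the k nearest stored surprises; the intrinsic reward is the reconstruction error. The class name, k, and capacity are assumptions.

```python
import numpy as np

class SurpriseMemory:
    """Hypothetical kNN stand-in for a surprise memory network.

    The intrinsic reward is how badly the memory reconstructs the current
    surprise (its novelty), not how large the surprise is (its norm).
    """

    def __init__(self, k=3, capacity=1000):
        self.k = k
        self.capacity = capacity
        self.items = []

    def reward(self, surprise):
        s = np.asarray(surprise, dtype=float)
        if not self.items:
            r = float(np.linalg.norm(s))       # empty memory: fall back to the norm
        else:
            bank = np.stack(self.items)
            dist = np.linalg.norm(bank - s, axis=1)
            nearest = bank[np.argsort(dist)[: self.k]]
            r = float(np.linalg.norm(s - nearest.mean(axis=0)))  # retrieval error
        self.items.append(s)                   # store the surprise
        if len(self.items) > self.capacity:
            self.items.pop(0)                  # forget the oldest surprise
        return r
```

A large surprise that keeps recurring, as with a noisy TV, quickly stops paying a reward, while a genuinely new surprise still does.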
Why Data Science Projects Fail
Authors: Balaram Panda (The University of Auckland)
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Databases (cs.DB); Methodology (stat.ME)
Abstract
Data Science is a modern Data Intelligence practice that lies at the core of many businesses and helps them build smart strategies to deal with business challenges more efficiently. Data Science practice also helps automate business processes using algorithms, and it has several other benefits, which also extend to non-profit settings. In data science, three key components primarily influence the successful outcome of a project: (1) availability of data, (2) algorithms, and (3) processing power or infrastructure.
Service Reservation and Pricing for Green Metaverses: A Stackelberg Game Approach
Authors: Xumin Huang, Yuan Wu, Jiawen Kang, Jiangtian Nie, Weifeng Zhong, Dong In Kim, Shengli Xie
Abstract
Metaverse enables users to communicate, collaborate, and socialize with each other through their digital avatars. Owing to their spatio-temporal characteristics, co-located users are well served by executing their software components collaboratively, so that a Metaverse service provider (MSP) can eliminate redundant data transmission and processing, ultimately reducing the total energy consumption. Energy-efficient service provision is crucial for enabling a green and sustainable Metaverse. In this article, we take an augmented reality (AR) application as an example to achieve this goal. Moreover, we study the economic issue of how users reserve offloading services from the MSP and how the MSP determines an optimal charging price, since each rational user decides whether to accept the offloading service by taking the monetary cost into account. A single-leader multi-follower Stackelberg game is formulated between the MSP and the users, in which each user optimizes an offloading probability to minimize the weighted sum of time, energy consumption, and monetary cost. Numerical results show that, compared with conventional schemes, our scheme achieves energy savings while satisfying individual rationality. Finally, we identify and discuss open directions on how several emerging technologies can be combined with a sustainable green Metaverse.
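The leader-follower structure can be sketched by backward induction on a toy model. Here each user's expected cost is assumed to be (1 - p)·local + p·(offload + price) + ½·kappa·p², where local and offload aggregate weighted time and energy and the quadratic term is a hypothetical regularizer that makes the best response interior. The MSP anticipates these responses and grid-searches the price that maximizes its profit. None of these cost forms are taken from the article.

```python
import numpy as np

def follower_best_response(price, local_cost, offload_cost, kappa):
    """Offloading probability minimising the hypothetical user cost
    (1 - p) * local + p * (offload + price) + 0.5 * kappa * p**2 on [0, 1]."""
    p = (np.asarray(local_cost) - np.asarray(offload_cost) - price) / kappa
    return np.clip(p, 0.0, 1.0)

def leader_price(local_cost, offload_cost, kappa, unit_cost, prices):
    """Backward induction: anticipate the followers' best responses for each
    candidate price and keep the price that maximises the MSP's profit.
    Returns (price, profit, offloading probabilities)."""
    best = None
    for price in prices:
        p = follower_best_response(price, local_cost, offload_cost, kappa)
        profit = (price - unit_cost) * p.sum()
        if best is None or profit > best[1]:
            best = (price, profit, p)
    return best
```

Because a user offloads with positive probability only when local cost exceeds offload cost plus price, each user is never worse off than computing locally, a simple form of individual rationality.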
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
Authors: Lucian Bicsi, Bogdan Alexe, Radu Tudor Ionescu, Marius Leordeanu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
We propose JEDI, a multi-dataset semi-supervised learning method that efficiently combines knowledge from multiple experts, learned on different datasets, to train and improve the performance of individual per-dataset student models. Our approach achieves this by addressing two important problems in current machine learning research: generalization across datasets and the limitations of supervised training due to the scarcity of labeled data. We start with an arbitrary number of experts, pretrained on their own specific dataset, which form the initial set of student models. The teachers are immediately derived by concatenating the feature representations from the penultimate layers of the students. We then train all models in a student-teacher semi-supervised learning scenario until convergence. In our efficient approach, student-teacher training is carried out jointly and end-to-end, and we show that both students and teachers improve their generalization capacity during training. We validate our approach on four video action recognition datasets. By simultaneously considering all datasets within a unified semi-supervised setting, we demonstrate significant improvements over the initial experts.
Gaussian Image Anomaly Detection with Greedy Eigencomponent Selection
Authors: Tetiana Gula, João P C Bertoldo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Anomaly detection (AD) in images, which identifies significant deviations from normality, is a critical issue in computer vision. This paper introduces a novel approach to dimensionality reduction for AD using pre-trained convolutional neural networks (CNNs), namely EfficientNet models. We investigate the importance of component selection and propose two types of tree search approaches, both employing a greedy strategy, for optimal eigencomponent selection. Our study conducts three main experiments to evaluate the effectiveness of our approach. The first experiment explores the influence of test set performance on component choice, the second examines the performance when we train on one anomaly type and evaluate on all other types, and the third investigates the impact of using a minimum number of images for training and selecting them based on anomaly types. Our approach aims to find the optimal subset of components that delivers the highest performance score, instead of focusing solely on the proportion of variance explained by each component, and to understand the components' behaviour in different settings. Our results indicate that the proposed method surpasses both Principal Component Analysis (PCA) and Negated Principal Component Analysis (NPCA) in terms of detection accuracy, even when using fewer components. Thus, our approach provides a promising alternative to conventional dimensionality reduction techniques in AD and has the potential to enhance the efficiency and effectiveness of AD systems.
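A minimal sketch of Gaussian AD with greedy forward selection of eigencomponents follows. It assumes image features have already been extracted (for instance by a pre-trained CNN), fits a Gaussian to normal training features, splits the Mahalanobis distance into per-eigencomponent terms, and greedily adds the component that most improves validation AUROC. This is one simple greedy variant, not necessarily either of the paper's two tree search methods; all names are hypothetical.

```python
import numpy as np

def auroc(scores_normal, scores_anomalous):
    """Area under the ROC curve: probability that a random anomaly scores
    higher than a random normal sample (ties count one half)."""
    sn = np.asarray(scores_normal)[:, None]
    sa = np.asarray(scores_anomalous)[None, :]
    return float(np.mean((sa > sn) + 0.5 * (sa == sn)))

def fit_gaussian(train):
    """Mean and eigendecomposition of the (slightly regularised) covariance."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False) + 1e-6 * np.eye(train.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    return mean, eigvals, eigvecs

def component_scores(x, mean, eigvals, eigvecs):
    """Per-component squared Mahalanobis terms, shape (n_samples, n_components)."""
    proj = (x - mean) @ eigvecs
    return proj ** 2 / eigvals

def greedy_select(val_normal, val_anomalous, gauss, max_components):
    """Greedily add the eigencomponent that most improves validation AUROC."""
    cn = component_scores(val_normal, *gauss)
    ca = component_scores(val_anomalous, *gauss)
    selected, best_auc = [], 0.0
    remaining = list(range(cn.shape[1]))
    while remaining and len(selected) < max_components:
        trials = [(auroc(cn[:, selected + [c]].sum(1), ca[:, selected + [c]].sum(1)), c)
                  for c in remaining]
        auc, c = max(trials)
        if auc <= best_auc:                    # no further improvement: stop
            break
        selected.append(c)
        remaining.remove(c)
        best_auc = auc
    return selected, best_auc
```

When anomalies deviate along a low-variance direction, this selection keeps that direction, whereas PCA would discard it in favour of high-variance components.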
Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
Authors: Mai Le, Dinh Thai Hoang, Diep N. Nguyen, Won-Joo Hwang, Quoc-Viet Pham
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Abstract
Federated learning (FL) has found many successes in wireless networks; however, the implementation of FL has been hindered by the energy limitation of mobile devices (MDs) and the availability of training data at MDs. How to integrate wireless power transfer and mobile crowdsensing towards sustainable FL solutions is a research topic entirely missing from the open literature. This work is the first to investigate a resource allocation problem in collaborative sensing-assisted sustainable FL (S2FL) networks with the goal of minimizing the total completion time. We investigate a practical harvesting-sensing-training-transmitting protocol in which energy-limited MDs first harvest energy from RF signals, use it to gain a reward for user participation, sense the training data from the environment, train the local models at MDs, and transmit the model updates to the server. The total completion time minimization problem of jointly optimizing power transfer, transmit power allocation, data sensing, bandwidth allocation, local model training, and data transmission is complicated due to the non-convex objective function, highly non-convex constraints, and strongly coupled variables. We propose a computationally efficient path-following algorithm to obtain the optimal solution via the decomposition technique. In particular, inner convex approximations are developed for the resource allocation subproblem, and the subproblems are solved alternately in an iterative fashion. Simulation results show that the proposed S2FL algorithm reduces the completion time by up to 21.45% compared with other benchmark schemes. Further, we investigate an extension of our work from frequency division multiple access (FDMA) to non-orthogonal multiple access (NOMA) and show that NOMA reduces the total completion time of the considered FL system by 8.36% on average.
Improving Autonomous Separation Assurance through Distributed Reinforcement Learning with Attention Networks
Authors: Marc W. Brittain, Luis E. Alvarez, Kara Breeden
Abstract
Advanced Air Mobility (AAM) introduces a new, efficient mode of transportation with the use of vehicle autonomy and electrified aircraft to provide increasingly autonomous transportation between previously underserved markets. Safe and efficient navigation of low-altitude aircraft through highly dense environments requires the integration of a multitude of complex observations, such as surveillance, knowledge of vehicle dynamics, and weather. Processing and reasoning about these observations is challenging due to the various sources of uncertainty in the information, while cooperation with a variable number of aircraft in the airspace must be ensured. These challenges, coupled with the requirement to make safety-critical decisions in real time, rule out the use of conventional separation assurance techniques. We present a decentralized reinforcement learning framework to provide autonomous self-separation capabilities within AAM corridors with the use of speed and vertical maneuvers. The problem is formulated as a Markov Decision Process and solved by developing a novel extension to the sample-efficient, off-policy soft actor-critic (SAC) algorithm. We introduce the use of attention networks for variable-length observation processing and a distributed computing architecture to achieve higher training sample throughput than existing approaches. A comprehensive numerical study shows that the proposed framework can ensure safe and efficient separation of aircraft in high-density, dynamic environments with various sources of uncertainty.
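Attention handles a variable number of nearby aircraft by pooling a padded set of intruder observations into one fixed-size vector, with a mask so that padding receives zero weight. The single-head sketch below uses random matrices as hypothetical stand-ins for learned projections; the paper's network architecture may differ.

```python
import numpy as np

def attention_pool(ownship, intruders, mask, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a padded intruder set.

    ownship:    (d_own,) features of the controlled aircraft (the query)
    intruders:  (max_n, d_int) intruder features, zero-padded
    mask:       (max_n,) True where a real intruder is present
    Wq, Wk, Wv: projection matrices (hypothetical learned parameters)
    Returns a fixed-size (d_v,) summary regardless of how many intruders exist.
    """
    v = intruders @ Wv                                  # (max_n, d_v)
    if not mask.any():
        return np.zeros(v.shape[1])                     # no traffic nearby
    q = ownship @ Wq                                    # (d,)
    k = intruders @ Wk                                  # (max_n, d)
    logits = k @ q / np.sqrt(q.shape[0])
    logits = np.where(mask, logits, -np.inf)            # padding gets zero weight
    weights = np.exp(logits - logits[mask].max())       # numerically stable softmax
    weights /= weights.sum()
    return weights @ v
```

The output for two real intruders is the same whether they arrive alone or padded to a larger fixed buffer, which is what lets one policy network serve any traffic density.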
Random-Walk Metaball-Imaging Discrete Element Lattice Boltzmann Method for 3D Solute Transport in Fluid-Particle Systems with Complex Granular Morphologies
Authors: Yifeng Zhao, Pei Zhang, Stan Z. Li, S.A. Galindo-Torres
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Solute transport in fluid-particle systems is a fundamental process in numerous scientific and engineering disciplines. Simulating it requires accounting for solid particles with intricate shapes and sizes. To address this challenge, this study proposes the Random-Walk Metaball-Imaging Discrete Element Lattice Boltzmann Method (RW-MI-DELBM). In this model, we reconstruct particle geometries with the Metaball-Imaging algorithm, capture the particle behavior using the Discrete Element Method (DEM), simulate fluid behavior by the Lattice Boltzmann Method (LBM), and represent solute behavior through the Random Walk Method (RWM). By integrating these techniques with specially designed boundary conditions, we are able to simulate solute transport in fluid-particle systems comprising complex particle morphologies. Thorough validations, including comparisons with analytical solutions and experiments, are performed to assess the robustness and accuracy of this framework. The results demonstrate that the proposed framework can accurately capture the complex dynamics of solute transport under strict mass conservation. In particular, we investigate the influence of particle morphologies on solute transport in a 3D oscillator, focusing on identifying correlations between shape features and dispersion coefficients. Notably, all selected shape features exhibited strong correlations with the dispersion coefficient, indicating the significant influence of particle shapes on transport phenomena. However, due to the complexity of the relationship and the limited number of simulations, no clear patterns could be identified. Further comprehensive analyses incorporating a broader range of shape features and varying conditions are necessary to fully understand their collective influence on the dispersion coefficient.
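The random-walk part of such a model can be sketched in one dimension. Each solute particle is advected by the local velocity and takes a Gaussian diffusive step of variance 2·D·dt per time step, and a dispersion coefficient can be read off the plume spread through var(x) = 2·D·t. In the coupled method the velocity would come from the LBM field around the DEM particles; here a uniform velocity is assumed purely for illustration, and the function names are hypothetical.

```python
import numpy as np

def random_walk(n_particles, n_steps, dt, diffusivity, velocity=0.0, seed=0):
    """1D random-walk solute transport with uniform advection (assumed).

    Every step moves each particle by velocity*dt plus a Gaussian diffusive
    displacement with variance 2*D*dt. Particles are never created or
    destroyed, so solute mass is conserved exactly.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_particles)
    sigma = np.sqrt(2.0 * diffusivity * dt)
    for _ in range(n_steps):
        x += velocity * dt + sigma * rng.standard_normal(n_particles)
    return x

def dispersion_coefficient(positions, elapsed_time):
    """Estimate D from the plume variance using var(x) = 2 * D * t."""
    return np.var(positions) / (2.0 * elapsed_time)
```

With only uniform advection the estimate recovers the molecular diffusivity; in a porous packing, velocity variations around particles of different shapes inflate the variance, which is the shape dependence studied above.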
Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance
Authors: Zijun Cheng, Qiujian Lv, Jinyuan Liang, Yan Wang, Degang Sun, Thomas Pasquier, Xueyuan Han
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?), attack agnosticity (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), timeliness (can PIDSes efficiently monitor host systems as they run?), and attack reconstruction (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present Kairos, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of anomalousness for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact summary graphs that accurately describe malicious activity over a stream of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.
Neural Field Movement Primitives for Joint Modelling of Scenes and Motions
Authors: Ahmet Tekden, Marc Peter Deisenroth, Yasemin Bekiroglu
Abstract
This paper presents a novel Learning from Demonstration (LfD) method that uses neural fields to learn new skills efficiently and accurately. It achieves this by utilizing a shared embedding to learn both scene and motion representations in a generative way. Our method smoothly maps each expert demonstration to a scene-motion embedding and learns to model them without requiring hand-crafted task parameters or large datasets. It achieves data efficiency by enforcing scene and motion generation to be smooth with respect to changes in the embedding space. At inference time, our method can retrieve scene-motion embeddings using test-time optimization and generate precise motion trajectories for novel scenes. The proposed method is versatile and can employ images, 3D shapes, and any other scene representations that can be modeled using neural fields. Additionally, it can generate both end-effector positions and joint angle-based trajectories. Our method is evaluated on tasks that require accurate motion trajectory generation, where the underlying task parametrization is based on object positions and geometric scene changes. Experimental results demonstrate that the proposed method outperforms the baseline approaches and generalizes to novel scenes. Furthermore, in real-world experiments, we show that our method can successfully model multi-valued trajectories, is robust to distractor objects introduced at inference time, and can generate 6D motions.
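Test-time optimization of a shared embedding can be sketched with linear decoders in place of neural fields: given an observed scene, gradient descent finds the embedding z that best explains it, and the same z is then decoded into a motion. `W_scene` and `M_motion` are hypothetical linear stand-ins; with neural-field decoders the gradient would come from automatic differentiation instead of the closed form used here.

```python
import numpy as np

def retrieve_embedding(scene, W_scene, steps=1000, lr=None):
    """Test-time optimisation: z = argmin ||W_scene @ z - scene||^2.

    W_scene is a hypothetical linear scene decoder. The Hessian is
    2 * W^T W, whose largest eigenvalue is L = 2 * sigma_max**2; the default
    step size 1 / L guarantees convergence of gradient descent.
    """
    z = np.zeros(W_scene.shape[1])
    if lr is None:
        lr = 0.5 / np.linalg.norm(W_scene, 2) ** 2    # equals 1 / L
    for _ in range(steps):
        grad = 2.0 * W_scene.T @ (W_scene @ z - scene)
        z -= lr * grad
    return z

def decode_motion(z, M_motion):
    """Decode a trajectory from the shared embedding (hypothetical linear decoder)."""
    return M_motion @ z
```

Because scene and motion share one embedding, fitting z to the scene alone is enough to produce a motion consistent with that scene.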
A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique
Authors: Gokulprasath R
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning has revolutionized industries like computer vision, natural language processing, and speech recognition. However, back propagation, the main method for training deep neural networks, faces challenges like computational overhead and vanishing gradients. In this paper, we propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer. Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets. This research presents a promising direction for efficient and effective deep neural network training.
Prompting In-Context Operator Learning with Sensor Data, Equations, and Natural Language
Authors: Liu Yang, Tingwei Meng, Siting Liu, Stanley J. Osher
Abstract
In the growing domain of scientific machine learning, in-context operator learning has demonstrated notable potential in learning operators from prompted data during the inference stage without weight updates. However, the current model's overdependence on sensor data may cause it to overlook invaluable human insight into the operator. To address this, we present a transformation of in-context operator learning into a multi-modal paradigm. We propose the use of "captions" to integrate human knowledge about the operator, expressed through natural language descriptions and equations. We illustrate how this method not only broadens the flexibility and generality of physics-informed learning, but also significantly boosts learning performance and reduces data needs. Furthermore, we introduce a more efficient neural network architecture for multi-modal in-context operator learning, referred to as "ICON-LM", based on a language-model-like architecture. We demonstrate the viability of "ICON-LM" for scientific machine learning tasks, opening a new path for the application of language models.
CERMET: Coding for Energy Reduction with Multiple Encryption Techniques -- It's easy being green
Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin Kim, Alejandro Cohen, Rafael G. L. D'Oliveira, Thomas Stahlbuhk, Muriel Médard
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR); Information Theory (cs.IT); Systems and Control (eess.SY)
Abstract
This paper presents CERMET, an energy-efficient hardware architecture designed for hardware-constrained cryptosystems. CERMET employs a base cryptosystem in conjunction with network coding to provide both information-theoretic and computational security while reducing energy consumption per bit. This paper introduces the hardware architecture for the system and explores various optimizations to enhance its performance. The universality of the approach is demonstrated by designing the architecture to accommodate both asymmetric and symmetric cryptosystems. The analysis reveals that the benefits of this proposed approach are multifold, reducing energy per bit and area without compromising security or throughput. The optimized hardware architectures can achieve below 1 pJ/bit operations for AES-256. Furthermore, for a public key cryptosystem based on Elliptic Curve Cryptography (ECC), a remarkable 14.6X reduction in energy per bit and a 9.3X reduction in area are observed, bringing it to less than 1 nJ/bit.
Ergodic Capacity of Dyadic Fading Channels in Ultra Low-SNR Regime
Abstract
In a mobile wireless channel, small-scale multipath fading induces temporal channel fluctuations in the form of peaks and deep fades. The degradation of channel capacity with fading severity in the high signal-to-noise ratio (SNR) regime is well known in the wireless communication literature: the probability of deep fades increases significantly with higher fading severity, resulting in poor performance. In this paper, we focus on double-fading pinhole channels under perfect CSIT to show a counter-intuitive result: higher fading severity enables higher ergodic capacity at sufficiently low SNR. The underlying reason is that at low SNR, ergodic capacity depends crucially on the probability distribution of channel peaks (i.e., the tail distribution); for the pinhole channel, the tail distribution improves with increased fading severity. This allows a transmitter operating at low SNR to exploit channel peaks more efficiently, resulting in a net improvement in achievable spectral efficiency. We derive a new key result quantifying this dependence for the double-Nakagami-$m$ fading pinhole channel: the ergodic capacity satisfies $C \propto (m_T m_R)^{-1}$ at low SNR, where $m_T m_R$ is the product of the fading (severity) parameters of the two independent Nakagami-$m$ fadings involved.
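The tail argument can be checked numerically. A Nakagami-m power gain with unit mean is Gamma-distributed with shape m and scale 1/m, and the pinhole channel gain is the product of two independent such gains. The Monte Carlo sketch below (function names hypothetical) compares the probability of a strong peak for severe fading (m = 0.5) and mild fading (m = 2). It illustrates the tail behaviour only; it does not compute the ergodic capacity.

```python
import numpy as np

def double_nakagami_gains(m_t, m_r, n, rng):
    """Power gains of a double-Nakagami-m pinhole channel: the product of
    two independent unit-mean Gamma(m, 1/m) power gains."""
    g_t = rng.gamma(shape=m_t, scale=1.0 / m_t, size=n)
    g_r = rng.gamma(shape=m_r, scale=1.0 / m_r, size=n)
    return g_t * g_r

def tail_probability(m_t, m_r, threshold, n=1_000_000, seed=0):
    """Monte Carlo estimate of P(gain > threshold), i.e. the chance of a peak."""
    rng = np.random.default_rng(seed)
    return float(np.mean(double_nakagami_gains(m_t, m_r, n, rng) > threshold))
```

Both channels have the same mean gain, yet the severely faded one exceeds ten times the mean far more often; a transmitter with CSIT that concentrates its scarce power on such peaks benefits from the heavier tail.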
Learning of discrete models of variational PDEs from data
Abstract
We show how to learn discrete field theories from observational data of fields on a space-time lattice. For this, we train a neural network model of a discrete Lagrangian density such that the discrete Euler--Lagrange equations are consistent with the given training data. We thus obtain a structure-preserving machine learning architecture. Lagrangian densities are not uniquely defined by the solutions of a field theory. We introduce a technique to derive regularisers for the training process which optimise the numerical regularity of the discrete field theory. Minimisation of the regularisers guarantees that, close to the training data, the discrete field theory behaves robustly and efficiently when used in numerical simulations. Further, we show how to identify structurally simple solutions of the underlying continuous field theory, such as travelling waves. This is possible even when travelling waves are not present in the training data. This contrasts with data-driven approaches based on model order reduction, which struggle to identify suitable latent spaces containing structurally simple solutions when these are not present in the training data. Ideas are demonstrated on examples based on the wave equation and the Schr\"odinger equation.
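The consistency condition used for training can be illustrated with a known discrete Lagrangian. For the 1D wave equation, L(a, b, c) = ½((b - a)/dt)² - ½((c - a)/dx)², evaluated on each lattice cell with a = u[n, j], b = u[n+1, j], c = u[n, j+1], yields discrete Euler--Lagrange (DEL) equations equivalent to the leapfrog scheme. The sketch computes the DEL residual for any callable Lagrangian, so a learned Lagrangian could be trained by driving this residual to zero on data. The finite-difference partial derivatives and all names are assumptions for illustration; a neural network model would use automatic differentiation.

```python
import numpy as np

def lagrangian(a, b, c, dt, dx):
    """Discrete Lagrangian density of the 1D wave equation on one lattice cell:
    a = u[n, j], b = u[n + 1, j], c = u[n, j + 1]."""
    return 0.5 * ((b - a) / dt) ** 2 - 0.5 * ((c - a) / dx) ** 2

def partial(f, args, i, h=1e-6):
    """Central-difference partial derivative of f with respect to argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def del_residual(u, L, dt, dx):
    """DEL residual dS/du[n, j] at interior nodes of a lattice field u[n, j].
    Node (n, j) appears as `a` in cell (n, j), as `b` in cell (n-1, j) and
    as `c` in cell (n, j-1)."""
    f = lambda a, b, c: L(a, b, c, dt, dx)
    N, J = u.shape
    res = np.zeros((N - 2, J - 2))
    for n in range(1, N - 1):
        for j in range(1, J - 1):
            r = partial(f, (u[n, j], u[n + 1, j], u[n, j + 1]), 0)
            r += partial(f, (u[n - 1, j], u[n, j], u[n - 1, j + 1]), 1)
            r += partial(f, (u[n, j - 1], u[n + 1, j - 1], u[n, j]), 2)
            res[n - 1, j - 1] = r
    return res

def leapfrog(u0, u1, steps, dt, dx):
    """Generate lattice data with the leapfrog scheme (fixed boundaries)."""
    r2 = (dt / dx) ** 2
    u = [u0, u1]
    for _ in range(steps):
        prev, cur = u[-2], u[-1]
        nxt = cur.copy()
        nxt[1:-1] = 2 * cur[1:-1] - prev[1:-1] + r2 * (cur[2:] - 2 * cur[1:-1] + cur[:-2])
        u.append(nxt)
    return np.array(u)
```

Leapfrog data makes the residual vanish for the correct Lagrangian, while a Lagrangian with the wrong sign on the spatial term leaves a large residual, which is exactly the signal a training loss can exploit.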
DOST -- Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
Abstract
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more "data efficient" and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
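One way to make domain rules measurable, and usable as a training signal, is sketched below. Rules are assumed to take two hypothetical forms: implications ("label a requires label b") and exclusions ("labels a and b cannot co-occur"). The first function counts violations in binary label matrices; the second is a differentiable surrogate, the expected number of violations if labels were independent Bernoulli variables with the predicted probabilities. DOST's actual rule encoding and loss may differ.

```python
import numpy as np

def rule_violations(labels, implications, exclusions):
    """Count violated domain rules in a binary label matrix (n_samples, n_labels).

    implications: list of (a, b) meaning 'label a requires label b'
    exclusions:   list of (a, b) meaning 'labels a and b cannot co-occur'
    """
    y = np.asarray(labels, dtype=bool)
    count = 0
    for a, b in implications:
        count += np.count_nonzero(y[:, a] & ~y[:, b])
    for a, b in exclusions:
        count += np.count_nonzero(y[:, a] & y[:, b])
    return int(count)

def rule_penalty(probs, implications, exclusions):
    """Mean expected number of violations per sample, treating labels as
    independent Bernoulli variables with the given probabilities."""
    p = np.asarray(probs, dtype=float)
    pen = sum(p[:, a] * (1 - p[:, b]) for a, b in implications)
    pen = pen + sum(p[:, a] * p[:, b] for a, b in exclusions)
    return float(np.mean(pen))
```

On hard 0/1 labels the penalty reduces to the violation count divided by the number of samples, so the same rules can flag offending annotations and discourage rule-violating predictions.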
Scene-Generalizable Interactive Segmentation of Radiance Fields
Abstract
Existing methods for interactive segmentation in radiance fields entail scene-specific optimization and thus cannot generalize across different scenes, which greatly limits their applicability. In this work, we make the first attempt at Scene-Generalizable Interactive Segmentation in Radiance Fields (SGISRF) and propose a novel SGISRF method, which can perform 3D object segmentation for novel (unseen) scenes represented by radiance fields, guided by only a few interactive user clicks in a given set of multi-view 2D images. In particular, the proposed SGISRF addresses three crucial challenges with three specially designed techniques. First, we devise Cross-Dimension Guidance Propagation to encode the scarce 2D user clicks into informative 3D guidance representations. Second, the Uncertainty-Eliminated 3D Segmentation module is designed to achieve efficient yet effective 3D segmentation. Third, a Concealment-Revealed Supervised Learning scheme is proposed to reveal and correct concealed 3D segmentation errors resulting from supervision in 2D space with only 2D mask annotations. Extensive experiments on two challenging real-world benchmarks covering diverse scenes demonstrate 1) the effectiveness and scene-generalizability of the proposed method, and 2) favorable performance compared to classical methods requiring scene-specific optimization.
Keyword: faster
Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
Abstract
In this work, we describe a method for large-scale 3D cell tracking through a segmentation selection approach. The proposed method is effective at tracking cells across large microscopy datasets on two fronts: (i) it can solve problems containing millions of segmentation instances in terabyte-scale 3D+t datasets; (ii) it achieves competitive results with or without deep learning, which requires 3D annotated data that are scarce in the fluorescence microscopy field. The proposed method computes cell tracks and segments using a hierarchy of segmentation hypotheses and selects disjoint segments by maximizing the overlap between adjacent frames. We show that this method achieves state-of-the-art results on 3D images from the Cell Tracking Challenge and has a faster integer linear programming formulation. Moreover, our framework is flexible: it supports segmentations from off-the-shelf cell segmentation models and can combine them into an ensemble that improves tracking. The code is available at https://github.com/royerlab/ultrack.
Estimation of Human Condition at Disaster Site Using Aerial Drone Images
Abstract
Drones are being used to assess the situation in various disasters. In this study, we investigate a method to automatically estimate the damage status of people based on their actions in aerial drone images, in order to understand disaster sites faster and save labor. We constructed a new dataset of aerial images of human actions in a hypothetical disaster occurring in an urban area, and classified the human damage status using a 3D ResNet. The results showed that statuses with characteristic human actions could be classified with a recall rate of more than 80%, while other statuses with similar human actions could only be classified with a recall rate of about 50%. In addition, a cloud-based VR presentation application suggested the effectiveness of using drones to understand the disaster site and estimate the human condition.
BOPIM: Bayesian Optimization for influence maximization on temporal networks
Abstract
The goal of influence maximization (IM) is to select a small set of seed nodes that maximizes the spread of influence on a network. In this work, we propose BOPIM, a Bayesian Optimization (BO) algorithm for IM on temporal networks. The IM task is well suited to a BO solution because its objective function is expensive to evaluate and complicated. We propose a simple surrogate function to model the objective function and leverage Gaussian Process regression with shrinkage priors to fit the model. An acquisition function based on the median of the posterior distribution leads to a straightforward procedure for selecting the next sampling point. In numerical experiments on real-world networks, we find that, on average, the surrogate function estimates the true influence spread to within a few nodes. Additionally, we show that BOPIM yields influence spreads comparable to those of a gold-standard greedy algorithm while being as much as seventeen times faster. We also use these experiments to demonstrate the proposed method's ability to quantify uncertainty in optimal seed sets. To the author's knowledge, this is the first attempt to look at uncertainty in the seed sets for IM, as well as the first application of BO to a constrained, combinatorial optimization problem.
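For context on the greedy baseline, the sketch below shows the classic greedy seed selection with Monte Carlo estimates of spread under the independent cascade model. It is a simplification under stated assumptions: a static directed graph rather than a temporal network, one uniform activation probability `p`, and hypothetical helper names. Each call to `expected_spread` runs many cascade simulations, which illustrates why the IM objective is expensive and why a cheap surrogate, as in BOPIM, is attractive.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # static directed graph as an adjacency list


def simulate_ic(graph: Graph, seeds: Set[int], p: float, rng: random.Random) -> int:
    """One independent-cascade run; returns the number of activated nodes.

    Every newly activated node gets a single chance to activate each inactive
    out-neighbor, succeeding with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def expected_spread(graph: Graph, seeds: Set[int], p: float,
                    runs: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected spread of `seeds`."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(graph: Graph, k: int, p: float, runs: int = 200) -> List[int]:
    """Greedily add the node with the largest estimated marginal gain, k times."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    chosen: List[int] = []
    for _ in range(k):
        base = expected_spread(graph, set(chosen), p, runs)
        best, best_gain = None, float("-inf")
        for n in sorted(nodes - set(chosen)):
            gain = expected_spread(graph, set(chosen) | {n}, p, runs) - base
            if gain > best_gain:
                best, best_gain = n, gain
        if best is None:  # fewer candidate nodes than k
            break
        chosen.append(best)
    return chosen
```

With `p = 1.0` every edge fires, so the estimate is exact: on the toy graph `{0: [1, 2, 3], 4: [5]}`, greedy first picks node 0 (spread 4) and then node 4 (marginal gain 2). The number of simulations grows with k times the number of nodes, which is the cost a surrogate-based method tries to avoid.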
Keyword: mobile
The Two Faces of AI in Green Mobile Computing: A Literature Review
Authors: Wander Siemers, June Sallou, Luís Cruz
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract
Artificial intelligence is bringing ever new functionalities to the realm of mobile devices that are now considered essential (e.g., camera and voice assistants, recommender systems). Yet, operating artificial intelligence takes up a substantial amount of energy. However, artificial intelligence is also being used to enable more energy-efficient solutions for mobile systems. Hence, artificial intelligence has two faces in that regard: it is both a key enabler of desired (efficient) mobile functionalities and a major power draw on these devices, playing a part in both the solution and the problem. In this paper, we present a review of the literature of the past decade on the usage of artificial intelligence within the realm of green mobile computing. From the analysis of 34 papers, we highlight the emerging patterns and map the field into 13 main topics that are summarized in detail. Our results show that the field has been growing slowly in recent years, more specifically, since 2019. Regarding the double impact AI has on mobile energy consumption, the energy consumption of AI-based mobile systems is under-studied in comparison to the usage of AI for energy-efficient mobile computing, and we argue for more exploratory studies in that direction. We observe that although most studies are framed as solution papers (94%), the large majority do not make those solutions publicly available to the community. Moreover, we also show that most contributions are purely academic (28 out of 34 papers) and that we need to promote the involvement of the mobile software industry in this field.
Resource Cooperation in MEC and SDN based Vehicular Networks
Authors: Beiran Chen, Marco Ruffini
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Internet of Things (IoT) systems require highly scalable infrastructure to adaptively provide services that meet various performance requirements. Combining Software-Defined Networking (SDN) with Mobile Edge Cloud (MEC) technology brings more flexibility to IoT systems. We present a four-tier task processing architecture for MEC and vehicular networks, in which tasks are processed locally within a vehicle, on neighboring vehicles, on an edge cloud, or on a remote cloud. The flexible network connection is controlled by SDN. We propose a CPU resource allocation algorithm with Vehicle-to-Vehicle (V2V) communications, called Partial Idle Resource Strategy (PIRS), based on the Asymmetric Nash Bargaining Solution (ANBS) from game theory. PIRS encourages vehicles in the same location to cooperate by sharing part of their spare CPU resources. In our simulations, four applications running on the vehicles generate the workload. We compare the proposed algorithm with a Non-Cooperation Strategy (NCS) and an All Idle Resource Strategy (AIRS). In NCS, the vehicles execute the tasks generated by the applications on their own On-Board Units (OBUs), while in AIRS, vehicles provide all their CPU resources to serve other vehicles' offloading requests. Our simulation results show that PIRS executes more tasks on the V2V layer and reduces the number (and total length) of tasks offloaded to the cloud, reaching up to 28% improvement compared to NCS and up to 10% improvement compared to AIRS.
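The Asymmetric Nash Bargaining Solution has a simple closed form in the textbook case of dividing a divisible resource (for instance, spare CPU cycles) among players whose utility is simply their share: maximize $\prod_i (x_i - d_i)^{w_i}$ subject to $\sum_i x_i = R$, where $d_i$ are disagreement points and $w_i$ bargaining weights. Setting the gradient of the Lagrangian of $\sum_i w_i \log(x_i - d_i)$ to zero gives $x_i = d_i + \frac{w_i}{\sum_j w_j}\,(R - \sum_j d_j)$. The sketch below implements only this generic case; how PIRS defines its utilities, weights, and disagreement points is not stated in the abstract, so the mapping to vehicles is an assumption.

```python
from typing import List


def asymmetric_nash_split(total: float, disagreement: List[float],
                          weights: List[float]) -> List[float]:
    """Split `total` units of a divisible resource by the asymmetric Nash bargaining solution.

    Maximizes prod_i (x_i - d_i) ** w_i subject to sum_i x_i = total, assuming
    each player's utility is its share x_i. Closed form:
    x_i = d_i + w_i / sum(w) * (total - sum(d)).
    """
    if len(disagreement) != len(weights):
        raise ValueError("need one disagreement point and one weight per player")
    surplus = total - sum(disagreement)
    if surplus < 0:
        raise ValueError("total is below the sum of disagreement points")
    w_sum = sum(weights)
    return [d + w / w_sum * surplus for d, w in zip(disagreement, weights)]
```

For example, splitting 10 units between two players with disagreement points 1 and 1 and weights 1 and 3 yields shares 3 and 7. Larger weights shift more of the surplus to a player, which is the "asymmetric" part; with equal weights the split reduces to the symmetric Nash solution.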
Quantization Aware Factorization for Deep Neural Network Compression
Authors: Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak
Abstract
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce the number of parameters and FLOPs in neural networks. Due to the memory and power consumption limitations of mobile or embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds the tensor approximation directly with quantized factors and thus benefits from both compression techniques while preserving the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance trade-off.
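In ADMM-style schemes with a quantization constraint, one subproblem is typically a Euclidean projection of the factor entries onto the allowed grid. The sketch below shows only that projection, assuming a symmetric uniform grid with `bits` bits and step `scale`; the grid definition and function names are assumptions, and the full ADMM CP decomposition from the paper is not reproduced.

```python
from typing import List


def make_grid(bits: int, scale: float) -> List[float]:
    """Symmetric uniform grid: scale * {-(2**(bits-1)), ..., 2**(bits-1) - 1}."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [scale * q for q in range(lo, hi + 1)]


def project_to_grid(values: List[float], bits: int, scale: float) -> List[float]:
    """Map each value to the nearest grid point (Euclidean projection onto the grid).

    Values outside the representable range are clipped to the grid's end points.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [scale * min(hi, max(lo, round(v / scale))) for v in values]
```

With 3 bits and step 0.5 the grid is {-2.0, -1.5, ..., 1.5}, so the entries 0.26, -0.74, 10.0 and -10.0 project to 0.5, -0.5, 1.5 and -2.0. Because the grid is a non-convex set, ADMM with such a projection is a heuristic rather than a method with general convergence guarantees.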
A Forensic Methodology for Detecting Image Manipulations
Abstract
By applying artificial intelligence to image editing technology, it has become possible to generate high-quality images with minimal traces of manipulation. However, since these technologies can be misused for criminal activities such as the dissemination of false information, destruction of evidence, and denial of facts, it is crucial to implement strong countermeasures. In this study, image file analysis and mobile forensic artifact analysis were conducted to detect image manipulation. Image file analysis involves parsing the metadata of manipulated images (e.g., Exif, DQT, and Filename Signature) and comparing it with a Reference DB to detect manipulation. The Reference DB is a database that collects manipulation-related traces left in image metadata and serves as a criterion for detecting image manipulation. In the mobile forensic artifact analysis, packages related to image editing tools were extracted and analyzed to aid the detection of image manipulation. The proposed methodology overcomes the limitations of existing graphic feature-based analysis and is combined with image processing techniques, which has the advantage of reducing false positives. The research results demonstrate the significant role of such a methodology in digital forensic investigation and analysis. Additionally, we provide the code for parsing image metadata and the Reference DB, along with the dataset of manipulated images, to contribute to related research.
A High-efficient Battery Charging System for Electric Vehicle
Abstract
Nowadays, the automobile industry is undergoing electrification, and lithium-ion batteries are widely used as power supplies. Because lithium-ion batteries have complex characteristics, they need optimal charging strategies to ensure that they are charged safely and efficiently. This paper focuses on the development of a highly efficient charging method for lithium-ion batteries. To test different charging strategies, an electric vehicle charging system consisting of a dual active bridge DC-DC converter and a Thevenin battery model is implemented. Multistage constant current (MSCC) charging and multistage constant current charging with reflex charging (MSCC with reflex charging) are proposed. Compared with the traditional constant current-constant voltage (CC-CV) charging method, MSCC reduces the charging time by 12% and the battery loss by 1.1%, while MSCC with reflex charging reduces them by 10.45% and 1.54%, respectively.
Case Study: Using AI-Assisted Code Generation In Mobile Teams
Abstract
The aim of this study is to evaluate the performance of AI-assisted programming in actual mobile development teams that focus on native mobile languages such as Kotlin and Swift. The extensive case study involves 16 participants and 2 technical reviewers from a software development department, and is designed to understand the impact of using LLMs trained for code generation in specific phases of a team's work, more specifically, technical onboarding and technical stack switching. The study uses technical problems dedicated to each phase and requests solutions from the participants with and without AI code generators. It measures time, correctness, and technical integration using ReviewerScore, a metric specific to this paper and derived from actual industry practice: the code reviews of merge requests. The output is converted and analyzed together with feedback from the participants in an attempt to determine whether using AI-assisted programming tools has an impact on onboarding developers to a project or on helping them transition smoothly between the two native development environments of mobile development, Android and iOS. The study was performed between May and June 2023 with members of the mobile department of a software development company based in Cluj-Napoca, with Romanian ownership and management.
A Novel Approach for Establishing Connectivity in Partitioned Mobile Sensor Networks Using Beamforming Techniques
Authors: Abbas Mirzaei, Shahram Zandiyan
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Network connectivity is one of the major design issues in the context of mobile sensor networks. Due to diverse communication patterns, some nodes lying in high-traffic zones may consume more energy and eventually die out, resulting in network partitioning. This phenomenon may prevent a large number of alive nodes from sending their important time-critical data to the sink. Data caching is increasingly applied in mobile sensor networks as a high-speed data storage layer. This paper presents a deep learning-based beamforming approach to find the optimal transmission strategies for cache-enabled backhaul networks. In the proposed scheme, the sensor nodes in isolated partitions work together to form a directional beam, which significantly increases their overall communication range so that they can reach a distant relay node connected to the main part of the network. The proposed cooperative beamforming-based partition connectivity methodology works efficiently when an isolated partition contains a sufficiently large number of nodes. We also present a new cross-layer link cost method that balances the energy used by the relay. By directly adding the accessible auxiliary nodes to the set of routing links, the algorithm chooses paths that provide maximum dynamic beamforming usage for the intermediate nodes. The proposed approach is evaluated through simulations, which show that the proposed mechanism achieves up to 30% energy consumption reduction by using beamforming for partition healing while guaranteeing user throughput.
Enhancing Mobile Privacy and Security: A Face Skin Patch-Based Anti-Spoofing Approach
Authors: Qiushi Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
As facial recognition systems (FRS) are widely applied in areas such as access control and mobile payments due to their convenience and high accuracy, their security has also received considerable attention. A face anti-spoofing (FAS) system is an important component used to enhance the security of face recognition systems. Traditional FAS methods use images containing identity information to detect spoofing traces; however, there is a risk of privacy leakage during the transmission and storage of these images. Besides, the encryption and decryption of these privacy-sensitive data take too long compared with the inference time of the FAS model. To address these issues, we propose a face anti-spoofing algorithm that takes pure facial skin patch images as input; since these patches contain no private information, no encryption or decryption is needed. We conduct experiments on several public datasets, and the results demonstrate the superiority of our algorithm in both accuracy and speed.
Enhancement of Satellite-to-Phone Link Budget by Using Distributed Beamforming
Authors: Zhuoao Xu, Yue Gao, Gaojie Chen, Ryan Fernandez, Vedaprabhu Basavarajappa, Rahim Tafazolli
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Small satellites in Low Earth Orbit (LEO) attract much attention from both industry and academia. The latest production and launch technologies constantly drive the development of LEO constellations. However, wideband signals (anything beyond text messages) cannot be transmitted directly from an LEO satellite to a standard mobile cellular phone due to the insufficient link budget. The current LEO constellation network has to use an extra ground device to receive the signal from the satellite first and then forward it to the User Equipment (UE). To achieve direct network communications between LEO satellites and UE, in this paper we propose a novel distributed beamforming technology, based on the superposition of electromagnetic (EM) waves radiated from multiple satellites, that can significantly enhance the link budget. EM full-wave simulation and Monte Carlo simulation results are provided to verify the effectiveness of the proposed method. The simulation results show a nearly 6 dB enhancement using two radiation sources and an almost 12 dB enhancement using four sources. The received power enhancement can be double the diversity gain in Multiple-Input Single-Output (MISO) systems. Furthermore, other practical application challenges, such as synchronization and the Doppler effect, are also discussed.
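The reported figures match an idealized coherent-combining argument. If $N$ sources deliver equal-amplitude fields that arrive in phase, the field amplitude grows by $N$ and the received power by $N^2$, i.e., $20\log_{10}N$ dB, whereas simply adding powers gives only $10\log_{10}N$ dB. The sketch below checks this numerically under those assumptions (equal amplitudes, identical path loss, perfect phase alignment). It ignores synchronization error and Doppler, and it treats plain power addition as a simplified stand-in for the MISO diversity gain mentioned in the abstract.

```python
import cmath
import math
from typing import List


def received_power(amplitudes: List[float], phases: List[float]) -> float:
    """Power of superposed narrowband fields: |sum_k a_k * exp(j * phi_k)|^2."""
    field = sum(a * cmath.exp(1j * phi) for a, phi in zip(amplitudes, phases))
    return abs(field) ** 2


def coherent_gain_db(n: int) -> float:
    """Gain of n equal-amplitude, phase-aligned sources over a single source."""
    single = received_power([1.0], [0.0])
    return 10 * math.log10(received_power([1.0] * n, [0.0] * n) / single)


def power_sum_gain_db(n: int) -> float:
    """Gain if the n sources' powers simply add (no coherent phase alignment)."""
    single = received_power([1.0], [0.0])
    return 10 * math.log10(n * single / single)
```

For two sources this gives about 6.02 dB versus 3.01 dB for plain power addition, and for four sources about 12.04 dB versus 6.02 dB, which is the "doubling" the abstract describes. Sources in anti-phase (phases 0 and pi) cancel almost completely, which is why synchronization matters in practice.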
Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
Authors: Mai Le, Dinh Thai Hoang, Diep N. Nguyen, Won-Joo Hwang, Quoc-Viet Pham
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Abstract
Federated learning (FL) has found many successes in wireless networks; however, the implementation of FL has been hindered by the energy limitation of mobile devices (MDs) and the availability of training data at MDs. How to integrate wireless power transfer and mobile crowdsensing towards sustainable FL solutions is a research topic entirely missing from the open literature. This work is the first to investigate a resource allocation problem in collaborative sensing-assisted sustainable FL (S2FL) networks with the goal of minimizing the total completion time. We investigate a practical harvesting-sensing-training-transmitting protocol in which energy-limited MDs first harvest energy from RF signals, use it to gain a reward for user participation, sense the training data from the environment, train the local models at MDs, and transmit the model updates to the server. The total completion time minimization problem of jointly optimizing power transfer, transmit power allocation, data sensing, bandwidth allocation, local model training, and data transmission is complicated due to the non-convex objective function, highly non-convex constraints, and strongly coupled variables. We propose a computationally efficient path-following algorithm to obtain the optimal solution via a decomposition technique. In particular, inner convex approximations are developed for the resource allocation subproblem, and the subproblems are solved alternately in an iterative fashion. Simulation results show that the proposed S2FL algorithm reduces the completion time by up to 21.45% in comparison with other benchmark schemes. Further, we extend our work from frequency division multiple access (FDMA) to non-orthogonal multiple access (NOMA) and show that NOMA can shorten the total completion time of the considered FL system by 8.36% on average.
can-train-and-test: A Curated CAN Dataset for Automotive Intrusion Detection
Abstract
When it comes to in-vehicle networks (IVNs), the controller area network (CAN) bus dominates the market; automobiles manufactured and sold around the world depend on the CAN bus for safety-critical communications between various components of the vehicle (e.g., the engine, the transmission, the steering column). Unfortunately, the CAN bus is inherently insecure; in fact, it completely lacks controls such as authentication, authorization, and confidentiality (i.e., encryption). Therefore, researchers have labored to develop automotive security enhancements. The automotive intrusion detection system (IDS) is especially popular in the literature due to its relatively low cost in terms of money, resource utilization, and implementation effort. That said, developing and evaluating an automotive IDS is often challenging; if researchers do not have access to a test vehicle, then they are forced to depend on publicly available CAN data, which is not without limitations. Lack of access to adequate CAN data, then, becomes a barrier to entry into automotive security research. We seek to lower that barrier to entry by introducing a new CAN dataset to facilitate the development and evaluation of automotive IDSs. Our dataset, dubbed can-train-and-test, provides CAN data from four different vehicles produced by two different manufacturers. The attack captures for each vehicle model are equivalent, enabling researchers to assess the ability of a given IDS to generalize to different vehicle models and even different vehicle manufacturers. Our dataset contains replayable .log files as well as labeled and unlabeled .csv files, thereby meeting a variety of development and evaluation needs. Furthermore, can-train-and-test offers nine unique attacks, ranging from denial of service (DoS) to gear spoofing to standstill...
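Replayable CAN logs are commonly stored in the can-utils `candump -l` text format, one frame per line, for example `(1436509052.249713) vcan0 044#2A366C2BBA`. Assuming the dataset's .log files follow that format (check the dataset's own documentation before relying on it), a minimal parser could look like the sketch below. The `CanFrame` type and the `parse_line` helper are hypothetical. Lines it does not understand, such as remote frames or CAN FD frames, are returned as `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line format (candump -l style): "(timestamp) interface ID#DATA",
# with a 3-hex-digit standard ID or an 8-hex-digit extended ID.
LINE_RE = re.compile(
    r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]{3}|[0-9A-Fa-f]{8})#([0-9A-Fa-f]*)$"
)


@dataclass
class CanFrame:
    timestamp: float
    interface: str
    arbitration_id: int
    data: bytes


def parse_line(line: str) -> Optional[CanFrame]:
    """Parse one classic CAN frame; return None for lines in any other form."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. remote frames ("123#R"), CAN FD ("##") or malformed lines
    ts, iface, can_id, payload = match.groups()
    if len(payload) % 2:
        return None  # payload must be whole bytes
    return CanFrame(float(ts), iface, int(can_id, 16), bytes.fromhex(payload))
```

Parsed frames can then be grouped by `arbitration_id`, for instance to compute inter-arrival times per ID, a feature often used by timing-based IDSs.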
Ergodic Capacity of Dyadic Fading Channels in Ultra Low-SNR Regime
Abstract
In a mobile wireless channel, small-scale multipath fading induces temporal channel fluctuations in the form of peaks and deep fades. The degradation of channel capacity with fading severity in the high signal-to-noise ratio (SNR) regime is well known in the wireless communication literature: the probability of deep fades increases significantly with higher fading severity, resulting in poor performance. In this paper, we focus on double-fading pinhole channels with perfect channel state information at the transmitter (CSIT) to show a very counterintuitive result: higher fading severity enables higher ergodic capacity at sufficiently low SNR. The underlying reason is that at low SNR, ergodic capacity depends crucially on the probability distribution of channel peaks (simply, the tail distribution); for the pinhole channel, the tail distribution improves with increased fading severity. This allows a transmitter operating at low SNR to exploit channel peaks more efficiently, resulting in a net improvement in achievable spectral efficiency. We derive a new key result quantifying this dependence for the double-Nakagami-$m$ fading pinhole channel: the ergodic capacity satisfies ${C} \propto (m_T m_R)^{-1}$ at low SNR, where $m_T m_R$ is the product of the fading (severity) parameters of the two independent Nakagami-$m$ fadings involved, so that smaller (that is, more severe) fading parameters yield higher capacity.
Keyword: pruning
D-Score: A Synapse-Inspired Approach for Filter Pruning
Authors: Doyoung Park, Jinsoo Kim, Jina Nam, Jooyoung Chang, Sang Min Park
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
This paper introduces a new approach to ranking unimportant filters for filter pruning on convolutional neural networks (CNNs). In the human synaptic system, there are two important channels, known as excitatory and inhibitory neurotransmitters, that transmit a signal from a neuron to a cell. Adopting this neuroscientific perspective, we propose a synapse-inspired filter pruning method, namely Dynamic Score (D-Score). D-Score analyzes the independent importance of positive and negative weights in the filters and ranks the independent importance by assigning scores. Filters with low overall scores, and thus low impact on the accuracy of neural networks, are pruned. The experimental results on the CIFAR-10 and ImageNet datasets demonstrate the effectiveness of our proposed method, which notably reduces FLOPs and parameters without a significant accuracy drop.
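The abstract describes scoring the positive and negative weights of each filter independently and pruning the filters with low overall scores. The sketch below is one hypothetical way to do that: the sum of positive weights and the sum of absolute negative weights are each normalized across filters and then added. The actual D-Score formula is not given in the abstract, so this scoring rule and the helper names are assumptions; filters are represented as flat lists of weights.

```python
from typing import List


def pos_neg_scores(filters: List[List[float]]) -> List[float]:
    """Hypothetical score that assesses positive and negative weights separately.

    For each filter, the sum of its positive weights and the sum of the magnitudes
    of its negative weights are each normalized by their maximum over all filters
    and then added.
    """
    pos = [sum(w for w in f if w > 0) for f in filters]
    neg = [sum(-w for w in f if w < 0) for f in filters]
    max_pos = max(pos) or 1.0  # avoid division by zero if no positive weights exist
    max_neg = max(neg) or 1.0
    return [p / max_pos + n / max_neg for p, n in zip(pos, neg)]


def filters_to_prune(filters: List[List[float]], ratio: float) -> List[int]:
    """Indices of the lowest-scoring filters, covering a fraction `ratio` of all filters."""
    scores = pos_neg_scores(filters)
    k = int(len(filters) * ratio)
    lowest = sorted(range(len(filters)), key=lambda i: scores[i])[:k]
    return sorted(lowest)
```

Under this rule, a filter with only strong positive weights and a filter with only strong negative weights can both score highly, whereas a plain signed sum would let their contributions cancel. That is the intuition behind treating the two signs separately.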
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods significantly differ from fixed radial patterns, and the reduction patterns of pruning-based methods are correlated across classification datasets. Finally, we report that the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page at https://vap.aau.dk/tokens.
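Top-K token pruning keeps only the K highest-scoring patch tokens at a given layer. A common scoring signal is the attention that the [CLS] token pays to each patch. The sketch below assumes that signal (other methods in the study use different ones), always keeps the [CLS] token, and preserves the original token order; the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def top_k_prune(tokens: Sequence[T], cls_attention: Sequence[float], k: int) -> List[T]:
    """Keep [CLS] plus the k patch tokens with the highest [CLS] attention.

    tokens[0] is the [CLS] token; cls_attention[i] scores patch token tokens[i + 1].
    Kept patches are returned in their original order.
    """
    patches = list(tokens[1:])
    if len(cls_attention) != len(patches):
        raise ValueError("expected one attention score per patch token")
    ranked = sorted(range(len(patches)), key=lambda i: cls_attention[i], reverse=True)
    keep = sorted(ranked[:k])
    return [tokens[0]] + [patches[i] for i in keep]
```

For example, with patch tokens `a`, `b`, `c`, `d` scored 0.1, 0.4, 0.2 and 0.3, keeping two patches returns `[CLS]`, `b`, `d`. Keeping the original order matters if positional information is carried implicitly by token position in later processing.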
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks (DNNs) demonstrate outstanding performance across most computer vision tasks. Some critical applications, such as autonomous driving or medical imaging, also require investigation into their behavior and the reasons behind the decisions they make. In this vein, DNN attribution consists of studying the relationship between the predictions of a DNN and its inputs. Attribution methods have been adapted to highlight the most relevant weights or neurons in a DNN, allowing a more efficient selection of the weights or neurons that can be pruned. However, a limitation of these approaches is that weights are typically compared within each layer separately, while some layers might be more critical than others. In this work, we propose to investigate DNN layer importance, i.e., to estimate the sensitivity of the accuracy with respect to perturbations applied at the layer level. To do so, we propose a novel dataset to evaluate our method as well as future works. We benchmark a number of criteria and draw conclusions regarding how to assess DNN layer importance and, consequently, how to allocate budgets across layers for increased DNN efficiency (with applications for DNN pruning and quantization), as well as for robustness to hardware failure (e.g., bit swaps).
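A simple way to estimate layer-level sensitivity is to perturb one layer at a time, adding noise to that layer's weights and recording the resulting accuracy drop. The sketch below does this for a tiny bias-free MLP in pure Python with multiplicative Gaussian noise. The noise model, the toy network, and all names are assumptions, and this is not one of the criteria benchmarked in the paper.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Dataset = List[Tuple[Sequence[float], int]]


def forward(layers: List[Matrix], x: Sequence[float]) -> List[float]:
    """Bias-free MLP: ReLU after every layer except the last."""
    h = list(x)
    for i, w in enumerate(layers):
        h = [sum(a * b for a, b in zip(row, h)) for row in w]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def accuracy(layers: List[Matrix], data: Dataset) -> float:
    """Fraction of samples whose arg-max output equals the label."""
    hits = 0
    for x, y in data:
        out = forward(layers, x)
        hits += int(max(range(len(out)), key=out.__getitem__) == y)
    return hits / len(data)


def layer_sensitivity(layers: List[Matrix], data: Dataset, sigma: float = 0.5,
                      trials: int = 20, seed: int = 0) -> List[float]:
    """Mean accuracy drop when only one layer is perturbed.

    Each weight w of the chosen layer becomes w + N(0, (sigma * |w|)^2), i.e.
    multiplicative Gaussian noise; all other layers stay untouched.
    """
    rng = random.Random(seed)
    base = accuracy(layers, data)
    drops = []
    for li in range(len(layers)):
        total = 0.0
        for _ in range(trials):
            noisy = list(layers)  # shallow copy; only index li is replaced below
            noisy[li] = [[w + rng.gauss(0.0, sigma * abs(w)) for w in row]
                         for row in layers[li]]
            total += base - accuracy(noisy, data)
        drops.append(total / trials)
    return drops
```

Layers with a larger mean drop would be treated as more critical, for example receiving more bits under quantization or a lower pruning ratio. In a real network the same loop would perturb a framework's parameter tensors instead of nested lists.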
Keyword: diffusion
3D Scene Diffusion Guidance using Scene Graphs
Authors: Mohammad Naanaa, Katharina Schmid, Yinyu Nie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Guided synthesis of high-quality 3D scenes is a challenging task. Diffusion models have shown promise in generating diverse data, including 3D scenes. However, current methods rely directly on text embeddings for controlling the generation, limiting the incorporation of complex spatial relationships between objects. We propose a novel approach for 3D scene diffusion guidance using scene graphs. To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network. We show that our approach significantly improves the alignment between scene description and generated scene.
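A relational graph convolution, in the common R-GCN form, updates each node with a self-connection plus one learned transform per edge type, averaging messages over the neighbors of that type: $h_i' = \mathrm{ReLU}\big(W_0 h_i + \sum_r \sum_{j \in \mathcal{N}_i^r} \frac{1}{|\mathcal{N}_i^r|} W_r h_j\big)$. For a scene graph, nodes would be objects and edge types would be spatial predicates. The sketch below implements one such layer in pure Python. The predicate names and helper functions are hypothetical, and the abstract does not specify how these blocks are wired into the denoising network.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def rgcn_layer(h: List[Vector],
               edges: Dict[str, List[Tuple[int, int]]],
               w_rel: Dict[str, Matrix],
               w_self: Matrix) -> List[Vector]:
    """One relational graph convolution with per-relation mean aggregation.

    h[i] is the feature vector of node i; edges[r] lists (src, dst) pairs of
    relation r; w_rel[r] transforms messages of relation r; w_self is the
    self-connection weight. ReLU is applied to the result.
    """
    out = [matvec(w_self, hi) for hi in h]
    for rel, rel_edges in edges.items():
        # Group sources by destination so each relation is normalized separately.
        incoming: Dict[int, List[int]] = defaultdict(list)
        for src, dst in rel_edges:
            incoming[dst].append(src)
        for dst, srcs in incoming.items():
            c = 1.0 / len(srcs)
            for src in srcs:
                msg = matvec(w_rel[rel], h[src])
                out[dst] = [o + c * m for o, m in zip(out[dst], msg)]
    return [[max(0.0, v) for v in o] for o in out]
```

Because each relation has its own weight matrix, "chair left_of table" and "chair on table" send different messages to the table node, which is what lets relational blocks encode spatial relationships that a single text embedding may blur.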
Instabilities of explicit finite difference schemes with ghost points on the diffusion equation
Abstract
Ghost (or fictitious) points make it possible to capture boundary conditions that are not located on the finite difference grid. In this paper, we explore the impact of ghost points on the stability of the explicit Euler finite difference scheme in the context of the diffusion equation. In particular, we consider the case of a one-touch option under the Black-Scholes model. However, the observations and results are valid for a much wider range of financial contracts and models.
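As a baseline for the kind of scheme studied here, the sketch below advances the plain heat equation $u_t = D\,u_{xx}$ on $[0, 1]$ with the explicit Euler (FTCS) scheme. It imposes a Neumann condition $u_x = g$ at the right end through a ghost point $u_N = u_{N-2} + 2\,\Delta x\, g$, obtained from a central difference. It uses a simple heat equation rather than the transformed Black-Scholes PDE of a one-touch option and shows only the classical stability limit $r = D\,\Delta t/\Delta x^2 \le 1/2$. It does not reproduce the ghost-point instabilities analyzed in the paper, and the function names are hypothetical.

```python
from typing import List


def ftcs_step(u: List[float], r: float, left_value: float,
              right_flux: float, dx: float) -> List[float]:
    """One explicit Euler step of u_t = D u_xx with r = D * dt / dx**2.

    Left end: Dirichlet u = left_value. Right end: Neumann u_x = right_flux,
    imposed with a ghost point u_N = u_{N-2} + 2 * dx * right_flux.
    """
    n = len(u)
    ghost = u[n - 2] + 2.0 * dx * right_flux
    new = u[:]
    new[0] = left_value
    for i in range(1, n):
        right = ghost if i == n - 1 else u[i + 1]
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + right)
    return new


def run(r: float, steps: int = 200, n: int = 21) -> float:
    """Evolve a unit spike with zero-flux right boundary; return max |u| at the end."""
    dx = 1.0 / (n - 1)
    u = [0.0] * n
    u[n // 2] = 1.0
    for _ in range(steps):
        u = ftcs_step(u, r, left_value=0.0, right_flux=0.0, dx=dx)
    return max(abs(v) for v in u)
```

With r = 0.4 each update is a convex combination of neighboring values, so the solution stays bounded by its initial maximum. With r = 0.6 the highest-frequency mode is amplified by a factor of roughly 1.4 per step and the solution blows up.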
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang
Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at this http URL
Going Deeper with Five-point Stencil Convolutions for Reaction-Diffusion Equations
Abstract
Physics-informed neural networks have been widely applied to partial differential equations with great success because the physics-informed loss essentially requires no observations or discretization. However, it is difficult to optimize model parameters, and these parameters must be trained for each distinct initial condition. To overcome these challenges in second-order reaction-diffusion type equations, one possible approach is to use five-point stencil convolutional neural networks (FCNNs). FCNNs are trained using two consecutive snapshots, where the time step corresponds to the step size of the given snapshots. Thus, the time evolution of FCNNs depends on the time step, and the time step must satisfy the CFL condition to avoid blow-up solutions. In this work, we propose deep FCNNs that have large receptive fields to predict time evolutions with a time step larger than the threshold of the CFL condition. To evaluate our models, we consider the heat, Fisher's, and Allen-Cahn equations with diverse initial conditions. We demonstrate that deep FCNNs retain a certain level of accuracy, in contrast to finite difference methods (FDMs), which blow up.
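FCNNs build on the five-point stencil discretization of the Laplacian, which can be written as a fixed 3x3 convolution kernel. The sketch below applies that kernel to take one explicit time step of $u_t = D\,\Delta u + f(u)$, using periodic boundaries as an assumption made for brevity. It also reports the classical 2D stability limit $\Delta t \le h^2/(4D)$ for pure diffusion. This is the finite difference baseline, not the learned deep FCNN, and the names are hypothetical.

```python
from typing import Callable, List

Grid = List[List[float]]

# Five-point Laplacian stencil (to be divided by h**2).
STENCIL: Grid = [[0.0, 1.0, 0.0],
                 [1.0, -4.0, 1.0],
                 [0.0, 1.0, 0.0]]


def conv_periodic(u: Grid, k: Grid) -> Grid:
    """3x3 cross-correlation of u with kernel k, wrapping around the edges."""
    n, m = len(u), len(u[0])
    return [[sum(k[a][b] * u[(i + a - 1) % n][(j + b - 1) % m]
                 for a in range(3) for b in range(3))
             for j in range(m)] for i in range(n)]


def step(u: Grid, dt: float, h: float, D: float = 1.0,
         reaction: Callable[[float], float] = lambda v: 0.0) -> Grid:
    """One explicit Euler step of u_t = D * Laplacian(u) + reaction(u)."""
    lap = conv_periodic(u, STENCIL)
    return [[u[i][j] + dt * (D * lap[i][j] / h ** 2 + reaction(u[i][j]))
             for j in range(len(u[0]))] for i in range(len(u))]


def cfl_limit(h: float, D: float = 1.0) -> float:
    """Largest stable dt for explicit 2D diffusion with the five-point stencil."""
    return h * h / (4.0 * D)
```

For Fisher's equation one could pass `reaction=lambda v: v * (1.0 - v)`, and for one common scaling of the Allen-Cahn equation, `reaction=lambda v: (v - v ** 3) / eps ** 2` with an interface parameter `eps`. Choosing `dt` above `cfl_limit(h)` reproduces the blow-up that the deep FCNNs are designed to avoid.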
CasCIFF: A Cross-Domain Information Fusion Framework Tailored for Cascade Prediction in Social Networks
Abstract
Existing approaches for information cascade prediction fall into three main categories: feature-driven methods, point process-based methods, and deep learning-based methods. Among them, deep learning-based methods, characterized by their superior learning and representation capabilities, mitigate the shortcomings inherent in the other methods. However, current deep learning methods still face several persistent challenges. In particular, accurate representation of user attributes remains problematic due to factors such as fake followers and complex network configurations. Previous algorithms that focus on the sequential order of user activations often neglect the rich insights offered by activation timing. Furthermore, these techniques often fail to holistically integrate temporal and structural aspects, thus missing the nuanced propagation trends inherent in information cascades. To address these issues, we propose the Cross-Domain Information Fusion Framework (CasCIFF), which is tailored for information cascade prediction. This framework exploits multi-hop neighborhood information to make user embeddings robust. When embedding cascades, the framework intentionally incorporates timestamps, endowing it with the ability to capture evolving patterns of information diffusion. In particular, CasCIFF seamlessly integrates the tasks of user classification and cascade prediction into a consolidated framework, thereby allowing the extraction of common features that prove useful for all tasks, a strategy anchored in the principles of multi-task learning.
IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models
Authors: Fadi Boutros, Jonas Henry Grebe, Arjan Kuijper, Naser Damer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The availability of large-scale authentic face databases has been crucial to the significant advances made in face recognition research over the past decade. However, legal and ethical concerns led to the recent retraction of many of these databases by their creators, raising questions about the continuity of future face recognition research without one of its key resources. Synthetic datasets have emerged as a promising alternative to privacy-sensitive authentic data for face recognition development. However, recent synthetic datasets used to train face recognition models suffer from either limited intra-class diversity or limited cross-class (identity) discrimination, leading to suboptimal accuracy that falls far short of models trained on authentic data. This paper targets this issue by proposing IDiff-Face, a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training. In extensive evaluations, our synthetic-based face recognition approach pushes the limits of state-of-the-art performance, achieving, for example, 98.00% accuracy on the Labeled Faces in the Wild (LFW) benchmark, well ahead of recent synthetic-based face recognition solutions (95.40%) and bridging the gap to authentic-based face recognition (99.82%).
Do Diffusion Models Suffer Error Propagation? Theoretical Analysis and Consistency Regularization
Authors: Yangming Li, Zhaozhi Qian, Mihaela van der Schaar
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
While diffusion models have achieved promising performance in data synthesis, they might suffer from error propagation because of their cascade structure, in which the distributional mismatch spreads and magnifies through the chain of denoising modules. However, a rigorous analysis is needed, since many sequential models, such as conditional random fields (CRFs), are free from error propagation. In this paper, we empirically and theoretically verify that diffusion models are indeed affected by error propagation, and we then propose a regularization to address this problem. Our theoretical analysis reveals that the question can be reduced to whether every denoising module of the diffusion model is fault-tolerant. We derive insightful transition equations indicating that a module cannot recover from input errors and even propagates additional errors to the next module. Our analysis directly leads to a consistency regularization scheme for diffusion models, which explicitly reduces the distribution gap between the forward and backward processes. We further introduce a bootstrapping algorithm to reduce the computational cost of the regularizer. Our experimental results on multiple image datasets show that our regularization effectively handles error propagation and significantly improves the performance of vanilla diffusion models.
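The forward and backward processes in this abstract refer to the standard diffusion-model setup. As a reminder of the forward side only, the sketch below samples x_t from x_0 in closed form under a DDPM-style variance-preserving process with a linear beta schedule. The schedule values are a common default assumed here; the paper's consistency regularizer and bootstrapping algorithm are not shown.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (a common DDPM choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.standard_normal(10_000)          # toy unit-variance "data"
x_mid = q_sample(x0, 500, alpha_bar, rng)  # halfway through the chain
print(alpha_bar[0], alpha_bar[-1], x_mid.var())
```

Because the process is variance preserving, unit-variance data keeps roughly unit variance at every step; the backward (denoising) chain is where the error propagation discussed in the abstract arises.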
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
Abstract
In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it possible to generate a rich variety of novel photorealistic images. However, current models still face misalignment issues (e.g., problematic understanding of spatial relations and failures in numeration) in complex natural scenes, which impedes high-faithfulness text-to-image generation. Although recent efforts have improved controllability by providing fine-grained guidance (e.g., sketches and scribbles), this issue has not been fundamentally tackled, since users have to provide such guidance manually. In this work, we strive to synthesize high-fidelity images that are semantically aligned with a given textual prompt without any guidance. Toward this end, we propose a coarse-to-fine paradigm for layout planning and image generation. Concretely, we first generate a coarse-grained layout conditioned on the given textual prompt via in-context learning with Large Language Models. Afterward, we propose a fine-grained object-interaction diffusion method to synthesize high-faithfulness images conditioned on the prompt and the automatically generated layout. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art models in terms of both layout and image generation. Our code and settings are available at https://layoutllm-t2i.github.io.
Keyword: adaptive
Backdoor Federated Learning by Poisoning Backdoor-Critical Layers
Abstract
Federated learning (FL) has been widely deployed to enable machine learning training on sensitive data across distributed devices. However, the decentralized learning paradigm and heterogeneity of FL further extend the attack surface for backdoor attacks. Existing FL attack and defense methodologies typically focus on the whole model. None of them recognizes the existence of backdoor-critical (BC) layers, a small subset of layers that dominate the model's vulnerabilities. Attacking the BC layers achieves the same effect as attacking the whole model, but with a far smaller chance of being detected by state-of-the-art (SOTA) defenses. This paper proposes a general in-situ approach that identifies and verifies BC layers from the perspective of attackers. Based on the identified BC layers, we carefully craft a new backdoor attack methodology that adaptively seeks a fundamental balance between attack effectiveness and stealthiness under various defense strategies. Extensive experiments show that our BC layer-aware backdoor attacks can successfully backdoor FL under seven SOTA defenses with only 10% malicious clients and outperform the latest backdoor attack methods.
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
Authors: Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li
Abstract
Enabling machines to understand human emotions in multimodal contexts under dialogue scenarios, the task of multimodal emotion recognition in conversation (MM-ERC), has been a hot research topic. MM-ERC has received consistent attention in recent years, and a diverse range of methods has been proposed to improve task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion to maximize feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that both feature multimodality and conversational contextualization should be modeled simultaneously during the feature disentanglement and fusion steps. In this work, we aim to further push task performance by taking full account of these insights. On the one hand, during feature disentanglement, we build on contrastive learning to devise a Dual-level Disentanglement Mechanism (DDM) that decouples the features into both the modality space and the utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively. Together, they schedule the proper integration of multimodal and context features. Specifically, CFM explicitly and dynamically manages the contributions of multimodal features, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system consistently achieves new state-of-the-art performance. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by adaptively making full use of the multimodal and context features. Our proposed methods also have great potential to facilitate a broader range of conversational multimodal tasks.
Resource Cooperation in MEC and SDN based Vehicular Networks
Authors: Beiran Chen, Marco Ruffini
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Internet of Things (IoT) systems require highly scalable infrastructure to adaptively provide services that meet various performance requirements. Combining Software-Defined Networking (SDN) with Mobile Edge Cloud (MEC) technology brings more flexibility to IoT systems. We present a four-tier task processing architecture for MEC and vehicular networks, in which tasks are processed locally within a vehicle, on neighboring vehicles, on an edge cloud, or on a remote cloud. The flexible network connection is controlled by SDN. We propose a CPU resource allocation algorithm with Vehicle-to-Vehicle (V2V) communications, called Partial Idle Resource Strategy (PIRS), based on the Asymmetric Nash Bargaining Solution (ANBS) from game theory. PIRS encourages vehicles in the same location to cooperate by sharing part of their spare CPU resources. In our simulations, we run four applications on the vehicles to generate workload. We compare the proposed algorithm with a Non-Cooperation Strategy (NCS) and an All Idle Resource Strategy (AIRS). In NCS, vehicles execute the tasks generated by their applications on their own On-Board Units (OBUs), while in AIRS vehicles provide all their CPU resources to serve other vehicles' offloading requests. Our simulation results show that PIRS executes more tasks on the V2V layer and leads to fewer tasks (and less total task length) being offloaded to the cloud, achieving up to 28% improvement over NCS and up to 10% improvement over AIRS.
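The Asymmetric Nash Bargaining Solution picks the allocation that maximizes prod_i (u_i - d_i)^(w_i), where d_i is player i's disagreement payoff and w_i its bargaining power. The sketch below assumes linear utilities u_i = x_i over a shared pool of spare CPU capacity, which yields a closed form by taking logs. PIRS itself, and how it sets disagreement points and weights, are not reproduced; the function name and numbers are hypothetical.

```python
def asymmetric_nash_bargaining(capacity, disagreement, weights):
    """Split `capacity` to maximize prod_i (x_i - d_i)^(w_i) subject to sum_i x_i = capacity.

    Assumes linear utilities u_i = x_i (hypothetical). Maximizing
    sum_i w_i * log(x_i - d_i) gives x_i = d_i + w_i / sum(w) * (capacity - sum(d)).
    """
    surplus = capacity - sum(disagreement)
    if surplus < 0:
        raise ValueError("capacity cannot cover every player's disagreement point")
    total_w = sum(weights)
    return [d + w / total_w * surplus for d, w in zip(disagreement, weights)]

# Two hypothetical vehicles sharing 10 units of spare CPU; the second has 3x the bargaining power.
alloc = asymmetric_nash_bargaining(10.0, [1.0, 2.0], [1.0, 3.0])
print(alloc)  # [2.75, 7.25]
```

Each player first receives its disagreement payoff, and the remaining surplus is split in proportion to bargaining power; with equal weights this reduces to the symmetric Nash bargaining split.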
Different Mechanisms of Machine Learning and Optimization Algorithms Utilized in Intrusion Detection Systems
Abstract
Detecting malicious software is an integral part of cybercrime defense. Due to the growing number of malicious attacks and their target sources, detecting and preventing attacks becomes more challenging as attack behavior keeps changing. Most classic malware detection systems are based on statistics, analytic techniques, or machine learning. Virus signature methods are widely used to identify malware, and most anti-malware systems categorize malware using regular expressions and patterns. Antivirus software is slow to update its databases to identify and block malware, yet file features must be updated to detect and prevent newly generated malware. Creating attack signatures is almost entirely manual work. The purpose of this study is to review current research on intrusion detection models and the datasets that support them. In this article, we discuss the state of the art, focusing on the strategy that was devised and executed, the dataset that was used, the findings, and the assessment that was undertaken. Additionally, the surveyed articles are critically analyzed in order to give a thorough comparative review. Machine learning and deep learning methods, as well as new classification and feature selection methodologies, are examined. Thus far, each technique has proven capable of constructing very accurate intrusion detection models. The survey findings reveal that the MultiTree and adaptive voting algorithms surpassed all other models in terms of persistency and performance, with an average accuracy of 99.98%.
Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
Authors: Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Deep neural networks are vulnerable to backdoor attacks (Trojans), in which an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general post-training clipping method for backdoor mitigation, i.e., one that bounds internal-layer activations using bounds learned from a small set of clean samples. We devise a new approach of this kind, choosing the activation bounds to explicitly limit classification margins. This method outperforms peer methods on CIFAR-10 image classification. We also show that the method is strongly robust against adaptive attacks and X2X attacks, and that it performs well on different datasets. Finally, we demonstrate an extension of the method for test-time detection and correction based on the output differences between the original and activation-bounded networks. Our code is available online.
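The general post-training clipping idea can be illustrated with a toy two-layer network: learn per-neuron upper bounds on hidden activations from a few clean samples, clip activations to those bounds at inference, and flag inputs whose prediction changes after clipping. The percentile rule for choosing bounds is a hypothetical stand-in; the paper instead chooses bounds to explicitly limit classification margins, which this sketch does not implement.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, w1, w2, bounds=None):
    """Toy two-layer network; optionally clip hidden ReLU activations from above."""
    h = relu(x @ w1)
    if bounds is not None:
        h = np.minimum(h, bounds)  # per-neuron upper bounds, broadcast over the batch
    return h @ w2, h

def learn_upper_bounds(clean_hidden, q=99.0):
    """Hypothetical rule: each neuron's bound is its q-th percentile on clean samples."""
    return np.percentile(clean_hidden, q, axis=0)

def flag_inputs(x, w1, w2, bounds):
    """Flag inputs whose predicted class changes once activations are clipped."""
    logits, _ = forward(x, w1, w2)
    clipped_logits, _ = forward(x, w1, w2, bounds)
    return logits.argmax(axis=1) != clipped_logits.argmax(axis=1)

rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 3))
clean = rng.normal(size=(64, 8))       # small clean set used only to set the bounds
_, clean_hidden = forward(clean, w1, w2)
bounds = learn_upper_bounds(clean_hidden)

x_test = rng.normal(size=(5, 8)) * 5.0  # scaled up so that clipping is likely to take effect
_, clipped_hidden = forward(x_test, w1, w2, bounds)
flags = flag_inputs(x_test, w1, w2, bounds)
print(flags)
```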
Multi-Valued Connected Consensus: A New Perspective on Crusader Agreement and Adopt-Commit
Authors: Hagit Attiya, Jennifer L. Welch
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Algorithms to solve fault-tolerant consensus in asynchronous systems often rely on primitives such as crusader agreement, adopt-commit, and graded broadcast, which provide weaker agreement properties than consensus. Although these primitives have a similar flavor, they have been defined and implemented separately in ad hoc ways. We propose a new problem, called connected consensus, which includes crusader agreement, adopt-commit, and graded broadcast as special cases and generalizes them to handle multi-valued inputs. The generalization is accomplished by relating the problem to approximate agreement on graphs. We present three algorithms for multi-valued connected consensus in asynchronous message-passing systems, one tolerating crash failures and two tolerating malicious (unauthenticated Byzantine) failures. We extend the definition of binding, a desirable property recently identified as supporting binary consensus algorithms that are correct against adaptive adversaries, to the multi-valued input case and show that all our algorithms satisfy the property. Our crash-resilient algorithm has failure-resilience and time complexity that we show are optimal. When restricted to the case of binary inputs, the algorithm has improved time complexity over prior algorithms. Our two algorithms for malicious failures trade off failure resilience and time complexity. The first algorithm has time complexity that we prove is optimal but worse failure-resilience, while the second has failure-resilience that we prove is optimal but worse time complexity. When restricted to the case of binary inputs, the time complexity (as well as resilience) of the second algorithm matches that of prior algorithms.
Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising
Authors: Jun Cheng, Tao Liu, Shan Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Real-world single image denoising is crucial and practical in computer vision. Bayesian inversions combined with score priors have now proven effective for single image denoising, but they are limited to white Gaussian noise. Moreover, applying existing score-based methods to real-world denoising requires not only explicit training of score priors on the target domain but also careful design of sampling procedures for posterior inference, which is complicated and impractical. To address these limitations, we propose a score priors-guided deep variational inference method, named ScoreDVI, for practical real-world denoising. By assuming a Gaussian form for the deep variational image posterior, score priors are extracted from easily accessible minimum-MSE non-i.i.d. Gaussian denoisers and variational samples, which in turn facilitate optimizing the variational image posterior. This procedure adaptively applies cheap score priors to denoising. Additionally, we exploit a non-i.i.d. Gaussian mixture model and a variational noise posterior to model real-world noise. This scheme also enables the pixel-wise fusion of multiple image priors and variational image posteriors. Besides, we develop a noise-aware prior assignment strategy that dynamically adjusts the weights of image priors during optimization. Our method outperforms other single image-based real-world denoising methods and achieves performance comparable to dataset-based unsupervised methods.
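One standard way to obtain a score prior from a Gaussian denoiser is Tweedie's formula: for an MMSE denoiser D at noise level sigma, the score of the noisy density is (D(y) - y) / sigma^2. The sketch checks this identity on a scalar Gaussian prior where both sides are known in closed form. It uses i.i.d. noise with a single sigma; how ScoreDVI handles the non-i.i.d. case is not reproduced here, and the helper name is hypothetical.

```python
import numpy as np

def score_from_denoiser(denoiser, y, sigma):
    """Tweedie-style estimate: grad log p_sigma(y) = (D(y) - y) / sigma^2 for an MMSE denoiser D."""
    return (denoiser(y) - y) / sigma**2

# Toy check with prior x ~ N(0, s^2) and y = x + N(0, sigma^2), where everything is closed form.
s, sigma = 2.0, 0.5

def mmse_denoiser(y):
    """Exact MMSE denoiser for this prior: E[x | y] = s^2 / (s^2 + sigma^2) * y."""
    return (s**2 / (s**2 + sigma**2)) * y

y = np.linspace(-3.0, 3.0, 7)
estimated = score_from_denoiser(mmse_denoiser, y, sigma)
exact = -y / (s**2 + sigma**2)  # score of the noisy marginal N(0, s^2 + sigma^2)
print(np.allclose(estimated, exact))  # True
```

The appeal of this route, which matches the abstract's motivation, is that an off-the-shelf denoiser supplies the score without training a dedicated score network on the target domain.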
Pareto Invariant Representation Learning for Multimedia Recommendation
Authors: Shanshan Huang, Haoxuan Li, Qingsong Li, Chunyuan Zheng, Li Liu
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by simultaneously learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors). Specifically, PaInvRL includes three iteratively executed modules: (i) a heterogeneous identification module, which identifies heterogeneous environments to reflect distributional shifts in user-item interactions; (ii) an invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptively weighted Invariant Risk Minimization (IRM) and Empirical Risk Minimization (ERM) losses; and (iii) a convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances generalization performance within and across environmental distributions. We compare PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environment learning.
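To show what combining IRM and ERM losses means, the sketch below uses an IRMv1-style penalty (the squared gradient of each environment's risk with respect to a dummy scale fixed at 1.0) with squared-error risk. The environments, predictions, and the fixed weight lam are hypothetical; PaInvRL instead weights the two losses adaptively through Pareto-optimal solutions, which this sketch does not implement.

```python
import numpy as np

def irmv1_penalty(preds, y):
    """Squared gradient of the squared-error risk w.r.t. a dummy scale w, evaluated at w = 1."""
    grad = np.mean(2.0 * (preds - y) * preds)  # d/dw mean((w * preds - y)^2) at w = 1
    return grad**2

def erm_irm_objective(envs, lam):
    """Mean ERM risk plus lam * mean IRM penalty over environments (fixed, hypothetical lam)."""
    risks = [np.mean((p - y) ** 2) for p, y in envs]
    penalties = [irmv1_penalty(p, y) for p, y in envs]
    return float(np.mean(risks) + lam * np.mean(penalties))

# Two hypothetical environments with (predictions, targets) pairs.
y1, y2 = np.array([1.0, 0.0, 2.0]), np.array([0.5, 1.5])
perfect = [(y1.copy(), y1), (y2.copy(), y2)]
biased = [(y1 + 0.3, y1), (y2 - 0.2, y2)]
print(erm_irm_objective(perfect, lam=1.0), erm_irm_objective(biased, lam=1.0))
```

The penalty is zero when rescaling the predictor cannot lower any environment's risk, which is the invariance the IRM term rewards; trading it off against plain ERM is exactly the IID-OOD balance the abstract targets.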
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology
Authors: Sean Wu, Michael Koo, Lesley Blum, Andy Black, Liyo Kao, Fabien Scalzo, Ira Kurtz
Abstract
In recent years, there have been significant breakthroughs in the field of natural language processing, particularly with the development of large language models (LLMs). These LLMs have shown remarkable capabilities on various benchmarks. In the healthcare field, the exact role that LLMs and other future AI models will play remains unclear. These models could potentially be used in the future as part of adaptive physician training, medical co-pilot applications, and digital patient interaction scenarios. The ability of AI models to participate in medical training and patient care will depend in part on their mastery of the knowledge content of specific medical fields. This study investigated the medical knowledge of LLMs, specifically their ability to take multiple-choice tests in an internal medicine subspecialty. We compared the performance of several open-source LLMs (Koala 7B, Falcon 7B, Stable-Vicuna 13B, and Orca Mini 13B) with that of GPT-4 and Claude 2 on multiple-choice questions in the field of nephrology. Nephrology was chosen as an example of a particularly conceptually complex subspecialty within internal medicine. The study evaluated the ability of the LLMs to answer nephSAP (Nephrology Self-Assessment Program) multiple-choice questions correctly. The open-source LLMs answered between 17.1% and 25.5% of the 858 nephSAP multiple-choice questions correctly. In contrast, Claude 2 answered 54.4% of the questions correctly, and GPT-4 achieved a score of 73.3%. We show that widely used open-source LLMs currently perform poorly at zero-shot reasoning compared with GPT-4 and Claude 2. The findings of this study potentially have significant implications for the future of subspecialty medical training and patient care.
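Assuming each reported percentage is taken over all 858 questions, the scores translate into approximate numbers of correct answers as follows. The counts are back-calculated by rounding and are not reported in the abstract.

```python
TOTAL = 858  # nephSAP questions, as stated in the abstract

reported = {"open-source (low end)": 17.1, "open-source (high end)": 25.5,
            "Claude 2": 54.4, "GPT-4": 73.3}

# Approximate counts, assuming each percentage is over all 858 questions.
counts = {name: round(TOTAL * pct / 100) for name, pct in reported.items()}
print(counts)  # {'open-source (low end)': 147, 'open-source (high end)': 219, 'Claude 2': 467, 'GPT-4': 629}
```

Under that assumption, GPT-4 answered roughly 162 more questions correctly than Claude 2, and roughly 410 more than the best open-source model.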
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
Abstract
Children can learn multiple cognitive tasks sequentially, and replicating this ability is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack exploration of more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, DSD-SNN dynamically assigns and grows new neurons for new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps quickly transfer all acquired knowledge to new tasks, enabling a single network to support multiple incremental tasks (without a separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class-incremental and task-incremental learning benchmarks. Extensive experiments demonstrate that our model significantly improves performance, learning speed, and memory capacity while reducing computational overhead. Besides, DSD-SNN achieves performance comparable to DNN-based methods and significantly outperforms state-of-the-art (SOTA) SNN-based continual learning methods.
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Abstract
Self-supervised sound source localization is usually challenged by modality inconsistency. In recent studies, contrastive learning-based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenarios. Unfortunately, insufficient attention to the influence of heterogeneity across modality features still limits further improvement of this scheme, which motivates our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of the visual and audio modalities, discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be consistently aligned with the visual modality. In addition to a visual weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Extensive experiments on the SoundNet-Flickr and VGG-Sound Source datasets demonstrate superior performance compared with other state-of-the-art works in various challenging scenarios. The code is available at https://github.com/Tahy1/AVIN
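Contrastive audio-visual alignment is commonly trained with an InfoNCE-style objective, in which each audio clip should score its own visual embedding higher than the other visual embeddings in the batch. The sketch below is that generic loss on toy embeddings, with a temperature of 0.07 chosen as an assumption; the Induction Network's visual weighted contrastive loss, gradient decoupling, and Induction Vector are not reproduced.

```python
import numpy as np

def info_nce(audio, visual, temperature=0.07):
    """Audio-to-visual InfoNCE: the i-th audio clip's positive is the i-th visual embedding."""
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    logits = a @ v.T / temperature                          # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))              # average over positive pairs

emb = np.eye(4)                                    # toy orthonormal embeddings
aligned = info_nce(emb, emb)                       # each clip matches its own frame
mismatched = info_nce(emb, np.roll(emb, 1, axis=0))  # positives point at the wrong frame
print(aligned < mismatched)  # True
```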
Multi-View Fusion and Distillation for Subgrade Distresses Detection based on 3D-GPR
Abstract
The application of 3D ground-penetrating radar (3D-GPR) to subgrade distress detection has gained widespread popularity. To enhance the efficiency and accuracy of detection, pioneering studies have attempted to adopt automatic detection techniques, particularly deep learning. However, existing works typically rely on traditional 1D A-scan, 2D B-scan, or 3D C-scan GPR data, resulting in either insufficient spatial information or high computational complexity. To address these challenges, we introduce a novel methodology for the subgrade distress detection task that leverages the multi-view information in 3D-GPR data. Moreover, we construct a real multi-view image dataset derived from the original 3D-GPR data for the detection task, which provides richer spatial information than A-scan and B-scan data while having lower computational complexity than C-scan data. Subsequently, we develop a novel Multi-View Fusion and Distillation framework, GPR-MVFD, specifically designed to make optimal use of the multi-view GPR dataset. This framework incorporates multi-view distillation and attention-based fusion to facilitate the extraction of significant features for subgrade distresses. In addition, a self-adaptive learning mechanism is adopted to stabilize model training and prevent performance degeneration in each branch. Extensive experiments on this new GPR benchmark demonstrate the effectiveness and efficiency of our proposed framework. Our framework outperforms not only the existing GPR baselines but also state-of-the-art methods in the fields of multi-view learning, multi-modal learning, and knowledge distillation. We will release the constructed multi-view GPR dataset with expert-annotated labels and the source code of the proposed framework.
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
Authors: Ye Tian, Mengyu Yang, Lanshan Zhang, Zhizhen Zhang, Yang Liu, Xiaohui Xie, Xirong Que, Wendong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent adaptive methods for efficient video recognition mostly follow the two-stage "preview-then-recognition" paradigm and have achieved great success on multiple video benchmarks. However, this two-stage paradigm visits the raw frames twice during inference, first at a coarse and then at a fine granularity (which cannot be parallelized), and the spatiotemporal features captured in the first stage cannot be reused in the second (due to the differing granularity), which hinders efficiency and computational optimization. To this end, inspired by human cognition, we propose a novel "View while Moving" recognition paradigm for efficient recognition of long untrimmed videos. In contrast to the two-stage paradigm, our paradigm needs to access the raw frames only once. The two phases of coarse-grained sampling and fine-grained recognition are combined into unified spatiotemporal modeling, yielding strong performance. Moreover, we investigate the properties of semantic units in video and propose a hierarchical mechanism to efficiently capture and reason about unit-level and video-level temporal semantics in long untrimmed videos. Extensive experiments on both long untrimmed and short trimmed videos demonstrate that our approach outperforms state-of-the-art methods in terms of both accuracy and efficiency, establishing new efficiency-accuracy trade-offs for video spatiotemporal modeling.
Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching
Abstract
Recently, learning-based algorithms have achieved promising performance on cross-spectral image patch matching, but their performance is still far from satisfactory for practical applications. On the one hand, the lack of a large-scale dataset with diverse scenes hinders further improvement of learning-based algorithms, whose performance and generalization rely heavily on dataset size and diversity. On the other hand, more emphasis has been placed on feature relations in the spatial domain, whereas the scale dependency between features has often been ignored, leading to performance degeneration, especially when cross-spectral patches exhibit significant appearance variations. To address these issues, we publish what is, to the best of our knowledge, the largest visible and Long-wave Infrared (LWIR) image patch matching dataset, termed VL-CMIM, which contains 1300 pairs of strictly aligned visible and LWIR images and over 2 million patch pairs covering diverse scenes such as asteroid, field, country, build, street and water. In addition, a multi-domain feature relation learning network (MD-FRN) is proposed. Taking as input the features extracted from a four-branch network, it learns feature relations in the spatial and scale domains via a spatial correlation module (SCM) and a multi-scale adaptive aggregation module (MSAG), respectively. To further aggregate the multi-domain relations, a deep domain interactive mechanism (DIM) is applied, in which the learnt spatial-relation and scale-relation features are exchanged and further fed into MSCRM and SCM. This mechanism allows our model to learn interactive cross-domain feature relations, improving robustness to significant appearance changes caused by the different modalities.
Transmission and Color-guided Network for Underwater Image Enhancement
Authors: Pan Mu, Jing Fang, Haotian Qian, Cong Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
In recent years, with the continuous development of the marine industry, underwater image enhancement has attracted considerable attention. Unfortunately, light propagating in water is absorbed by the water body and scattered by suspended particles, resulting in color deviation and low contrast. To solve these two problems, we propose an Adaptive Transmission and Dynamic Color guided network (named ATDCnet) for underwater image enhancement. In particular, to exploit physical knowledge, we design an Adaptive Transmission-directed Module (ATM) to better guide the network. To deal with the color deviation problem, we design a Dynamic Color-guided Module (DCM) to post-process the color of the enhanced image. Further, we design an Encoder-Decoder-based Compensation (EDC) structure with attention and a multi-stage feature fusion mechanism to perform color restoration and contrast enhancement simultaneously. Extensive experiments demonstrate the state-of-the-art performance of ATDCnet on multiple benchmark datasets.
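Transmission-guided enhancement methods often draw on the simplified underwater formation model I = J * t + A * (1 - t), where J is the clean image, t the transmission, and A the background light. The sketch below assumes known per-channel t and A (hypothetical values) and inverts the model; ATDCnet's ATM is learned, and the abstract does not state that it uses this exact model.

```python
import numpy as np

def degrade(J, t, A):
    """Simplified formation model I = J * t + A * (1 - t), per pixel and channel."""
    return J * t + A * (1.0 - t)

def restore(I, t, A, t_min=0.1):
    """Invert the model for J, given transmission t and background light A."""
    t = np.maximum(t, t_min)  # avoid amplifying noise where transmission is tiny
    return (I - A * (1.0 - t)) / t

rng = np.random.default_rng(0)
J = rng.uniform(size=(4, 4, 3))   # synthetic clean RGB patch
t = np.array([0.8, 0.6, 0.3])     # hypothetical per-channel (R, G, B) transmission
t = t[::-1]                       # reorder so red (last entry after reversal? no: see below)
```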
Differentially Private Graph Neural Network with Importance-Grained Noise Adaption
Authors: Yuxin Qi, Xi Lin, Jun Wu
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Graph Neural Networks (GNNs) with differential privacy have been proposed to preserve graph privacy when nodes represent personal and sensitive information. However, existing methods ignore that nodes of different importance may have different privacy demands, which may lead to over-protecting some nodes and decreasing model utility. In this paper, we study the problem of importance-grained privacy, where nodes contain personal data that need to be kept private but are critical for training a GNN. We propose NAP-GNN, a node-importance-grained privacy-preserving GNN algorithm with privacy guarantees based on adaptive differential privacy to safeguard node information. First, we propose a Topology-based Node Importance Estimation (TNIE) method to infer unknown node importance with neighborhood and centrality awareness. Second, we propose an adaptive private aggregation method that perturbs neighborhood aggregation at node-importance granularity. Third, we propose to privately train a graph learning algorithm on the perturbed aggregations in an adaptive residual connection mode over multi-layer convolutions for node-wise tasks. Theoretical analysis shows that NAP-GNN satisfies privacy guarantees. Empirical experiments on real-world graph datasets show that NAP-GNN achieves a better trade-off between privacy and accuracy.
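A common building block for differentially private aggregation is the Gaussian mechanism: bound each neighbor's contribution by clipping feature norms, then add noise scaled to that sensitivity and the privacy budget. The sketch below lets each node carry its own epsilon, which is a hypothetical way to express importance-dependent privacy demands. It covers only a single noisy aggregation, does no accounting for composition across nodes or layers, and is not NAP-GNN's mechanism; all names are hypothetical.

```python
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (this bound assumes 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this bound assumes 0 < epsilon < 1")
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def clip_rows(x, max_norm):
    """Scale each feature row to L2 norm <= max_norm so one neighbor's contribution is bounded."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))

def private_sum_aggregate(features, neighbors, epsilons, delta=1e-5, max_norm=1.0, rng=None):
    """Noisy sum of clipped neighbor features; node v uses its own (hypothetical) epsilon."""
    if rng is None:
        rng = np.random.default_rng()
    clipped = clip_rows(features, max_norm)
    out = np.zeros_like(clipped)
    for v, nbrs in enumerate(neighbors):
        # Adding or removing one neighbor changes this sum by at most max_norm (L2).
        sigma = gaussian_sigma(max_norm, epsilons[v], delta)
        out[v] = clipped[nbrs].sum(axis=0) + rng.normal(0.0, sigma, size=clipped.shape[1])
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
neighbors = [[1, 2], [0], [0, 1]]
epsilons = [0.3, 0.6, 0.9]  # hypothetical importance-dependent budgets
noisy = private_sum_aggregate(feats, neighbors, epsilons, rng=rng)
print(noisy.shape)  # (3, 4)
```

A larger epsilon means less noise, so assigning budgets per node is one way to avoid over-protecting nodes whose privacy demands are lower.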
Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution
Authors: Ao Li, Le Zhang, Yun Liu, Ce Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity for constructing high-frequency representations compared with their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. Our experiments on multiple datasets demonstrate that CRAFT outperforms state-of-the-art methods by up to 0.29 dB while using fewer parameters. The source code will be made available at: https://github.com/AVC2-UESTC/CRAFT-SR.git.
Keyword: quantization
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
Authors: Pierre Champion
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users, as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice cloning and of speaker, gender, pathology, or other attribute recognition has increased. This thesis proposes solutions for anonymizing speech and for evaluating the degree of anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider to properly evaluate the degree of privacy protection. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some of its limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce speaker PPI as much as possible while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the widely used noise-based approach. Finally, we devise a new attack method to invert anonymization.
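The contrast between noise-based and quantization-based transformations can be sketched on frame-level feature vectors. Quantization replaces each frame with its nearest codeword, so small speaker-dependent offsets collapse onto the same output. Additive noise, by contrast, preserves those offsets. The fixed codebook and the functions `vq_transform` and `noise_transform` are hypothetical. In practice the codebook would be learned, for example by k-means or by a vector-quantization layer.

```python
import numpy as np


def vq_transform(frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantization-based transform: replace each frame by its nearest codeword."""
    # Squared Euclidean distance between every frame and every codeword.
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[np.argmin(d2, axis=1)]


def noise_transform(frames: np.ndarray, std: float, seed: int = 0) -> np.ndarray:
    """Noise-based transform: add i.i.d. Gaussian noise to every feature."""
    rng = np.random.default_rng(seed)
    return frames + rng.normal(0.0, std, size=frames.shape)
```

Consider two speakers whose features differ by an offset smaller than half the distance between codewords. `vq_transform` maps both to identical outputs, whereas `noise_transform` leaves the offset in place.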
Quantization Aware Factorization for Deep Neural Network Compression
Authors: Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak
Abstract
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to the memory and power constraints of mobile or embedded devices, a quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds the tensor approximation directly with quantized factors and thus benefits from both compression techniques while preserving the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance trade-off.
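Below is a minimal sketch of ADMM for a rank-R CP decomposition of a 3-way tensor whose factors are constrained to a signed uniform grid. For each factor, ADMM repeats three steps: a regularized least-squares update, a projection onto the grid, and a dual update. Several choices are assumptions, and the paper's exact updates may differ: the grid definition (`step`, `bits`), the penalty `rho`, the iteration counts, and building the Khatri-Rao product from the already-quantized factors.

```python
import numpy as np


def khatri_rao(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product: column r equals kron(x[:, r], y[:, r])."""
    p, rank = x.shape
    q = y.shape[0]
    return (x[:, None, :] * y[None, :, :]).reshape(p * q, rank)


def unfold(t: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding of a 3-way tensor with Kolda-Bader column ordering."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1, order="F")


def project_to_grid(x: np.ndarray, step: float, bits: int) -> np.ndarray:
    """Nearest point on the signed grid step * {-2**(bits-1), ..., 2**(bits-1) - 1}."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / step), -qmax - 1, qmax) * step


def quantized_cp_admm(t, rank, step, bits=8, rho=1.0, outer=50, inner=5, seed=0):
    """CP decomposition of a 3-way tensor; the returned factors lie on the grid."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in t.shape]  # continuous variables
    z = [project_to_grid(f, step, bits) for f in factors]        # quantized copies
    u = [np.zeros_like(f) for f in factors]                      # scaled dual variables
    for _ in range(outer):
        for mode in range(3):
            first, second = (z[m] for m in range(3) if m != mode)
            kr = khatri_rao(second, first)  # e.g. khatri_rao(C, B) for mode 0
            gram = kr.T @ kr + rho * np.eye(rank)
            rhs_data = unfold(t, mode) @ kr
            for _ in range(inner):
                # Factor step: argmin 0.5*||T_(n) - F kr^T||^2 + rho/2*||F - Z + U||^2
                factors[mode] = np.linalg.solve(gram, (rhs_data + rho * (z[mode] - u[mode])).T).T
                # Projection step onto the quantization grid
                z[mode] = project_to_grid(factors[mode] + u[mode], step, bits)
                # Dual update
                u[mode] += factors[mode] - z[mode]
    return z
```

The returned factors contain only grid values, so they can be stored directly as `bits`-bit integers together with the scale `step`. This avoids a separate post-training quantization pass over the decomposed weights.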
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep neural networks (DNNs) demonstrate outstanding performance across most computer vision tasks. Some critical applications, such as autonomous driving or medical imaging, also require investigation into their behavior and the reasons behind the decisions they make. In this vein, DNN attribution consists of studying the relationship between the predictions of a DNN and its inputs. Attribution methods have been adapted to highlight the most relevant weights or neurons in a DNN, allowing a more efficient selection of which weights or neurons can be pruned. However, a limitation of these approaches is that weights are typically compared within each layer separately, while some layers might be more critical than others. In this work, we propose to investigate DNN layer importance, i.e., to estimate the sensitivity of the accuracy with respect to perturbations applied at the layer level. To do so, we propose a novel dataset to evaluate our method as well as future work. We benchmark a number of criteria and draw conclusions regarding how to assess DNN layer importance and, consequently, how to allocate per-layer budgets for increased DNN efficiency (with applications to DNN pruning and quantization), as well as robustness to hardware failures (e.g., bit swaps).
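The sketch below shows one simple layer-level criterion as an illustration, not as the criterion SAfER recommends. It perturbs a single layer's weights with Gaussian noise scaled to that layer's weight spread. It then measures how often predictions change relative to the unperturbed model. The tiny ReLU MLP, `noise_scale`, and `trials` are assumptions.

```python
import numpy as np


def forward(weights, x):
    """Tiny MLP: ReLU after every layer except the last; returns logits."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def layer_sensitivity(weights, x, noise_scale=0.5, trials=10, seed=0):
    """Per-layer drop in agreement with the unperturbed model's predictions.

    Only one layer is perturbed at a time, with Gaussian noise whose std is
    noise_scale times that layer's weight std; results are averaged over trials.
    """
    rng = np.random.default_rng(seed)
    reference = forward(weights, x).argmax(axis=1)
    scores = []
    for idx, w in enumerate(weights):
        drops = []
        for _ in range(trials):
            perturbed = list(weights)  # shallow copy; only layer idx is replaced
            perturbed[idx] = w + rng.normal(0.0, noise_scale * w.std(), size=w.shape)
            agreement = np.mean(forward(perturbed, x).argmax(axis=1) == reference)
            drops.append(1.0 - agreement)
        scores.append(float(np.mean(drops)))
    return np.array(scores)
```

Layers with low scores are candidates for more aggressive pruning or lower bit widths. High-scoring layers would keep a larger budget.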
Keyword: efficient
The Two Faces of AI in Green Mobile Computing: A Literature Review
Simulative Performance Analysis of an AD Function with Road Network Variation
Sustainable development-oriented campus bike-sharing site evaluation model: A case study of Henan Polytechnic University
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures
YUDO: YOLO for Uniform Directed Object Detection
FocalFormer3D : Focusing on Hard Instance for 3D Object Detection
Optimizing Algorithms From Pairwise User Preferences
An Approach for Optimizing Acceleration in Connected and Automated Vehicles
Long-Distance Gesture Recognition using Dynamic Neural Networks
Communication-Efficient Search under Fully Homomorphic Encryption for Federated Machine Learning
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
TRTM: Template-based Reconstruction and Target-oriented Manipulation of Crumpled Cloths
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks
Maximizing Network Connectivity for UAV Communications via Reconfigurable Intelligent Surfaces
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
Finite Element Operator Network for Solving Parametric PDEs
A High-efficient Battery Charging System for Electric Vehicle
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data
E3-UAV: An Edge-based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles
A Novel Approach for Establishing Connectivity in Partitioned Mobile Sensor Networks Using Beamforming Techniques
Neuro-Symbolic RDF and Description Logic Reasoners: The State-Of-The-Art and Challenges
Strategic Interactions in Multi-modal Mobility Systems: A Game-Theoretic Perspective
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
Intrinsic Motivation via Surprise Memory
Why Data Science Projects Fail
Service Reservation and Pricing for Green Metaverses: A Stackelberg Game Approach
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
Gaussian Image Anomaly Detection with Greedy Eigencomponent Selection
Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
Improving Autonomous Separation Assurance through Distributed Reinforcement Learning with Attention Networks
Random-Walk Metaball-Imaging Discrete Element Lattice Boltzmann Method for 3D Solute Transport in Fluid-Particle Systems with Complex Granular Morphologies
Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance
Neural Field Movement Primitives for Joint Modelling of Scenes and Motions
A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique
Prompting In-Context Operator Learning with Sensor Data, Equations, and Natural Language
CERMET: Coding for Energy Reduction with Multiple Encryption Techniques -- $It's\ easy\ being\ green$
Ergodic Capacity of Dyadic Fading Channels in Ultra Low-SNR Regime
Learning of discrete models of variational PDEs from data
DOST -- Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
Scene-Generalizable Interactive Segmentation of Radiance Fields
Keyword: faster
Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
Estimation of Human Condition at Disaster Site Using Aerial Drone Images
BOPIM: Bayesian Optimization for influence maximization on temporal networks
Keyword: mobile
The Two Faces of AI in Green Mobile Computing: A Literature Review
Resource Cooperation in MEC and SDN based Vehicular Networks
Quantization Aware Factorization for Deep Neural Network Compression
A Forensic Methodology for Detecting Image Manipulations
A High-efficient Battery Charging System for Electric Vehicle
Case Study: Using AI-Assisted Code Generation In Mobile Teams
A Novel Approach for Establishing Connectivity in Partitioned Mobile Sensor Networks Using Beamforming Techniques
Enhancing Mobile Privacy and Security: A Face Skin Patch-Based Anti-Spoofing Approach
Enhancement of Satellite-to-Phone Link Budget by Using Distributed Beamforming
Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
can-train-and-test: A Curated CAN Dataset for Automotive Intrusion Detection
Ergodic Capacity of Dyadic Fading Channels in Ultra Low-SNR Regime
Keyword: pruning
D-Score: A Synapse-Inspired Approach for Filter Pruning
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Keyword: diffusion
3D Scene Diffusion Guidance using Scene Graphs
Instabilities of explicit finite difference schemes with ghost points on the diffusion equation
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Going Deeper with Five-point Stencil Convolutions for Reaction-Diffusion Equations
CasCIFF: A Cross-Domain Information Fusion Framework Tailored for Cascade Prediction in Social Networks
IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models
Do Diffusion Models Suffer Error Propagation? Theoretical Analysis and Consistency Regularization
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
Keyword: adaptive
Backdoor Federated Learning by Poisoning Backdoor-Critical Layers
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
Resource Cooperation in MEC and SDN based Vehicular Networks
Different Mechanisms of Machine Learning and Optimization Algorithms Utilized in Intrusion Detection Systems
Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
Multi-Valued Connected Consensus: A New Perspective on Crusader Agreement and Adopt-Commit
Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising
Pareto Invariant Representation Learning for Multimedia Recommendation
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Multi-View Fusion and Distillation for Subgrade Distresses Detection based on 3D-GPR
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching
Transmission and Color-guided Network for Underwater Image Enhancement
Differentially Private Graph Neural Network with Importance-Grained Noise Adaption
Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution
Keyword: quantization
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques
Quantization Aware Factorization for Deep Neural Network Compression
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference