Abstract
We explore the use of Physics-Informed Neural Networks to analyse nonlinear Hamiltonian dynamical systems with a first integral of motion. In this work, we propose an architecture that combines existing Hamiltonian Neural Network structures into Adaptable Symplectic Recurrent Neural Networks, which preserve Hamilton's equations as well as the symplectic structure of phase space while predicting dynamics over the entire parameter space. This architecture significantly outperforms previously proposed neural networks when predicting Hamiltonian dynamics, especially in potentials that contain multiple parameters. We demonstrate its robustness using the nonlinear Hénon-Heiles potential under chaotic, quasiperiodic and periodic conditions. The second problem we tackle is whether the high-dimensional nonlinear capabilities of neural networks can be used to predict the dynamics of a Hamiltonian system given only partial information about it. To this end, we leverage Long Short-Term Memory networks to implement Takens' embedding theorem, constructing a delay embedding of the system and then mapping the topologically invariant attractor to its true form. This architecture is then layered with Adaptable Symplectic nets so that predictions preserve the structure of Hamilton's equations. We show that this method works efficiently for single-parameter potentials and provides accurate predictions even over long periods of time.
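The core mechanism combining a learned Hamiltonian with symplectic integration can be sketched briefly: a network parameterizes H(q, p), Hamilton's equations are obtained by automatic differentiation, and the state is advanced with a leapfrog step. This is a minimal PyTorch-style illustration under the assumption of a separable Hamiltonian; the module and function names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class HNet(nn.Module):
    """Scalar Hamiltonian H(q, p) parameterized by an MLP (illustrative)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q, p):
        return self.net(torch.cat([q, p], dim=-1)).sum()

def leapfrog_step(H, q, p, dt):
    """One symplectic (leapfrog) step using autograd gradients of H.

    Assumes a separable Hamiltonian H(q, p) = T(p) + V(q), as in the
    Hénon-Heiles system, so the half-kick/drift/half-kick splitting
    is symplectic.
    """
    q = q.detach().requires_grad_(True)
    p = p.detach().requires_grad_(True)
    dHdq = torch.autograd.grad(H(q, p), q)[0]
    p_half = p - 0.5 * dt * dHdq            # half kick: dp/dt = -dH/dq
    p_half = p_half.detach().requires_grad_(True)
    dHdp = torch.autograd.grad(H(q, p_half), p_half)[0]
    q_new = q + dt * dHdp                   # drift: dq/dt = +dH/dp
    q_new = q_new.detach().requires_grad_(True)
    dHdq_new = torch.autograd.grad(H(q_new, p_half), q_new)[0]
    p_new = p_half - 0.5 * dt * dHdq_new    # second half kick
    return q_new.detach(), p_new.detach()

# Rollout example: predict a trajectory from an initial condition.
H = HNet(dim=2)
q, p = torch.randn(2), torch.randn(2)
for _ in range(100):
    q, p = leapfrog_step(H, q, p, dt=0.01)
```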
Improved Neural Radiance Fields Using Pseudo-depth and Fusion
Authors: Jingliang Li, Qiang Zhou, Chaohui Yu, Zhengda Lu, Jun Xiao, Zhibin Wang, Fan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Since the advent of Neural Radiance Fields (NeRF), novel view synthesis has received tremendous attention. Existing approaches to generalizable radiance field reconstruction primarily construct an encoding volume from nearby source images as additional input. However, these approaches cannot efficiently encode the geometric information of real scenes containing objects and structures at various scales. In this work, we propose constructing multi-scale encoding volumes that provide multi-scale geometric information to NeRF models. To make the constructed volumes as close as possible to object surfaces in the scene, and the rendered depth more accurate, we propose performing depth prediction and radiance field reconstruction simultaneously. The predicted depth map is used to supervise the rendered depth, narrow the depth range, and guide point sampling. Finally, the geometric information contained in point volume features may be inaccurate due to occlusion, lighting, and similar effects. To this end, we propose enhancing the point volume features via depth-guided neighbor feature fusion. Experiments demonstrate the superior performance of our method in both novel view synthesis and dense geometry modeling without per-scene optimization.
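A hedged sketch of the depth-guided sampling step: given a predicted depth per ray, restrict sample placement to a narrow band around it. The banding rule and names below are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def depth_guided_samples(depth_pred, n_samples=32, rel_width=0.1,
                         near=0.1, far=10.0):
    """Sample per-ray depths in a band around a predicted depth map.

    depth_pred: (N,) predicted depth per ray; rel_width sets the band
    half-width as a fraction of the predicted depth (an assumption).
    """
    half = rel_width * depth_pred
    t_near = (depth_pred - half).clamp(min=near)
    t_far = (depth_pred + half).clamp(max=far)
    u = torch.linspace(0.0, 1.0, n_samples, device=depth_pred.device)
    # Linear placement inside the narrowed range along each ray.
    t = t_near[:, None] + (t_far - t_near)[:, None] * u[None, :]
    return t  # (N, n_samples) sample depths per ray

t_vals = depth_guided_samples(torch.full((1024,), 2.5))
```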
Goodness-of-Fit of Attributed Probabilistic Graph Generative Models
Authors: Pablo Robles-Granda, Katherine Tsai, Oluwasanmi Koyejo
Abstract
Probabilistic generative models of graphs are important tools that enable representation and sampling. Many recent works have created probabilistic models of graphs that are capable of representing not only entity interactions but also their attributes. However, given a generative model of random attributed graph(s), the general conditions that establish goodness of fit are not clear a priori. In this paper, we define goodness of fit in terms of the mean square contingency coefficient for random binary networks. For this statistic, we outline a procedure for assessing the quality of the structure of a learned attributed graph by ensuring that the discrepancy of the mean square contingency coefficient (constant or random) is minimal with high probability. We apply these criteria to verify the representation capability of a probabilistic generative model for various popular types of graph models.
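For concreteness, the mean square contingency (phi) coefficient for binary data can be computed directly from the 2x2 contingency counts. The sketch below, with illustrative random data, shows the statistic one would compare between an observed network and samples from a learned model; the data-generation step is our own example, not the paper's setup.

```python
import numpy as np

def phi_coefficient(x, y):
    """Mean square contingency (phi) coefficient of two binary arrays."""
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    n11 = np.sum(x & y); n00 = np.sum(~x & ~y)
    n10 = np.sum(x & ~y); n01 = np.sum(~x & y)
    denom = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom > 0 else 0.0

# Illustrative check: the discrepancy of phi between an observed binary
# network and graphs sampled from the model should be small.
rng = np.random.default_rng(0)
A_obs = rng.random((50, 50)) < 0.1          # observed adjacency (random here)
attr = rng.random(50) < 0.5                 # binary node attribute
same_attr = np.equal.outer(attr, attr)      # attribute agreement per pair
iu = np.triu_indices(50, k=1)
phi_obs = phi_coefficient(A_obs[iu], same_attr[iu])
print(phi_obs)
```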
Exploring IoT for real-time CO2 monitoring and analysis
Abstract
As part of this project, we have developed an IoT-based instrument utilizing the NodeMCU ESP8266 module, MQ135 gas sensor, and DHT-11 sensor to measure CO$_2$ levels in parts per million (ppm), temperature, and humidity. The escalating CO$_2$ levels worldwide necessitate constant monitoring and analysis to understand the implications for human health, safety, energy efficiency, and environmental well-being. Thus, an efficient and cost-effective solution is imperative to measure and transmit data for statistical analysis and storage. The instrument offers real-time monitoring, enabling a comprehensive understanding of indoor environmental conditions. By providing valuable insights, it facilitates measures to ensure health and safety, optimize energy efficiency, and promote effective environmental monitoring. This work aims to contribute to the growing body of knowledge surrounding CO$_2$ levels, temperature, and humidity, fostering sustainable practices and informed decision-making.
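A minimal MicroPython-style sketch of such an instrument follows. The pin assignments and the MQ135 curve-fit constants (RZERO, PARA, PARB) are assumptions that require per-sensor calibration against a reference instrument; only the standard `machine` and `dht` modules are used.

```python
# MicroPython sketch for a NodeMCU ESP8266 (illustrative; pin choices and
# the MQ135 ppm conversion constants are assumptions needing calibration).
import time
import dht
import machine

adc = machine.ADC(0)                 # MQ135 analog output on A0 (0-1023)
dht11 = dht.DHT11(machine.Pin(4))    # DHT-11 data pin on GPIO4 (D2)

RLOAD, RZERO = 10.0, 76.63           # kOhm; RZERO from clean-air calibration
PARA, PARB = 116.6020682, 2.769034857  # common MQ135 curve-fit constants

def co2_ppm(raw):
    # Convert the raw ADC reading to sensor resistance, then to ppm via
    # the usual MQ135 power-law curve (approximate without calibration).
    v = raw / 1023.0
    rs = RLOAD * (1.0 - v) / max(v, 1e-6)
    return PARA * (rs / RZERO) ** (-PARB)

while True:
    dht11.measure()
    print("CO2 ~%.0f ppm  T=%dC  RH=%d%%" %
          (co2_ppm(adc.read()), dht11.temperature(), dht11.humidity()))
    time.sleep(10)
```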
Nucleotide String Indexing using Range Matching
Authors: Alon Rashelbach, Ori Rottenstreich, Mark Silberstein
Subjects: Data Structures and Algorithms (cs.DS); Genomics (q-bio.GN)
Abstract
The two most common data structures for genome indexing, FM-indices and hash tables, exhibit a fundamental trade-off between memory footprint and performance. We present Ranger, a new indexing technique for nucleotide sequences that is both memory-efficient and fast. We observe that nucleotide sequences can be represented as integer ranges, and leverage a range-matching algorithm based on neural networks to perform the lookup. We prototype Ranger in software and integrate it into the popular Minimap2 tool. Ranger achieves almost identical end-to-end performance to the original Minimap2, while occupying 1.7$\times$ and 1.2$\times$ less memory for short- and long-reads, respectively. With a limited memory capacity, Ranger achieves up to 4.3$\times$ speedup for short reads compared to FM-Index, and up to 4.2$\times$ and 1.8$\times$ speedups for short- and long-reads, compared to hash tables. Ranger opens up new opportunities in the context of hardware acceleration by reducing the memory footprint of long-seed indexes used in state-of-the-art alignment accelerators by up to 23$\times$, which results in 3$\times$ faster alignment with negligible accuracy degradation. Moreover, its worst-case memory bandwidth and latency can be bounded in advance without the need to inflate DRAM capacity.
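The underlying observation can be illustrated in a few lines: fixed-length nucleotide seeds map to base-4 integers, so all k-mers sharing a prefix occupy one contiguous integer range, and lookup reduces to range matching. The sketch below uses plain binary search for the matching step, where Ranger would substitute its learned range matcher; prefixes and names are our own example.

```python
from bisect import bisect_right

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_to_int(kmer):
    """Encode a nucleotide k-mer as a base-4 integer."""
    v = 0
    for c in kmer:
        v = (v << 2) | ENC[c]
    return v

def prefix_to_range(prefix, k):
    """All k-mers sharing `prefix` form one contiguous integer range."""
    pad = k - len(prefix)
    lo = kmer_to_int(prefix) << (2 * pad)
    hi = lo + (1 << (2 * pad)) - 1
    return lo, hi

# Range-matching lookup: ranges sorted by start, query via binary search.
k = 5
ranges = sorted(prefix_to_range(p, k) for p in ["ACG", "TTA", "GGC"])
starts = [r[0] for r in ranges]

def lookup(kmer):
    v = kmer_to_int(kmer)
    i = bisect_right(starts, v) - 1
    return i if i >= 0 and v <= ranges[i][1] else None

assert lookup("ACGGT") == 0 and lookup("CCCCC") is None
```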
CECM: A continuous empirical cubature method with application to the dimensional hyperreduction of parameterized finite element models
Authors: J.A. Hernandez, J.R. Bravo, S. Ares de Parga
Abstract
We present the Continuous Empirical Cubature Method (CECM), a novel algorithm for empirically devising efficient integration rules. The CECM aims to improve existing cubature methods by producing rules that are close to optimal, featuring far fewer points than the number of functions to integrate. The CECM consists of a two-stage strategy. First, a point-selection strategy is applied to obtain an initial approximation to the cubature rule, featuring as many points as functions to integrate. The second stage consists of a sparsification strategy in which, alongside the indices and corresponding weights, the spatial coordinates of the points are also considered as design variables. The positions of the initially selected points are changed to drive their associated weights to zero, and in this way the minimum number of points is achieved. Although originally conceived within the framework of hyper-reduced order models (HROMs), we present the method's formulation in terms of generic vector-valued functions, thereby accentuating its versatility across various problem domains. To demonstrate the extensive applicability of the method, we conduct numerical validations using univariate and multivariate Lagrange polynomials. In these cases, we show the method's capacity to retrieve the optimal Gaussian rule. We also assess the method on an arbitrary exponential-sinusoidal function in a 3D domain, and finally consider an example of the application of the method to the hyperreduction of a multiscale finite element model, showcasing notable computational performance gains. A secondary contribution of the current paper is the Sequential Randomized SVD (SRSVD) approach for computing the Singular Value Decomposition (SVD) in a column-partitioned format. The SRSVD is particularly advantageous when matrix sizes approach memory limitations.
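The first-stage idea of empirical cubature, selecting sparse nonnegative weights that reproduce the exact integrals, can be illustrated with nonnegative least squares. This is a simplified stand-in for the paper's point-selection stage; the CECM's continuous second stage, which moves the points to drive weights to zero, is not shown.

```python
import numpy as np
from scipy.optimize import nnls

# Illustrative first-stage empirical cubature: given samples of the
# functions to integrate at candidate points, find nonnegative weights
# reproducing the exact integrals (NNLS tends to return sparse weights).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)               # candidate points
degrees = np.arange(6)
F = np.vstack([x**d for d in degrees])        # functions: monomials x^d
b = np.array([(1 - (-1) ** (d + 1)) / (d + 1) for d in degrees])  # exact integrals

w, residual = nnls(F, b)
selected = np.flatnonzero(w > 1e-12)
print(f"{len(selected)} points selected, residual {residual:.2e}")
print("check integral of x^4:", F[4] @ w, "vs exact", b[4])
```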
A staggered-in-time and non-conforming-in-space numerical framework for realistic cardiac electrophysiology outputs
Authors: Elena Zappon, Andrea Manzoni, Alfio Quarteroni
Abstract
Computer-based simulations of non-invasive cardiac electrical outputs, such as electrocardiograms and body surface potential maps, usually entail severe computational costs due to the need to capture fine-scale processes and to the complexity of the heart-torso morphology. In this work, we model cardiac electrical outputs by employing a coupled model consisting of a reaction-diffusion model (either the bidomain model or the more efficient pseudo-bidomain model) on the heart, and an elliptic model in the torso. We then solve the coupled problem with a segregated and staggered-in-time numerical scheme that allows for independent and infrequent solution in the torso region. To further reduce the computational load, the main novelty of this work is the introduction of an interpolation method at the interface between the heart and torso domains, enabling the use of non-conforming meshes and the application of the numerical framework to realistic cardiac and torso geometries. The reliability and efficiency of the proposed scheme are tested against the corresponding state-of-the-art bidomain-torso model. Furthermore, we explore the impact of torso spatial discretization and geometrical non-conformity on the model solution and the corresponding clinical outputs.
New Bounds on Quotient Polynomials with Applications to Exact Divisibility and Divisibility Testing of Sparse Polynomials
Authors: Ido Nahshon, Amir Shpilka
Subjects: Symbolic Computation (cs.SC); Computational Complexity (cs.CC); Number Theory (math.NT)
Abstract
A sparse polynomial (also called a lacunary polynomial) is a polynomial that has relatively few terms compared to its degree. The sparse representation of a polynomial represents the polynomial as a list of its non-zero terms (coefficient-degree pairs). In particular, the degree of a sparse polynomial can be exponential in the sparse-representation size. We prove that for monic polynomials $f, g \in \mathbb{C}[x]$ such that $g$ divides $f$, the $\ell_2$-norm of the quotient polynomial $f/g$ is bounded by $\lVert f \rVert_1 \cdot \tilde{O}(\lVert g \rVert_0^3 \text{deg}^2 f)^{\lVert g \rVert_0 - 1}$. This improves upon the exponential (in $\text{deg} f$) bounds for general polynomials and implies that the trivial long division algorithm runs in time quasi-linear in the input size and the number of terms of the quotient polynomial $f/g$, thus solving a long-standing problem on exact divisibility of sparse polynomials. We also study the problem of bounding the number of terms of $f/g$ in some special cases. When $f, g \in \mathbb{Z}[x]$ and $g$ is a cyclotomic-free (i.e., it has no cyclotomic factors) trinomial, we prove that $\lVert f/g \rVert_0 \leq O(\lVert f \rVert_0 \, \text{size}(f)^2 \cdot \log^6 \text{deg} g)$. When $g$ is a binomial with $g(\pm 1) \neq 0$, we prove that the sparsity is at most $O(\lVert f \rVert_0 (\log \lVert f \rVert_0 + \log \lVert f \rVert_{\infty}))$. Both upper bounds are polynomial in the input size. We leverage these results and give a polynomial-time algorithm for deciding whether a cyclotomic-free trinomial divides a sparse polynomial over the integers. As our last result, we present a polynomial-time algorithm for testing divisibility by pentanomials over small finite fields when $\text{deg} f = \tilde{O}(\text{deg} g)$.
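The trivial long division the result refers to operates directly on the sparse representation; a minimal dict-based sketch (our own illustration, with float coefficients for simplicity) is:

```python
def sparse_divide(f, g):
    """Long division of sparse polynomials given as {degree: coeff} dicts.

    Returns (quotient, remainder). Illustrative: the paper's norm bound
    on f/g is what guarantees this trivial algorithm stays fast when g
    exactly divides f.
    """
    f = dict(f)
    dg = max(g)
    lead = g[dg]
    q = {}
    while f and max(f) >= dg:
        df = max(f)
        c = f[df] / lead
        q[df - dg] = c
        for e, ce in g.items():          # subtract c * x^(df-dg) * g
            k = df - dg + e
            f[k] = f.get(k, 0) - c * ce
            if abs(f[k]) < 1e-12:
                del f[k]
    return q, f

# (x^8 - 1) / (x^2 - 1) = x^6 + x^4 + x^2 + 1
q, r = sparse_divide({8: 1, 0: -1}, {2: 1, 0: -1})
assert q == {6: 1.0, 4: 1.0, 2: 1.0, 0: 1.0} and r == {}
```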
ForensiBlock: A Provenance-Driven Blockchain Framework for Data Forensics and Auditability
Abstract
Maintaining accurate provenance records is paramount in digital forensics, as they underpin evidence credibility and integrity, addressing essential aspects like accountability and reproducibility. Blockchains have several properties that can address these requirements. Previous systems utilized public blockchains, i.e., treated blockchain as a black box, and benefiting from the immutability property. However, the blockchain was accessible to everyone, giving rise to security concerns and moreover, efficient extraction of provenance faces challenges due to the enormous scale and complexity of digital data. This necessitates a tailored blockchain design for digital forensics. Our solution, Forensiblock has a novel design that automates investigation steps, ensures secure data access, traces data origins, preserves records, and expedites provenance extraction. Forensiblock incorporates Role-Based Access Control with Staged Authorization (RBAC-SA) and a distributed Merkle root for case tracking. These features support authorized resource access with an efficient retrieval of provenance records. Particularly, comparing two methods for extracting provenance records off chain storage retrieval with Merkle root verification and a brute-force search the offchain method is significantly better, especially as the blockchain size and number of cases increase. We also found that our distributed Merkle root creation slightly increases smart contract processing time but significantly improves history access. Overall, we show that Forensiblock offers secure, efficient, and reliable handling of digital forensic data
Optimizing the switching operation in monoclonal antibody production: Economic MPC and reinforcement learning
Authors: Sandra A. Obiri, Song Bo, Bernard T. Agyeman, Benjamin Decardi-Nelson, Jinfeng Liu (University of Alberta)
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Quantitative Methods (q-bio.QM)
Abstract
Monoclonal antibodies (mAbs) have emerged as indispensable assets in medicine, and are currently at the forefront of biopharmaceutical product development. However, the growing market demand and the substantial doses required for mAb clinical treatments necessitate significant progress in their large-scale production. Most of the processes for industrial mAb production rely on batch operations, which result in significant downtime. The shift towards a fully continuous and integrated manufacturing process holds the potential to boost product yield and quality, while eliminating the extra expenses associated with storing intermediate products. The integrated continuous mAb production process can be divided into the upstream and downstream processes. One crucial aspect that ensures the continuity of the integrated process is the switching of the capture columns, which are typically chromatography columns operated in a fed-batch manner downstream. Due to the discrete nature of the switching operation, advanced process control algorithms such as economic MPC (EMPC) are computationally difficult to implement, because an integer nonlinear program (INLP) needs to be solved online at each sampling time. This paper introduces two computationally efficient approaches for EMPC implementation, namely, a sigmoid function approximation approach and a rectified linear unit (ReLU) approximation approach. It also explores the application of deep reinforcement learning (DRL). These three methods are compared to the traditional switching approach, which is based on a 1% product breakthrough rule and involves no optimization.
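The sigmoid approximation idea can be sketched in a few lines: the hard 1% breakthrough switching rule is replaced by a steep sigmoid so the switching decision becomes differentiable inside the EMPC optimization. The threshold and sharpness values below are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def switch_hard(breakthrough, threshold=0.01):
    """Traditional rule: switch the capture column at 1% breakthrough."""
    return float(breakthrough >= threshold)

def switch_sigmoid(breakthrough, threshold=0.01, sharpness=2000.0):
    """Smooth surrogate: a steep sigmoid approximates the binary switch,
    keeping the EMPC objective differentiable. `sharpness` is a tuning
    assumption; too large re-creates the stiff integer behavior."""
    return 1.0 / (1.0 + np.exp(-sharpness * (breakthrough - threshold)))

b = np.linspace(0.0, 0.02, 5)
print([switch_hard(x) for x in b])       # [0.0, 0.0, 1.0, 1.0, 1.0]
print(np.round(switch_sigmoid(b), 3))    # smooth transition near 0.01
```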
Deterministic Neural Illumination Mapping for Efficient Auto-White Balance Correction
Authors: Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Auto-white balance (AWB) correction is a critical operation in image signal processors for accurate and consistent color correction across various illumination scenarios. This paper presents a novel and efficient AWB correction method that achieves at least 35 times faster processing than the current state-of-the-art methods, with equivalent or superior performance on high-resolution images. Inspired by deterministic color style transfer, our approach introduces deterministic illumination color mapping, leveraging learnable projection matrices for both the canonical illumination form and the AWB-corrected output. It involves feeding high-resolution images and corresponding latent representations into a mapping module to derive a canonical form, followed by another mapping module that maps the pixel values to those of the corrected version. This strategy is designed to be resolution-agnostic and also enables seamless integration of any pre-trained AWB network as the backbone. Experimental results confirm the effectiveness of our approach, revealing significant performance improvements and reduced time complexity compared to state-of-the-art methods. Our method provides an efficient deep learning-based AWB correction solution, promising real-time, high-quality color correction for digital imaging applications. Source code is available at https://github.com/birdortyedi/DeNIM/
A Benchmarking Study of Matching Algorithms for Knowledge Graph Entity Alignment
Authors: Nhat-Minh Dao, Thai V. Hoang, Zonghua Zhang
Abstract
How to identify equivalent entities between knowledge graphs (KGs), a task called Entity Alignment (EA), is a long-standing challenge. So far, many methods have been proposed, with recent work focusing on leveraging deep learning. However, we observe that most of the effort has been devoted to learning better representations of entities, rather than to improving entity matching from the learned representations. In fact, how to efficiently infer entity pairs from the similarity matrix induced by the learned representations, which is essentially a matching problem, has been largely ignored by the community. Motivated by this observation, we conduct an in-depth analysis of existing algorithms designed for this matching problem, and propose a novel matching method, named Bidirectional Matching (BMat). Our extensive experimental results on public datasets indicate that there is currently no single silver-bullet solution for EA. In other words, different classes of entity similarity estimation may require different matching algorithms to reach the best EA results for each class. We finally conclude that using PARIS, the state-of-the-art EA approach, with BMat gives the best combination in terms of EA performance and the algorithm's time and space complexity.
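A minimal illustration of a bidirectional matching criterion, keeping only pairs that are each other's best match in the similarity matrix, is given below. BMat's actual procedure is more involved; this only conveys the matching (as opposed to representation-learning) perspective the abstract emphasizes.

```python
import numpy as np

def mutual_best_match(S):
    """Keep (i, j) pairs that are each other's argmax in both directions
    of the similarity matrix S (rows: source KG, cols: target KG).
    A minimal bidirectional-matching illustration, not BMat itself."""
    best_tgt = S.argmax(axis=1)            # best target for each source
    best_src = S.argmax(axis=0)            # best source for each target
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]

S = np.array([[0.9, 0.2, 0.1],
              [0.8, 0.7, 0.3],   # row 1 prefers col 0, already claimed
              [0.1, 0.2, 0.6]])
print(mutual_best_match(S))      # [(0, 0), (2, 2)]
```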
An Approximate Dynamic Programming Approach to Vehicle Platooning Coordination in Networks
Authors: Xi Xiong, Maonan Wang, Dengfeng Sun, Li Jin
Abstract
Platooning of connected and autonomous vehicles (CAVs) provides significant benefits in terms of traffic efficiency and fuel economy. However, most existing platooning systems assume the availability of pre-determined plans, which is not feasible in real-time scenarios. In this paper, we address this issue in time-dependent networks by formulating a Markov decision process at each junction, aiming to minimize travel time and fuel consumption. Initially, we analyze coordinated platooning without routing to explore the cooperation among controllers on an identical path. We propose two novel approaches based on approximate dynamic programming, offering suboptimal control in the context of a stochastic finite horizon problem. The results demonstrate the superiority of approximation in the policy space. Furthermore, we investigate platooning in a network setting, where speed profiles and routes are determined simultaneously. To simplify the problem, we decouple the action space by prioritizing routing decisions based on travel time estimation. We subsequently employ the aforementioned policy approximation to determine speed profiles, considering essential parameters such as travel times. Our simulation results in SUMO indicate that our method yields better performance than conventional approaches, leading to potential travel cost savings of up to 40%. Additionally, we evaluate the resilience of our approach in dynamically changing networks, affirming its ability to maintain efficient platooning operations.
CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification
Authors: Dongkyun Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Medical image classification poses unique challenges due to the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient. This paper introduces our solution to the ICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays. Our approach introduces CheXFusion, a transformer-based fusion module incorporating multi-view images. The fusion module, guided by self-attention and cross-attention mechanisms, efficiently aggregates multi-view features while considering label co-occurrence. Furthermore, we explore data balancing and self-training methods to optimize the model's performance. Our solution achieves state-of-the-art results with 0.372 mAP on the MIMIC-CXR test set, securing 1st place in the competition. Our success in the task underscores the significance of considering multi-view settings, class imbalance, and label co-occurrence in medical image classification. Public code is available at https://github.com/dongkyuk/CXR-LT-public-solution
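A hedged sketch of the fusion idea: per-view backbone features are treated as tokens and aggregated with multi-head self-attention, using a padding mask for studies with fewer views. This simplification omits CheXFusion's cross-attention and label-co-occurrence components; dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Minimal transformer-style fusion of per-view feature tokens
    (illustrative; CheXFusion additionally uses cross-attention and
    label-co-occurrence-aware components)."""
    def __init__(self, dim=512, heads=8, n_classes=26):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, view_feats, pad_mask=None):
        # view_feats: (B, V, dim) backbone features, one token per view;
        # pad_mask: (B, V) True where a study has fewer than V views.
        x, _ = self.attn(view_feats, view_feats, view_feats,
                         key_padding_mask=pad_mask)
        x = self.norm(x + view_feats)          # residual + norm
        pooled = x.mean(dim=1)                 # aggregate across views
        return self.head(pooled)               # multi-label logits

logits = MultiViewFusion()(torch.randn(4, 3, 512))
```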
Optimal partitioning of directed acyclic graphs with dependent costs between clusters
Authors: Paul Pao-Yen Wu, Fabrizio Ruggeri, Kerrie Mengersen
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Abstract
Many statistical inference contexts, including Bayesian Networks (BNs), Markov processes and Hidden Markov Models (HMMs), could be supported by partitioning (i.e.~mapping) the underlying Directed Acyclic Graph (DAG) into clusters. However, optimal partitioning is challenging, especially in statistical inference, as the cost to be optimised depends both on the nodes within a cluster and on the mapping of clusters connected via parent and/or child nodes, which we call dependent clusters. We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters. Given an arbitrarily defined, positive cost function based on the DAG and cluster mappings, we show that DCMAP converges to find all optimal clusters, and returns near-optimal solutions along the way. Empirically, we find that the algorithm is time-efficient for a DBN model of a seagrass complex system using a computation cost function. For a 25 and 50-node DBN, the search space size was $9.91\times 10^9$ and $1.51\times10^{21}$ possible cluster mappings, respectively, but near-optimal solutions with 88\% and 72\% similarity to the optimal solution were found at iterations 170 and 865, respectively. The first optimal solution was found at iteration 934 $(\text{95\% CI } 926,971)$, and 2256 $(2150,2271)$ with a cost that was 4\% and 0.2\% of the naive heuristic cost, respectively.
Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures
Authors: Mohamed Assem Ibrahim, Shaizeen Aga
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions to accelerate the fast Fourier transform (FFT), an important primitive across several domains. Specifically, we observe that efficient implementations of FFT on modern GPUs are memory bandwidth bound. As such, the memory bandwidth boost offered by commercial PIM solutions makes a case for PIM to accelerate FFT. To this end, we first deduce a mapping of FFT computation to a strawman PIM architecture representative of recent commercial designs. We observe that even with careful data mapping, PIM is not effective in accelerating FFT. To address this, we make a case for collaborative acceleration of FFT with PIM and GPU. Further, we propose software and hardware innovations which lower the number of PIM operations necessary for a given FFT. Overall, our optimized PIM FFT mapping, termed Pimacolaba, delivers performance and data movement savings of up to 1.38$\times$ and 2.76$\times$, respectively, over a range of FFT sizes.
SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool
Abstract
Large Language Model (LLM) based Generative AI systems have seen significant progress in recent years. Integrating a knowledge retrieval architecture allows for seamless integration of private data into publicly available Generative AI systems using pre-trained LLMs without requiring additional model fine-tuning. Moreover, the Retrieval-Centric Generation (RCG) approach, a promising future research direction that explicitly separates the roles of LLMs and retrievers in context interpretation and knowledge memorization, potentially leads to more efficient implementations. SimplyRetrieve is an open-source tool with the goal of providing a localized, lightweight, and user-friendly interface to these sophisticated advancements for the machine learning community. SimplyRetrieve features a GUI- and API-based RCG platform, assisted by a Private Knowledge Base Constructor and a Retrieval Tuning Module. By leveraging these capabilities, users can explore the potential of RCG for improving generative AI performance while maintaining privacy standards. The tool is available at https://github.com/RCGAI/SimplyRetrieve with an MIT license.
Reasonable mechanical model on shallow tunnel excavation to eliminate displacement singularity caused by unbalanced resultant
Abstract
When the initial stress field in geomaterial is considered, a nonzero resultant of shallow tunnel excavation exists, which produces logarithmic terms in the complex potentials and further leads to a displacement singularity at infinity that violates geo-engineering facts in the real world. The mechanical and mathematical reasons for such a displacement singularity in existing mechanical models are elaborated, and a new mechanical model is subsequently proposed to eliminate this singularity by constraining the far-field ground surface displacement; the original unbalanced resultant problem is thereby converted into an equilibrium one with mixed boundary conditions. To solve for stress and displacement in the new model, analytic continuation is applied to transform the mixed boundary conditions into a homogeneous Riemann-Hilbert problem with extra constraints, which is then solved using an approximate and iterative method with good numerical stability. Lanczos filtering is applied to the stress and displacement solution to reduce the Gibbs phenomenon caused by the abrupt change of the boundary conditions along the ground surface. Several numerical cases are conducted to verify the proposed mechanical model, and the results confirm that it successfully eliminates the displacement singularity caused by the unbalanced resultant, with good convergence and accuracy in obtaining stress and displacement for shallow tunnel excavation. A parametric investigation is subsequently conducted to study the influence of tunnel depth, lateral coefficient, and free surface range on the stress and displacement distribution in geomaterial.
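Lanczos filtering itself is simple to state: the coefficients of a truncated series are damped by sigma factors $\sigma_k = \mathrm{sinc}(k/N)$. The sketch below is a generic 1-D illustration on a square wave, not the paper's elasticity solution, showing the reduced Gibbs overshoot.

```python
import numpy as np

def lanczos_filter(coeffs):
    """Apply Lanczos sigma factors sigma_k = sinc(k/N) to a truncated
    series of N coefficients, damping Gibbs oscillations caused by
    abrupt boundary changes (illustrative 1-D version)."""
    N = len(coeffs)
    k = np.arange(N)
    sigma = np.sinc(k / N)          # numpy sinc is sin(pi x)/(pi x)
    return coeffs * sigma

# Square-wave Fourier sine coefficients 4/(pi k) for odd k: the filtered
# partial sum overshoots far less near the jump.
N = 64
k = np.arange(1, N + 1)
a = np.where(k % 2 == 1, 4.0 / (np.pi * k), 0.0)
a_filtered = lanczos_filter(a)
x = np.linspace(0, np.pi, 1000)
partial = (a[:, None] * np.sin(np.outer(k, x))).sum(axis=0)
smoothed = (a_filtered[:, None] * np.sin(np.outer(k, x))).sum(axis=0)
print(f"max overshoot: raw {partial.max():.3f}, filtered {smoothed.max():.3f}")
```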
Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks
Abstract
The Space-Air-Ground Integrated Network (SAGIN), integrating heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. However, resource management of the SAGIN is a challenge requiring urgent study, since inappropriate resource management will cause poor data transmission and hence degrade the services in smart cities. In this paper, we develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method to address the resource management issue. The experimental results highlight the efficacy of the proposed CMT-MARL, as evidenced by key performance indicators such as the overall transmission rate and transmission success rate. These results underscore the potential value and feasibility of future implementation of the SAGIN.
Explicit Topology Optimization of Conforming Voronoi Foams
Authors: Ming Li, Jingqiao Hu, Wei Chen, Weipeng Kong, Jin Huang
Abstract
Topology optimization is able to maximally leverage the high DOFs and mechanical potential of porous foams, but faces three fundamental challenges: conforming to free-form outer shapes, maintaining geometric connectivity between adjacent cells, and achieving high simulation accuracy. To resolve these issues, borrowing the concept of Voronoi tessellation, we propose using the site (or seed) positions and beam radii as the DOFs for open-cell foam design. Such DOFs cover an extensive design space and have clear geometric meaning, which makes it easy to provide explicit controls (e.g. granularity). During the gradient-based optimization, the foam topology can change freely, and some seeds may even be pushed out of the shape, which greatly alleviates the challenges of prescribing a fixed underlying grid. The mechanical property of our foam is computed from its highly heterogeneous density-field counterpart discretized on a background mesh, with much improved accuracy via a new material-aware numerical coarsening method. We also explore the differentiability of open-cell Voronoi foams w.r.t. their seed locations, and propose a local finite difference method to estimate the derivatives efficiently. We not only show the improved performance of our Voronoi foam in comparison with classical topology optimization approaches, but also demonstrate its advantages in various settings, especially when the target volume fraction is extremely low.
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
Authors: Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image retrieval aims to find images in a database that are visually similar to a query image. Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications. To better trade off retrieval efficiency and accuracy, some approaches fuse global and local features into a joint representation to perform single-stage image retrieval. However, these remain challenging due to the variety of situations to tackle, e.g., background, occlusion and viewpoint. In this work, we design a Coarse-to-Fine framework to learn Compact Discriminative representations (CFCD) for end-to-end single-stage image retrieval, requiring only image-level labels. Specifically, we first design a novel adaptive softmax-based loss that dynamically tunes its scale and margin within each mini-batch and increases them progressively to strengthen supervision during training and intra-class compactness. Furthermore, we propose a mechanism that attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation via a hard negative sampling strategy to optimize inter-class distinctiveness at a global scale. Extensive experimental results demonstrate the effectiveness of our method, which achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris. Code is available at https://github.com/bassyess/CFCD.
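The adaptive loss can be understood as a cosine-margin softmax whose scale and margin grow during training. The sketch below uses a plain linear ramp as the growth schedule, which is our assumption rather than the paper's in-batch adaptive rule; all names and values are illustrative.

```python
import torch
import torch.nn.functional as F

def margin_softmax_loss(feats, weights, labels, scale, margin):
    """Cosine-margin classification loss (ArcFace-style). CFCD adapts
    `scale` and `margin` per mini-batch and grows them over training;
    here they are plain arguments for illustration."""
    cos = F.normalize(feats) @ F.normalize(weights).t()   # (B, C) cosines
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weights.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + margin), cos)
    return F.cross_entropy(scale * logits, labels)

def schedule(step, total, lo=(16.0, 0.1), hi=(64.0, 0.4)):
    """Progressively strengthen supervision: linear ramp (an assumption)."""
    t = step / total
    return lo[0] + t * (hi[0] - lo[0]), lo[1] + t * (hi[1] - lo[1])

feats, W = torch.randn(8, 128), torch.randn(100, 128)
labels = torch.randint(0, 100, (8,))
s, m = schedule(step=500, total=1000)
loss = margin_softmax_loss(feats, W, labels, s, m)
```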
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Authors: Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch$\unicode{x2013}$even for a large downstream dataset.
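The schedule under study, linear warmup to a maximum learning rate followed by cosine decay, is reproduced below; the specific values in the usage example are illustrative, not the paper's tuned settings.

```python
import math

def lr_at(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay to min_lr: the schedule
    applied when continuing pre-training on the downstream dataset."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

# Rewarming a pre-trained checkpoint: the LR is re-increased from ~0
# before decaying, rather than resuming at the final small LR.
for step in (0, 500, 1000, 5000, 10000):
    print(step, f"{lr_at(step, 3e-4, warmup_steps=1000, total_steps=10000):.2e}")
```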
Top K Relevant Passage Retrieval for Biomedical Question Answering
Authors: Shashank Gupta
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
Question answering is a task that answers factoid questions using a large collection of documents. It aims to provide precise answers in response to users' natural-language questions. Question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. No single web article can provide all the possible answers to a user's question. The existing Dense Passage Retrieval (DPR) model has been trained on the Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions. Question answering (QA) has made big strides with several open-domain and machine comprehension systems built using large-scale annotated datasets. However, in the clinical domain, this problem remains relatively unexplored. According to multiple surveys, biomedical questions cannot be answered correctly from Wikipedia articles. In this work, we adapt the existing DPR framework to the biomedical domain and retrieve answers from PubMed articles, a reliable source for answering medical questions. When evaluated on a BioASQ QA dataset, our fine-tuned dense retriever achieves an F1 score of 0.81.
An Empirical Analysis of Range for 3D Object Detection
Authors: Neehar Peri, Mengtian Li, Benjamin Wilson, Yu-Xiong Wang, James Hays, Deva Ramanan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
LiDAR-based 3D detection plays a vital role in autonomous navigation. Although autonomous vehicles (AVs) must detect both near-field objects (for collision avoidance) and far-field objects (for longer-term planning), contemporary benchmarks focus only on near-field 3D detection, even though far-field detection is essential for safe navigation. In this paper, we present an empirical analysis of far-field 3D detection using the long-range detection dataset Argoverse 2.0 to better understand the problem, and share the following insight: near-field LiDAR measurements are dense and optimally encoded by small voxels, while far-field measurements are sparse and better encoded with large voxels. We exploit this observation to build a collection of range experts tuned for near-vs-far field detection, and propose simple techniques to efficiently ensemble models for long-range detection, improving efficiency by 33% and boosting accuracy by 3.2% CDS.
Learning Specialized Activation Functions for Physics-informed Neural Networks
Authors: Honghui Wang, Lu Lu, Shiji Song, Gao Huang
Abstract
Physics-informed neural networks (PINNs) are known to suffer from optimization difficulty. In this work, we reveal the connection between the optimization difficulty of PINNs and activation functions. Specifically, we show that PINNs exhibit high sensitivity to activation functions when solving PDEs with distinct properties. Existing works usually choose activation functions by inefficient trial-and-error. To avoid inefficient manual selection and to alleviate the optimization difficulty of PINNs, we introduce adaptive activation functions to search for the optimal function when solving different problems. We compare different adaptive activation functions and discuss their limitations in the context of PINNs. Furthermore, we propose to tailor the idea of learning combinations of candidate activation functions to PINN optimization, which has a higher requirement for the smoothness and diversity of the learned functions. This is achieved by removing activation functions that cannot provide higher-order derivatives from the candidate set and incorporating elementary functions with different properties according to our prior knowledge about the PDE at hand. We further enhance the search space with adaptive slopes. The proposed adaptive activation function can be used to solve different PDE systems in an interpretable way. Its effectiveness is demonstrated on a series of benchmarks. Code is available at https://github.com/LeapLabTHU/AdaAFforPINNs.
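A minimal sketch of learning a combination of candidate activations with an adaptive slope follows. The candidate set here (tanh, sin, Gaussian, all infinitely differentiable, as PINNs require higher-order derivatives) and the softmax mixing are simplifying assumptions, not the paper's exact search space.

```python
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Learnable combination of smooth candidate activations with an
    adaptive slope (a simplified sketch; the paper additionally tailors
    the candidate set to prior knowledge about the PDE at hand)."""
    def __init__(self):
        super().__init__()
        self.candidates = [torch.tanh, torch.sin,
                           lambda x: torch.exp(-x * x)]   # all C-infinity
        self.coef = nn.Parameter(torch.ones(len(self.candidates)))
        self.slope = nn.Parameter(torch.tensor(1.0))      # adaptive slope

    def forward(self, x):
        w = torch.softmax(self.coef, dim=0)               # convex combination
        z = self.slope * x
        return sum(wi * f(z) for wi, f in zip(w, self.candidates))

# Drop-in replacement for a fixed activation inside a PINN MLP.
pinn_layer = nn.Sequential(nn.Linear(2, 64), AdaptiveActivation())
u = pinn_layer(torch.randn(16, 2, requires_grad=True))
```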
Boundary-preserving Lamperti-splitting scheme for some Stochastic Differential Equations
Authors: Johan Ulander (Chalmers University of Technology)
Abstract
We propose and analyse an explicit boundary-preserving scheme for the strong approximations of some SDEs with non-globally Lipschitz drift and diffusion coefficients whose state-space is bounded. The scheme consists of a Lamperti transform followed by a Lie--Trotter splitting. We prove $L^{p}(\Omega)$-convergence of order $1$, for every $p \in \mathbb{N}$, of the scheme and exploit the Lamperti transform to confine the numerical approximations to the state-space of the considered SDE. We provide numerical experiments that confirm the theoretical results and compare the proposed Lamperti-splitting scheme to other numerical schemes for SDEs.
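For background, the standard Lamperti transform underlying the scheme maps an SDE with state-dependent diffusion to one with unit diffusion; the Lie--Trotter splitting is then applied in the transformed variables. A hedged recollection of the textbook formulas:

```latex
% Standard Lamperti transform (background for the scheme): for
%   dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t,  with \sigma > 0 on the state-space,
% the map and the transformed SDE with unit diffusion are
\[
  Y_t = \psi(X_t), \qquad \psi(x) = \int^{x} \frac{\mathrm{d}u}{\sigma(u)},
\]
\[
  \mathrm{d}Y_t
  = \Bigl( \frac{\mu(\psi^{-1}(Y_t))}{\sigma(\psi^{-1}(Y_t))}
           - \tfrac{1}{2}\,\sigma'(\psi^{-1}(Y_t)) \Bigr)\mathrm{d}t
  + \mathrm{d}W_t .
\]
% Mapping numerical iterates back through \psi^{-1} is what confines the
% approximations to the state-space of the original SDE.
```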
Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients
Abstract
Federated optimization, an emerging paradigm which finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization, which hence gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from the limitations of query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates which are able to use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over the existing approaches, which is supported by our real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.
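The trajectory-reuse idea behind the gradient surrogates can be illustrated minimally: fit a local linear model to past (query, value) pairs and use its slope as the gradient estimate. FZooS builds more principled surrogates; the nearest-neighbor rule, names, and parameters below are our assumptions.

```python
import numpy as np

def surrogate_gradient(X_hist, y_hist, x, k=20):
    """Estimate grad f(x) from the optimization trajectory: fit a local
    linear surrogate to the k nearest past queries by least squares.
    A minimal stand-in for trajectory-informed gradient surrogates."""
    d = np.linalg.norm(X_hist - x, axis=1)
    idx = np.argsort(d)[:k]
    A = np.hstack([X_hist[idx] - x, np.ones((len(idx), 1))])
    coef, *_ = np.linalg.lstsq(A, y_hist[idx], rcond=None)
    return coef[:-1]                       # slope of the local fit

# Sanity check on f(x) = ||x||^2, whose gradient at x0 is 2*x0.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 5))
y_hist = (X_hist ** 2).sum(axis=1)
x0 = 0.5 * np.ones(5)
print(surrogate_gradient(X_hist, y_hist, x0))   # roughly 2*x0 = [1, ..., 1]
```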
Application for White Spot Syndrome Virus (WSSV) Monitoring using Edge Machine Learning
Authors: Lorenzo S. Querol, Macario O. Cordel II, Dan Jeric A. Rustia, Mary Nia M. Santos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The aquaculture industry, strongly reliant on shrimp exports, faces challenges due to viral infections like the White Spot Syndrome Virus (WSSV) that severely impact output yields. In this context, computer vision can play a significant role in identifying features not immediately evident to skilled or untrained eyes, potentially reducing the time required to report WSSV infections. In this study, the challenge of limited data for WSSV recognition was addressed. A mobile application dedicated to data collection and monitoring was developed to facilitate the creation of an image dataset to train a WSSV recognition model and improve country-wide disease surveillance. The study also includes a thorough analysis of WSSV recognition to address the challenges of imbalanced learning and on-device inference. The models explored, MobileNetV3-Small and EfficientNetV2-B0, achieved F1-scores of 0.72 and 0.99, respectively. The saliency heatmaps of both models were also examined to uncover the "black-box" nature of these models and to gain insight into which image features matter most when making a prediction. These results highlight the effectiveness and limitations of using models designed for resource-constrained devices and balancing their performance in accurately recognizing WSSV, providing valuable information and direction for the use of computer vision in this domain.
Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention
Authors: Huilin Zhang, Sumei Li, Yongli Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Abstract
Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing binocular properties and attention-based methods for SIQA have achieved promising performance. However, these bottom-up approaches are inadequate in exploiting the inherent characteristics of the human visual system (HVS). This paper presents a novel network for SIQA via stereo attention, employing a top-down perspective to guide the quality assessment process. Our proposed method realizes the guidance from high-level binocular signals down to low-level monocular signals, while the binocular and monocular information can be calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. This block utilizes the fusion-generated attention map as a high-level binocular modulator, influencing the representation of two low-level monocular features. Additionally, we introduce an Energy Coefficient (EC) to account for recent findings indicating that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. The adaptive EC can tune the magnitude of binocular response flexibly, thus enhancing the formation of robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in simulating the property of visual perception and advancing the state-of-the-art in the SIQA field. The code of this work is available at https://github.com/Fanning-Zhang/SATNet.
S&Reg: End-to-End Learning-Based Model for Multi-Goal Path Planning Problem
Abstract
In this paper, we propose a novel end-to-end approach for solving the multi-goal path planning problem in obstacle environments. Our proposed model, called S&Reg, integrates multi-task learning networks with a TSP solver and a path planner to quickly compute a closed and feasible path visiting all goals. Specifically, the model first predicts promising regions that potentially contain the optimal paths connecting two goals as a segmentation task. Simultaneously, estimations of pairwise distances between goals are conducted as a regression task by the neural networks, and the results construct a symmetric weight matrix for the TSP solver. Leveraging the TSP result, the path planner efficiently explores feasible paths guided by the promising regions. We extensively evaluate the S&Reg model through simulations and compare it with other sampling-based algorithms. The results demonstrate that our proposed model achieves superior performance with respect to computation time and solution cost, making it an effective solution for multi-goal path planning in obstacle environments. The proposed approach has the potential to be extended to other sampling-based algorithms for multi-goal path planning.
EFaR 2023: Efficient Face Recognition Competition
Authors: Jan Niklas Kolf, Fadi Boutros, Jurek Elliesen, Markus Theuerkauf, Naser Damer, Mohamad Alansari, Oussama Abdul Hay, Sara Alansari, Sajid Javed, Naoufel Werghi, Klemen Grm, Vitomir Štruc, Fernando Alonso-Fernandez, Kevin Hernandez Diaz, Josef Bigun, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, Sébastien Marcel, Iurii Medvedev, Bo Jin, Diogo Nunes, Ahmad Hassanpour, Pankaj Khatiwada, Aafan Ahmad Toor, Bian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and the model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost, and some apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.
Core interface optimization for multi-core neuromorphic processors
Authors: Zhe Su, Hyunjung Hwang, Tristan Torchet, Giacomo Indiveri
Subjects: Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)
Abstract
Hardware implementations of Spiking Neural Networks (SNNs) represent a promising approach to edge computing for applications that require low power and low latency, and that cannot resort to external cloud-based computing services. However, most solutions proposed so far either support only relatively small networks, or take up significant hardware resources to implement large networks. To realize large-scale and scalable SNNs it is necessary to develop an efficient asynchronous communication and routing fabric that enables the design of multi-core architectures. In particular, the core interface that manages inter-core spike communication is a crucial component, as it represents the Power-Performance-Area (PPA) bottleneck, especially for the arbitration architecture and the routing memory. In this paper we present an arbitration mechanism with the corresponding asynchronous encoding pipeline circuits, based on hierarchical arbiter trees. The proposed scheme reduces latency by more than 70% in sparse-event mode, compared to state-of-the-art arbitration architectures, with lower area cost. The routing memory makes use of asynchronous Content Addressable Memory (CAM) with Current Sensing Completion Detection (CSCD), which saves approximately 46% energy and achieves a 40% increase in throughput compared to conventional asynchronous CAM using configurable delay lines, at the cost of only a slight increase in area. In addition, as they radically reduce the core interface resources in multi-core neuromorphic processors, the arbitration and CAM architectures we propose can also be applied to a wide range of general asynchronous circuits and systems.
Communication-Efficient Cooperative Multi-Agent PPO via Regulated Segment Mixture in Internet of Vehicles
Abstract
Multi-Agent Reinforcement Learning (MARL) has become a classic paradigm for solving diverse, intelligent control tasks like autonomous driving in the Internet of Vehicles (IoV). However, the widely assumed existence of a central node to implement centralized federated-learning-assisted MARL might be impractical in highly dynamic scenarios, and the excessive communication overhead may overwhelm the IoV system. Therefore, in this paper, we design a communication-efficient cooperative MARL algorithm, named RSM-MAPPO, to reduce the communication overhead in a fully distributed architecture. In particular, RSM-MAPPO enhances multi-agent Proximal Policy Optimization (PPO) by incorporating the idea of segment mixture and augmenting multiple model replicas from received neighboring policy segments. Afterwards, RSM-MAPPO adopts a theory-guided metric to regulate the selection of contributive replicas so as to guarantee policy improvement. Finally, extensive simulations in a mixed-autonomy traffic control scenario verify the effectiveness of the RSM-MAPPO algorithm.
Robust retrieval of material chemical states in X-ray microspectroscopy
Authors: Ting Wang, Xiaotong Wu, Jizhou Li, Chao Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
X-ray microspectroscopic techniques are essential for studying morphological and chemical changes in materials, providing high-resolution structural and spectroscopic information. However, practical data analysis for reliably retrieving the chemical states remains a major obstacle to accelerating the fundamental understanding of materials in many research fields. In this work, we propose a novel data formulation model for X-ray microspectroscopy and develop a dedicated unmixing framework to solve this problem, which is robust to noise and spectral variability. Moreover, this framework is not limited to the analysis of two-state material chemistry, making it an effective alternative to conventional and widely-used methods. In addition, an alternating direction method of multipliers with provable convergence is applied to obtain the solution efficiently. Our framework can accurately identify and characterize chemical states in complex and heterogeneous samples, even under challenging conditions such as low signal-to-noise ratios and overlapping spectral features. Extensive experimental results on simulated and real datasets demonstrate its effectiveness and reliability.
A Differential Datalog Interpreter
Authors: Bruno Rucy Carneiro Alves de Lima, Merlin Kramer, Kalmer Apinis
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Abstract
The core reasoning task for datalog engines is materialization, the evaluation of a datalog program over a database alongside its physical incorporation into the database itself. The de facto method of computing it is through the recursive application of inference rules. Because materialization is a costly operation, datalog engines must provide incremental materialization, that is, adjust the computation to new data instead of restarting from scratch. One of the major caveats is that deleting data is notoriously more involved than adding it, since one has to take into account all data that has been entailed from what is being deleted. Differential Dataflow is a computational model that provides efficient incremental maintenance of iterative dataflows, notably with equal performance between additions and deletions, as well as work distribution. In this paper we investigate the performance of materialization with three reference datalog implementations, one built on top of a lightweight relational engine, and the other two being differential-dataflow and non-differential versions of the same rewrite algorithm, with the same optimizations.
AquaSAM: Underwater Image Foreground Segmentation
Authors: Muduo Xu, Jianhao Su, Yutao Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The Segment Anything Model (SAM) has revolutionized natural image segmentation; nevertheless, its performance on underwater images is still restricted. This work presents AquaSAM, the first attempt to extend the success of SAM to underwater images, with the purpose of creating a versatile method for the segmentation of various underwater targets. To achieve this, we begin by classifying and extracting various labels automatically from the SUIM dataset. Subsequently, we develop a straightforward fine-tuning method to adapt SAM to general foreground underwater image segmentation. Through extensive experiments involving eight segmentation tasks, such as human divers, we demonstrate that AquaSAM outperforms the default SAM model, especially on hard tasks like coral reefs. AquaSAM achieves an average improvement of 7.13% in Dice Similarity Coefficient (DSC) and an average improvement of 8.27% in mIoU on underwater segmentation tasks.
Flexible and rigorous numerical modelling of multiphysics processes in fractured porous media using PorePy
Authors: Ivar Stefansson, Jhabriel Varela, Eirik Keilegavlen, Inga Berre
Abstract
Multiphysics processes in fractured porous media constitute a research field of importance for several subsurface applications and have received considerable attention over the last decade. The dynamics are characterised by strong couplings between processes as well as interaction between the processes and the structure of the fractured medium itself. The rich range of behavior calls for explorative mathematical modelling, such as experimentation with constitutive laws and novel coupling concepts between physical processes. Moreover, efficient simulations of the strong couplings between multiphysics processes and geological structures require the development of tailored numerical methods. We present a modelling framework and its implementation in the open-source simulation toolbox PorePy, which is designed for rapid prototyping of multiphysics processes in fractured porous media. PorePy uses a mixed-dimensional representation of the fracture geometry and generally applies fully implicit couplings between processes. The code design follows the paradigms of modularity and differentiable programming, which together allow for extreme flexibility in experimentation with governing equations with minimal changes to the code base. The code integrity is supported by a multilevel testing framework ensuring the reliability of the code. We present our modelling framework within a context of thermo-poroelasticity in deformable fractured porous media, illustrating the close relation between the governing equations and the source code. We furthermore discuss the design of the testing framework and present simulations showcasing the extendibility of PorePy, as well as the type of results that can be produced by mixed-dimensional simulation tools.
CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages
Abstract
We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of the South Slavic languages, which is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to Stanza, and give a detailed description of the model training process for the latest 2.1 release of the pipeline. We also report performance scores produced by the pipeline for different languages and varieties. CLASSLA-Stanza exhibits consistently high performance across all the supported languages and outperforms or extends its parent pipeline Stanza on all supported tasks. We also present the pipeline's new functionality enabling efficient processing of web data and the reasons that led to its implementation.
Novel Area-Efficient and Flexible Architectures for Optimal Ate Pairing on FPGA
Abstract
While FPGA is a suitable platform for implementing cryptographic algorithms, there are several challenges associated with implementing Optimal Ate pairing on FPGA, such as security, limited computing resources, and high power consumption. To overcome these issues, this study introduces three approaches that can execute the optimal Ate pairing on Barreto-Naehrig curves using Jacobian coordinates with the goal of reaching 128-bit security on the Genesys board. The first approach is a pure software implementation utilizing the MicroBlaze processor. The second involves a combination of software and hardware, with key operations in $F_{p}$ and $F_{p^{2}}$ being transformed into IP cores for the MicroBlaze. The third approach builds on the second by incorporating parallelism to improve the pairing process. The utilization of multiple MicroBlaze processors within a single system offers both versatility and parallelism to speed up pairing calculations. A variety of methods and parameters are used to optimize the pairing computation, including Montgomery modular multiplication, the Karatsuba method, Jacobian coordinates, the complex squaring method, sparse multiplication, squaring in $G_{\phi_{6}}(F_{p^{12}})$, and the addition chain method. The proposed systems are designed to efficiently utilize limited resources in restricted environments, while still completing tasks in a timely manner.
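As a concrete illustration of one of the listed optimizations, the following is a hedged sketch of Karatsuba multiplication in the quadratic extension $F_{p^{2}} = F_p[u]/(u^2+1)$ commonly used with Barreto-Naehrig curves; the prime below is illustrative only, not the BN parameter used in the paper.

```python
# Karatsuba multiplication in F_{p^2} = F_p[u]/(u^2 + 1):
# three F_p multiplications instead of the schoolbook four.

p = 2**255 - 19  # illustrative prime; BN curves use a different p

def fp2_mul(a, b):
    """(a0 + a1*u) * (b0 + b1*u) with u^2 = -1, via Karatsuba."""
    a0, a1 = a
    b0, b1 = b
    v0 = (a0 * b0) % p                      # 1st F_p multiplication
    v1 = (a1 * b1) % p                      # 2nd F_p multiplication
    c0 = (v0 - v1) % p                      # real part: a0*b0 - a1*b1
    # 3rd multiplication replaces the two products a0*b1 and a1*b0:
    c1 = ((a0 + a1) * (b0 + b1) - v0 - v1) % p
    return (c0, c1)

x = (3, 5)   # 3 + 5u
y = (7, 2)   # 7 + 2u
print(fp2_mul(x, y))  # (3*7 - 5*2, 3*2 + 5*7) mod p = (11, 41)
```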
BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning
Abstract
This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with the DER (Data-Efficient Rainbow) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by enforcing information spread across the whole embedding space. This helps RL algorithms utilize uniformly spread state representations, which eventually results in remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance on RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.
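For context, here is a minimal NumPy sketch of the Barlow Twins objective that BarlowRL couples with DER; the RL integration and encoder details are not reproduced.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective: drive the cross-correlation matrix of two
    embedding views toward the identity (invariant yet decorrelated)."""
    n = z1.shape[0]
    # standardize each embedding dimension across the batch
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                              # cross-correlation matrix
    on_diag = ((np.diagonal(c) - 1) ** 2).sum()    # invariance term
    off_diag = (c ** 2).sum() - (np.diagonal(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 64))                     # embeddings of one view
print(barlow_twins_loss(z, z + 0.1 * rng.normal(size=z.shape)))
```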
Lossy and Lossless (L$^2$) Post-training Model Size Compression
Abstract
Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high compression ratios efficiently. This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. We first propose a unified parametric weight transformation, which ensures different lossy compression methods can be performed jointly in a post-training manner. Then, a dedicated differentiable counter is introduced to guide the optimization of lossy compression to arrive at a more suitable point for later lossless compression. Additionally, our method can easily control a desired global compression ratio and allocate adaptive ratios for different layers. Finally, our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time. Our code is available at https://github.com/ModelTC/L2_Compression .
Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos
Abstract
Person search has recently been a challenging task in the computer vision domain, which aims to search for specific pedestrians across real cameras. Nevertheless, most surveillance videos comprise only a handful of images of each pedestrian, which often feature identical backgrounds and clothing. Hence, it is difficult to learn more discriminative features for person search in real scenes. To tackle this challenge, we draw on Generative Adversarial Networks (GAN) to synthesize data from surveillance videos. GANs have thrived in computer vision problems because they produce high-quality images efficiently. We merely alter the popular Fast R-CNN model, which is capable of processing videos and yielding accurate detection outcomes. To appropriately relieve the pressure brought by the two-stage model, we design an Assisted-Identity Query Module (AIDQ) to provide positive images for the later stage. In addition, we propose a novel GAN-based Scene Synthesis model that can synthesize high-quality cross-id person images for person search tasks. To facilitate the feature learning of the GAN-based Scene Synthesis model, we adopt an online learning strategy that collaboratively learns from the synthesized and original images. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, show that our method achieves strong performance, and an extensive ablation study further confirms that our GAN-synthesized data can effectively increase the variability of the datasets and be more realistic.
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition
Authors: Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
Abstract
Event camera-based pattern recognition is a newly arising research topic. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited by the following two issues. Firstly, they adopt spatially sparse event streams for recognition only, which may fail to capture color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition; few of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously, and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., a memory support Transformer network for RGB frame encoding, a spiking neural network for raw event stream encoding, a multi-modal bottleneck fusion module for RGB-Event feature aggregation, and a prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset which contains 114 classes and 27102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validate the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and the source code of this work will be released at https://github.com/Event-AHU/SSTFormer.
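The abstract does not specify the SNN branch internals; as background, the following sketches the standard leaky integrate-and-fire neuron on which SNN encoders are typically built (parameters are illustrative).

```python
import numpy as np

def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over T time steps: the
    membrane potential leaks, integrates input current, and emits a spike
    (followed by a soft reset) whenever it crosses the threshold."""
    v, spikes = 0.0, []
    for x in inputs:
        v = beta * v + x                 # leak + integrate
        s = 1.0 if v >= threshold else 0.0
        v -= s * threshold               # soft reset after a spike
        spikes.append(s)
    return np.array(spikes)

current = np.full(20, 0.3)               # constant input current
print(lif_forward(current))              # sparse, event-like spike train
```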
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination
Authors: Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM)
Abstract
Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively. However, we observe that existing methods mainly employ the most similar samples as hard negatives, which may not be true negatives. In other words, samples with high similarity that are not paired with the anchor may retain positive semantic associations, and we call them false negatives. Repelling these false negatives in triplet loss would mislead the semantic representation learning and result in inferior retrieval performance. In this paper, we propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling, which alleviates the problem introduced by false negatives. Specifically, we first construct the distributions of positive and negative samples separately via their similarities with the anchor, based on the features extracted from the image and text encoders. Then we calculate the false negative probability of a given sample via Bayes' rule, based on its similarity with the anchor and the above distributions, and employ this probability as the sampling weight during the negative sampling process. Since there may not exist any false negatives within a small batch, we design a memory module with momentum to retain a large negative buffer and implement our negative sampling strategy over the buffer. In addition, to make the model focus on hard negatives, we reassign the sampling weights for simple negatives with a cut-down strategy. Extensive experiments are conducted on Flickr30K and MS-COCO, and the results demonstrate the superiority of our proposed false negative elimination strategy. The code is available at https://github.com/LuminosityX/FNE.
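A hedged sketch of the Bayes-rule step described above, assuming (for illustration only) Gaussian fits to the positive and negative similarity distributions; the paper constructs these distributions empirically from encoder features.

```python
import numpy as np
from scipy.stats import norm

def false_negative_prob(sim, pos_params, neg_params, prior_pos=0.1):
    """Posterior probability that a candidate negative with anchor
    similarity `sim` is actually a false negative, via Bayes' rule over
    the positive and negative similarity distributions."""
    mu_p, sd_p = pos_params
    mu_n, sd_n = neg_params
    lik_pos = norm.pdf(sim, mu_p, sd_p) * prior_pos
    lik_neg = norm.pdf(sim, mu_n, sd_n) * (1 - prior_pos)
    return lik_pos / (lik_pos + lik_neg)

# Illustrative distribution parameters, e.g. fitted from a batch:
p_fn = false_negative_prob(sim=0.62, pos_params=(0.7, 0.1),
                           neg_params=(0.3, 0.15))
print(f"false-negative probability: {p_fn:.3f}")  # used as sampling weight
```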
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
Authors: Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of the previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon acceptance of this work.
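As background for the conditional-diffusion formulation, here is a minimal NumPy sketch of a DDPM-style conditional training step; DiffCR's decoupled encoder and fusion block are its actual contribution and are not reproduced, and the denoiser below is a placeholder.

```python
import numpy as np

def diffusion_training_step(x0, cond, eps_model, alpha_bar):
    """One DDPM-style training step: noise the clear image x0 to a random
    timestep t and regress the noise, conditioning on the cloudy image."""
    t = np.random.randint(len(alpha_bar))
    eps = np.random.normal(size=x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps   # forward noising
    eps_hat = eps_model(x_t, cond, t)              # conditional denoiser
    return np.mean((eps_hat - eps) ** 2)           # simple-loss objective

# Linear beta schedule and a placeholder denoiser for illustration:
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
dummy_model = lambda x_t, cond, t: np.zeros_like(x_t)
x0 = np.random.rand(8, 8, 3)     # clear target patch
cond = np.random.rand(8, 8, 3)   # cloudy conditional input
print(diffusion_training_step(x0, cond, dummy_model, alpha_bar))
```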
Keyword: faster
Towards Integrated Traffic Control with Operating Decentralized Autonomous Organization
Authors: Shengyue Yao, Jingru Yu, Yi Yu, Jia Xu, Xingyuan Dai, Honghai Li, Fei-Yue Wang, Yilun Lin
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
With the growing complexity of the intelligent traffic system (ITS), an integrated control of ITS that is capable of considering plentiful heterogeneous intelligent agents is desired. However, existing control methods based on centralized or decentralized schemes have not demonstrated the ability to address optimality and scalability simultaneously. To address this issue, we propose an integrated control method based on the framework of Decentralized Autonomous Organization (DAO). The proposed method achieves a global consensus on energy consumption efficiency (ECE) while optimizing the local objectives of all involved intelligent agents, through a consensus and incentive mechanism. Furthermore, an operation algorithm is proposed to address the issue of structural rigidity in DAO. Specifically, the proposed operation approach identifies critical agents to execute the smart contract in DAO, which ultimately extends the capability of DAO-based control. In addition, a numerical experiment is designed to examine the performance of the proposed method. The experimental results indicate that, with the proposed method, the controlled agents reach consensus faster on the global objective, with improved local objectives, compared to existing decentralized control methods. In general, the proposed method shows great potential for developing an integrated control system in the ITS.
Nucleotide String Indexing using Range Matching
Authors: Alon Rashelbach, Ori Rottenstreich, Mark Silberstein
Subjects: Data Structures and Algorithms (cs.DS); Genomics (q-bio.GN)
Abstract
The two most common data structures for genome indexing, FM-indices and hash tables, exhibit a fundamental trade-off between memory footprint and performance. We present Ranger, a new indexing technique for nucleotide sequences that is both memory efficient and fast. We observe that nucleotide sequences can be represented as integer ranges and leverage a range-matching algorithm based on neural networks to perform the lookup. We prototype Ranger in software and integrate it into the popular Minimap2 tool. Ranger achieves almost identical end-to-end performance to the original Minimap2, while occupying 1.7$\times$ and 1.2$\times$ less memory for short- and long-reads, respectively. With a limited memory capacity, Ranger achieves up to 4.3$\times$ speedup for short reads compared to FM-Index, and up to 4.2$\times$ and 1.8$\times$ speedups for short- and long-reads, compared to hash tables. Ranger opens up new opportunities in the context of hardware acceleration by reducing the memory footprint of long-seed indexes used in state-of-the-art alignment accelerators by up to 23$\times$, which results in 3$\times$ faster alignment with negligible accuracy degradation. Moreover, its worst-case memory bandwidth and latency can be bounded in advance without the need to inflate DRAM capacity.
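A hedged sketch of the central observation: with a 2-bit code per base, every k-mer maps to an integer and every shared prefix maps to a contiguous integer range, which is what makes range matching applicable to nucleotide indexing.

```python
# 2-bit encoding of nucleotide k-mers: a k-mer becomes an integer, and a
# shared prefix becomes a contiguous integer range.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_to_int(kmer):
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value

def prefix_to_range(prefix, k):
    """All k-mers starting with `prefix` fall in [lo, hi)."""
    pad = k - len(prefix)
    lo = kmer_to_int(prefix) << (2 * pad)
    hi = (kmer_to_int(prefix) + 1) << (2 * pad)
    return lo, hi

print(kmer_to_int("ACGT"))          # 0b00011011 = 27
print(prefix_to_range("AC", k=4))   # (16, 32): covers ACAA .. ACTT
```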
Intelligent Assistant Language Understanding On Device
Authors: Cecilia Aas, Hisham Abdelsalam, Irina Belousova, Shruti Bhargava, Jianpeng Cheng, Robert Daland, Joris Driesen, Federico Flego, Tristan Guigue, Anders Johannsen, Partha Lal, Jiarui Lu, Joel Ruben Antony Moniz, Nathan Perkins, Dhivya Piraviperumal, Stephen Pulman, Diarmuid Ó Séaghdha, David Q. Sun, John Torr, Marco Del Vecchio, Jay Wacker, Jason D. Williams, Hong Yu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
It has recently become feasible to run personal digital assistants on phones and other personal devices. In this paper we describe a design for a natural language understanding system that runs on device. In comparison to a server-based assistant, this system is more private, more reliable, faster, more expressive, and more accurate. We describe what led to key choices about architecture and technologies. For example, some approaches in the dialog systems literature are difficult to maintain over time in a deployment setting. We hope that sharing learnings from our practical experiences may help inform future work in the research community.
Deterministic Neural Illumination Mapping for Efficient Auto-White Balance Correction
Authors: Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Auto-white balance (AWB) correction is a critical operation in image signal processors for accurate and consistent color correction across various illumination scenarios. This paper presents a novel and efficient AWB correction method that achieves at least 35 times faster processing, with equivalent or superior performance on high-resolution images, compared to the current state-of-the-art methods. Inspired by deterministic color style transfer, our approach introduces deterministic illumination color mapping, leveraging learnable projection matrices for both the canonical illumination form and the AWB-corrected output. It involves feeding high-resolution images and corresponding latent representations into a mapping module to derive a canonical form, followed by another mapping module that maps the pixel values to those of the corrected version. This strategy is designed to be resolution-agnostic and also enables seamless integration of any pre-trained AWB network as the backbone. Experimental results confirm the effectiveness of our approach, revealing significant performance improvements and reduced time complexity compared to state-of-the-art methods. Our method provides an efficient deep learning-based AWB correction solution, promising real-time, high-quality color correction for digital imaging applications. Source code is available at https://github.com/birdortyedi/DeNIM/
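A minimal sketch of the deterministic mapping idea: pixel colors pass through a 3x3 projection matrix, so the operation is linear, per-pixel, and resolution-agnostic. In the paper the matrices are learned; the matrix below is purely illustrative.

```python
import numpy as np

def apply_illumination_mapping(image, M):
    """Map pixel colors through a 3x3 projection matrix; since the mapping
    is per-pixel and linear, it works at any image resolution."""
    h, w, _ = image.shape
    flat = image.reshape(-1, 3)
    return (flat @ M.T).reshape(h, w, 3)

# Illustrative matrix (in the paper, the matrices are predicted):
M = np.diag([0.9, 1.0, 1.2])         # e.g. damp red, boost blue
img = np.random.rand(1080, 1920, 3)  # any resolution works unchanged
corrected = apply_illumination_mapping(img, M)
print(corrected.shape)
```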
Caching-based Multicast Message Authentication in Time-critical Industrial Control Systems
Abstract
Attacks against industrial control systems (ICSs) often exploit the insufficiency of authentication mechanisms. Verifying whether the received messages are intact and issued by legitimate sources can prevent malicious data/command injection by illegitimate or compromised devices. However, the key challenge is to introduce message authentication for various ICS communication models, including multicast or broadcast, with a messaging rate that can be as high as thousands of messages per second, within very stringent latency constraints. For example, certain commands for protection in smart grids must be delivered within 2 milliseconds, ruling out public-key cryptography. This paper proposes two lightweight message authentication schemes, named CMA and its multicast variant CMMA, that perform precomputation and caching to authenticate future messages. With minimal precomputation and communication overhead, C(M)MA eliminates all cryptographic operations for the source after the message is given, and all expensive cryptographic operations for the destinations after the message is received. C(M)MA considers the urgency profile (or likelihood) of a set of future messages for even faster verification of the most time-critical (or likely) messages. We demonstrate the feasibility of C(M)MA in an ICS setting based on a substation automation system in smart grids.
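A heavily simplified illustration of the precompute-and-cache idea (not the actual C(M)MA construction): tags for anticipated messages are computed off the critical path, so sending requires only a lookup. The key and messages below are placeholders.

```python
import hmac, hashlib

KEY = b"shared-source-destination-key"   # placeholder shared secret

def precompute_tags(anticipated_messages):
    """Run off the critical path; messages arrive ordered by urgency so
    the most time-critical commands are cached first."""
    return {m: hmac.new(KEY, m, hashlib.sha256).digest()
            for m in anticipated_messages}

cache = precompute_tags([b"TRIP breaker 7", b"CLOSE breaker 7"])

def send(message):
    return message, cache[message]       # O(1) tag lookup at send time

msg, tag = send(b"TRIP breaker 7")
# Receiver-side check (receivers can likewise front-load their work):
assert hmac.compare_digest(tag, hmac.new(KEY, msg, hashlib.sha256).digest())
```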
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Authors: Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
XGBD: Explanation-Guided Graph Backdoor Detection
Authors: Zihan Guan, Mengnan Du, Ninghao Liu
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Backdoor attacks pose a significant security risk to graph learning models. Backdoors can be embedded into the target model by inserting backdoor triggers into the training dataset, causing the model to make incorrect predictions when the trigger is present. To counter backdoor attacks, backdoor detection has been proposed. An emerging detection strategy in the vision and NLP domains is based on an intriguing phenomenon: when training models on a mixture of backdoor and clean samples, the loss on backdoor samples drops significantly faster than on clean samples, allowing backdoor samples to be easily detected by selecting samples with the lowest loss values. However, ignoring the topological feature information of graph data limits detection effectiveness when this strategy is applied directly to the graph domain. To this end, we propose an explanation-guided backdoor detection method that takes advantage of the topological information. Specifically, we train a helper model on the graph dataset, feed graph samples into the model, and then adopt explanation methods to attribute the model prediction to an important subgraph. We observe that backdoor samples have distinct attribution distributions compared to clean samples, so the explanatory subgraph can serve as a more discriminative feature for detecting backdoor samples. Comprehensive experiments on multiple popular datasets and attack methods demonstrate the effectiveness and explainability of our method. Our code is available at https://github.com/GuanZihan/GNN_backdoor_detection.
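For reference, a minimal sketch of the loss-gap detection baseline from the vision/NLP domains that the paper builds on; XGBD replaces this raw loss signal with explanation-derived subgraph features.

```python
import numpy as np

def flag_suspicious(losses, fraction=0.05):
    """Loss-gap detection: after a few epochs of joint training, backdoor
    samples tend to have markedly lower loss, so flag the lowest-loss
    fraction of the training set as suspicious."""
    k = max(1, int(len(losses) * fraction))
    return np.argsort(losses)[:k]

# Per-sample training losses; the near-zero ones behave like triggers:
losses = np.array([2.1, 0.03, 1.8, 0.02, 2.4, 1.9, 0.04, 2.2])
print(flag_suspicious(losses, fraction=0.4))  # indices of likely triggers
```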
Keyword: mobile
Mobile Supply: The Last Piece of Jigsaw of Recommender System
Authors: Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Abstract
Recommender systems are a fundamental functionality of online platforms. With the development of computing power on mobile phones, some researchers have deployed recommendation algorithms on users' devices to address the problems of data transmission delay and the pagination mechanism. However, existing edge-side mobile ranking methods cannot completely solve the pagination problem: they can only re-sort the items on the current page, so they stop being useful after being called once or twice. Moreover, after viewing the items of interest on the current page, the user refreshes to get a new page of items, which makes the mobile ranking model do a lot of useless work and harms the user's immersive experience. To solve the pagination problem, we propose a completely new module in the recommender pipeline, named Mobile Supply. The pipeline of the recommender system is extended to "retrieval->pre-ranking->ranking->re-ranking->Mobile Supply->mobile ranking". Specifically, we introduce the concept of list value and use a point-wise method to approximate list-wise estimation. We also design a new mobile ranking model tailored to the new pipeline, named device-aware mobile ranking, which accounts for differences between mobile devices. Extensive offline and online experiments show the superiority of our proposed method and prove that Mobile Supply can further improve the performance of edge-side recommender systems and the user experience. Mobile Supply has been deployed on the homepage of a large-scale online food platform and has yielded considerable profits in our business.
Eye-Shield: Real-Time Protection of Mobile Device Screen Information from Shoulder Surfing
Authors: Brian Tang, Kang G. Shin
Subjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC)
Abstract
People use mobile devices ubiquitously for computing, communication, storage, web browsing, and more. As a result, the information accessed and stored within mobile devices, such as financial and health information, text messages, and emails, can often be sensitive. Despite this, people frequently use their mobile devices in public areas, becoming susceptible to a simple yet effective attack, shoulder surfing. Shoulder surfing occurs when a person near a mobile user peeks at the user's mobile device, potentially acquiring passcodes, PINs, browsing behavior, or other personal information. We propose Eye-Shield, a solution to prevent shoulder surfers from accessing or stealing sensitive on-screen information. Eye-Shield is designed to protect all types of on-screen information in real time, without any serious impediment to users' interactions with their mobile devices. Eye-Shield generates images that appear readable at close distances, but appear blurry or pixelated at farther distances and wider angles. It is capable of protecting on-screen information from shoulder surfers, operating in real time, and being minimally intrusive to the intended users. Eye-Shield protects images and text from shoulder surfers by reducing recognition rates to 24.24% and 15.91%. Our implementations of Eye-Shield, with frame rates of 24 FPS for Android and 43 FPS for iOS, effectively work on screen resolutions as high as 1440x3088. Eye-Shield also incurs acceptable memory usage, CPU utilization, and energy overhead. Finally, our MTurk and in-person user studies indicate that Eye-Shield protects on-screen information without a large usability cost for privacy-conscious users.
Application for White Spot Syndrome Virus (WSSV) Monitoring using Edge Machine Learning
Authors: Lorenzo S. Querol, Macario O. Cordel II, Dan Jeric A. Rustia, Mary Nia M. Santos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The aquaculture industry, strongly reliant on shrimp exports, faces challenges due to viral infections like the White Spot Syndrome Virus (WSSV) that severely impact output yields. In this context, computer vision can play a significant role in identifying features not immediately evident to skilled or untrained eyes, potentially reducing the time required to report WSSV infections. In this study, the challenge of limited data for WSSV recognition was addressed. A mobile application dedicated to data collection and monitoring was developed to facilitate the creation of an image dataset to train a WSSV recognition model and improve country-wide disease surveillance. The study also includes a thorough analysis of WSSV recognition to address the challenges of imbalanced learning and on-device inference. The models explored, MobileNetV3-Small and EfficientNetV2-B0, achieved F1-scores of 0.72 and 0.99, respectively. The saliency heatmaps of both models were also examined to uncover the "black-box" nature of these models and to gain insight into which features in the images are most important in making a prediction. These results highlight the effectiveness and limitations of using models designed for resource-constrained devices and balancing their performance in accurately recognizing WSSV, providing valuable information and direction for the use of computer vision in this domain.
A Lightweight and Accurate Face Detection Algorithm Based on Retinaface
Authors: Baozhu Liu, Hewei Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In this paper, we propose LAFD (Light and Accurate Face Detection), a lightweight and accurate face detection algorithm based on Retinaface. The backbone network in the algorithm is a modified MobileNetV3 network which adjusts the size of the convolution kernels, the channel expansion multiplier of the inverted residual blocks, and the use of the SE attention mechanism. A deformable convolution network (DCN) is introduced in the context module, and the algorithm uses the focal loss function instead of the cross-entropy loss function as the classification loss of the model. Test results on the WIDERFACE dataset indicate that the average accuracy of LAFD is 94.1%, 92.2% and 82.1% for the "easy", "medium" and "hard" validation subsets respectively, an improvement of 3.4%, 4.0% and 8.3% over Retinaface, and 3.1%, 4.1% and 4.1% higher than the well-performing lightweight model LFFD. If the input image is pre-processed and scaled to 1560px in length or 1200px in width, the model achieves an average accuracy of 86.2% on the "hard" validation subset. The model is lightweight, with a size of only 10.2MB.
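Since the abstract highlights replacing cross-entropy with focal loss, here is the standard binary focal loss for reference; the hyperparameters shown are common defaults, not necessarily those used in LAFD.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples by
    (1 - p_t)^gamma so training focuses on hard faces and backgrounds."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t + 1e-8))

p = np.array([0.95, 0.10, 0.60, 0.30])   # predicted face probabilities
y = np.array([1, 0, 1, 0])               # ground-truth labels
print(focal_loss(p, y))                  # easy examples contribute little
```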
Revolutionizing Wireless Networks with Federated Learning: A Comprehensive Review
Authors: Sajjad Emdadi Mahdimahalleh
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
These days, with the rising computational capabilities of wireless user equipment (UE) such as smartphones, tablets, and vehicles, along with growing concerns about sharing private data, a novel machine learning model called federated learning (FL) has emerged. FL enables the separation of data acquisition and computation at the central unit, in contrast to centralized learning, which occurs in a data center. FL is typically used in a wireless edge network where communication resources are limited and unreliable. Bandwidth constraints necessitate scheduling only a subset of UEs for updates in each iteration, and because the wireless medium is shared, transmissions are susceptible to interference and are not assured. The article discusses the significance of machine learning in wireless communication and highlights federated learning as a novel approach that could play a vital role in future mobile networks, particularly 6G and beyond.
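For readers new to FL, a minimal sketch of the canonical FedAvg aggregation step; the scheduling and wireless-channel effects surveyed in the article sit around this step.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """One FedAvg round: the server averages client model parameters,
    weighted by local dataset size, without ever seeing raw data."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three UEs return locally trained parameter vectors:
clients = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.8, 2.4])]
sizes = [100, 50, 50]                     # local dataset sizes
print(fed_avg(clients, sizes))            # new global model: [1.0, 2.05]
```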
Near-field 6G Networks: Why Mobile Terahertz Communications MUST Operate in the Near Field
Abstract
Near-field mobile terahertz (THz) communications is one of the candidate enablers for high-rate wireless data exchange in sixth-generation (6G) networks. However, operating in the THz near field brings both attractive opportunities and severe challenges. Hence, it becomes of interest to explore if it is possible to design a realistic mobile THz communication system without working in the THz near field. To answer this question, a mathematical framework is presented modeling a mobile THz link that works exclusively in the far field. The study leads to an interesting theoretical conclusion: while the actual frequency is of (almost) no interest, such a system must operate over a limited bandwidth not exceeding a certain threshold. It is then numerically shown that operating only in the far field imposes stringent limitations on mobile THz communications, thus making them less attractive to prospective high-rate services. In contrast, it is shown that a stationary THz link can still be broadband even when staying exclusively in the THz far field. Hence, broadband mobile THz communications MUST be near-field, while broadband stationary THz links do not have to.
Abstract
We study pseudo-polynomial time algorithms for the fundamental \emph{0-1 Knapsack} problem. Recent research interest has focused on its fine-grained complexity with respect to the number of items $n$ and the \emph{maximum item weight} $w_{\max}$. Under $(\min,+)$-convolution hypothesis, 0-1 Knapsack does not have $O((n+w_{\max})^{2-\delta})$ time algorithms (Cygan-Mucha-W\k{e}grzycki-W\l{}odarczyk 2017 and K\"{u}nnemann-Paturi-Schneider 2017). On the upper bound side, currently the fastest algorithm runs in $\tilde O(n + w_{\max}^{12/5})$ time (Chen, Lian, Mao, and Zhang 2023), improving the earlier $O(n + w_{\max}^3)$-time algorithm by Polak, Rohwedder, and W\k{e}grzycki (2021). In this paper, we close this gap between the upper bound and the conditional lower bound (up to subpolynomial factors): - The 0-1 Knapsack problem has a deterministic algorithm in $O(n + w_{\max}^{2}\log^4 w_{\max})$ time. Our algorithm combines and extends several recent structural results and algorithmic techniques from the literature on knapsack-type problems: - We generalize the "fine-grained proximity" technique of Chen, Lian, Mao, and Zhang (2023) derived from the additive-combinatorial results of Bringmann and Wellnitz (2021) on dense subset sums. This allows us to bound the support size of the useful partial solutions in the dynamic program. - To exploit the small support size, our main technical component is a vast extension of the "witness propagation" method, originally designed by Deng, Mao, and Zhong (2023) for speeding up dynamic programming in the easier unbounded knapsack settings. To extend this approach to our 0-1 setting, we use a novel pruning method, as well as the two-level color-coding of Bringmann (2017) and the SMAWK algorithm on tall matrices.
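For contrast, the textbook $O(n\,t)$ dynamic program below (with $t$ the capacity) defines the problem; the paper's $O(n + w_{\max}^{2}\log^4 w_{\max})$ algorithm is far more involved and is not reproduced here.

```python
def knapsack_01(weights, values, capacity):
    """Textbook 0-1 Knapsack DP over capacities, O(n * capacity) time."""
    dp = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # iterate capacities in reverse so each item is used at most once
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

# Best choice is the items of weight 3 and 5, for total value 90:
print(knapsack_01(weights=[3, 4, 5], values=[30, 50, 60], capacity=8))
```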
Abstract
The occurrence of diffusion on a graph is a prevalent and significant phenomenon, as evidenced by the spread of rumors, influenza-like viruses, smart grid failures, and similar events. Comprehending the behavior of flows is a formidable task, due to the intricate interplay between the distribution of seeds that initiate flow propagation, the propagation model, and the topology of the graph. The study of networks encompasses a diverse range of academic disciplines, including mathematics, physics, social science, and computer science. This interdisciplinary nature of network research is characterized by a high degree of specialization and compartmentalization, and cooperation between the disciplines is inadequate. From a machine learning standpoint, there is a deficiency of a cohesive platform for assessing algorithms across various domains. One of the primary obstacles to current research in this field is the absence of a comprehensive curated benchmark suite to study flow behaviors under network scenarios. To address this disparity, we propose a novel benchmark suite that encompasses a variety of tasks, baseline models, graph datasets, and evaluation tools. In addition, we present a comprehensive analytical framework that offers a generalized approach to numerous flow-related tasks across diverse domains, serving as a blueprint and roadmap. Drawing upon the outcomes of our empirical investigation, we analyze the advantages and disadvantages of current foundational models, and we underscore potential avenues for further study. The datasets, code, and baseline models have been made available to the public at: https://github.com/XGraphing/XFlow
A staggered-in-time and non-conforming-in-space numerical framework for realistic cardiac electrophysiology outputs
Authors: Elena Zappon, Andrea Manzoni, Alfio Quarteroni
Abstract
Computer-based simulations of non-invasive cardiac electrical outputs, such as electrocardiograms and body surface potential maps, usually entail severe computational costs due to the need to capture fine-scale processes and to the complexity of the heart-torso morphology. In this work, we model cardiac electrical outputs by employing a coupled model consisting of a reaction-diffusion model - either the bidomain model or the more efficient pseudo-bidomain model - on the heart, and an elliptic model in the torso. We then solve the coupled problem with a segregated and staggered-in-time numerical scheme, which allows for independent and infrequent solution in the torso region. To further reduce the computational load, the main novelty of this work is the introduction of an interpolation method at the interface between the heart and torso domains, enabling the use of non-conforming meshes and the application of the numerical framework to realistic cardiac and torso geometries. The reliability and efficiency of the proposed scheme are tested against the corresponding state-of-the-art bidomain-torso model. Furthermore, we explore the impact of torso spatial discretization and geometrical non-conformity on the model solution and the corresponding clinical outputs, providing insights into their influence on the simulation results and their clinical relevance.
Synthetic Augmentation with Large-scale Unconditional Pre-training
Abstract
Deep learning based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. Additionally, we adopt a selective mechanism to only add synthetic samples with high confidence of matching to target labels. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug.
Boundary-preserving Lamperti-splitting scheme for some Stochastic Differential Equations
Authors: Johan Ulander (Chalmers University of Technology)
Abstract
We propose and analyse an explicit boundary-preserving scheme for the strong approximations of some SDEs with non-globally Lipschitz drift and diffusion coefficients whose state-space is bounded. The scheme consists of a Lamperti transform followed by a Lie--Trotter splitting. We prove $L^{p}(\Omega)$-convergence of order $1$, for every $p \in \mathbb{N}$, of the scheme and exploit the Lamperti transform to confine the numerical approximations to the state-space of the considered SDE. We provide numerical experiments that confirm the theoretical results and compare the proposed Lamperti-splitting scheme to other numerical schemes for SDEs.
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion
Authors: Yizhuo Lu, Changde Du, Qiongyi Zhou, Dianpeng Wang, Huiguang He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Reconstructing visual stimuli from brain recordings has been a meaningful and challenging task. In particular, achieving precise and controllable image reconstruction bears great significance in advancing the progress and utilization of brain-computer interfaces. Despite the advancements in complex image reconstruction techniques, the challenge persists in achieving a cohesive alignment of both semantics (concepts and objects) and structure (position, orientation, and size) with the image stimuli. To address this issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are fed into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we utilize the CLIP visual features decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our model surpasses the current state-of-the-art models on the Natural Scenes Dataset (NSD). The subsequent experimental findings corroborate the neurobiological plausibility of the model, as evidenced by the interpretability of the multimodal features employed, which align with the corresponding brain responses.
Abstract
Warning: this paper contains content that may be inappropriate or offensive. As generative models become available for public use in various applications, testing and analyzing the vulnerabilities of these models has become a priority. Here we propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities to unsafe and inappropriate content generation. Our framework uses in-context learning in a feedback loop to red team models and trigger them into unsafe content generation. We propose different in-context attack strategies to automatically learn effective and diverse adversarial prompts for text-to-image models. Our experiments demonstrate that, compared to baseline approaches, our proposed strategy is significantly more effective in exposing vulnerabilities in the Stable Diffusion (SD) model, even when the latter is enhanced with safety features. Furthermore, we demonstrate that the proposed framework is effective for red teaming text-to-text models, resulting in a significantly higher toxic response generation rate compared to previously reported numbers.
MCDAN: a Multi-scale Context-enhanced Dynamic Attention Network for Diffusion Prediction
Authors: Xiaowen Wang, Lanjun Wang, Yuting Su, Yongdong Zhang, An-An Liu
Abstract
Information diffusion prediction aims at predicting the target users in the information diffusion path on social networks. Prior works mainly focus on the observed structure or sequence of cascades, trying to predict which users will passively be infected by a cascade. In this study, we argue that understanding user intent is also a key part of information diffusion prediction. We thereby propose a novel Multi-scale Context-enhanced Dynamic Attention Network (MCDAN) to predict which user will most likely join the observed current cascades. Specifically, to consider the global interactive relationships among users, we take full advantage of user friendships and global cascading relationships, which are extracted from the social network and historical cascades, respectively. To refine the model's ability to understand a user's preference for the current cascade, we propose a multi-scale sequential hypergraph attention module to capture the dynamic preferences of users at different time scales. Moreover, we design a contextual attention enhancement module to strengthen the interaction of user representations within the current cascade. Finally, to account for each user's own susceptibility, we construct a susceptibility label for each user based on user susceptibility analysis and use the rank of this label for auxiliary prediction. We conduct experiments over four widely used datasets and show that MCDAN significantly outperforms the state-of-the-art models. The average improvements are up to 10.61% in terms of Hits@100 and 9.71% in terms of MAP@100.
Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On
Authors: Daiheng Gao, Xu Chen, Xindi Zhang, Qi Wang, Ke Sun, Bang Zhang, Liefeng Bo, Qixing Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Fabricating and designing 3D garments has become extremely demanding with the increasing need for synthesizing realistically dressed persons for a variety of applications, e.g., 3D virtual try-on, digitalization of 2D clothes into 3D apparel, and cloth animation. This necessitates a simple and straightforward pipeline to obtain high-quality texture from simple input, such as 2D reference images. Traditional warping-based texture generation methods require a significant number of control points to be manually selected for each type of garment, which can be a time-consuming and tedious process. We propose a novel method, called Cloth2Tex, which eliminates this human burden. Cloth2Tex is a self-supervised method that generates texture maps with reasonable layout and structural consistency. Another key feature of Cloth2Tex is that it can be used to support high-fidelity texture inpainting. This is done by combining Cloth2Tex with a prevailing latent diffusion model. We evaluate our approach both qualitatively and quantitatively and demonstrate that Cloth2Tex can generate high-quality texture maps and achieve the best visual effects in comparison to other methods. Project page: tomguluson92.github.io/projects/cloth2tex/
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
Authors: Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of the previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon acceptance of this work.
Keyword: adaptive
AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning
Abstract
Continual lifelong learning is a machine learning framework inspired by human learning, where learners are trained to continuously acquire new knowledge in a sequential manner. However, the non-stationary nature of streaming training data poses a significant challenge known as catastrophic forgetting: the rapid forgetting of previously learned knowledge when new tasks are introduced. While some approaches, such as experience replay (ER), have been proposed to mitigate this issue, their performance remains limited, particularly in the class-incremental scenario, which is considered natural and highly challenging. In this paper, we present a novel algorithm, called adaptive-experience replay (AdaER), to address the challenge of continual lifelong learning. AdaER consists of two stages: memory replay and memory update. In the memory replay stage, AdaER introduces a contextually-cued memory recall (C-CMR) strategy, which selectively replays the memories that conflict most with the current input data in terms of both data and task. Additionally, AdaER incorporates an entropy-balanced reservoir sampling (E-BRS) strategy to enhance the performance of the memory buffer by maximizing information entropy. To evaluate the effectiveness of AdaER, we conduct experiments on established supervised continual lifelong learning benchmarks, specifically focusing on class-incremental learning scenarios. The results demonstrate that AdaER outperforms existing continual lifelong learning baselines, highlighting its efficacy in mitigating catastrophic forgetting and improving learning performance.
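As background for the E-BRS component, here is a sketch of classic reservoir sampling, which keeps a uniform sample of a stream in a fixed-size buffer; AdaER's entropy-balancing bias on top of this update is not reproduced.

```python
import random

def reservoir_update(buffer, item, seen, capacity):
    """Algorithm R: after the buffer fills, replace a random slot with
    probability capacity/(seen+1), keeping a uniform sample of the stream."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randrange(seen + 1)     # uniform over all items so far
        if j < capacity:
            buffer[j] = item
    return buffer

buffer = []
for i in range(1000):                      # stream of training examples
    reservoir_update(buffer, i, seen=i, capacity=10)
print(buffer)                              # a uniform sample of the stream
```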
Storyfier: Exploring Vocabulary Learning Support with Text Generation Models
Abstract
Vocabulary learning support tools have widely exploited existing materials, e.g., stories or video clips, as contexts to help users memorize each target word. However, these tools cannot provide a coherent context for arbitrary target words of a learner's interest, and they seldom help practice word usage. In this paper, we work with teachers and students to iteratively develop Storyfier, which leverages text generation models to enable learners to read a generated story that covers any target words, take a story cloze test, and use these words to write a new story with adaptive AI assistance. Our within-subjects study (N=28) shows that learners generally favor the generated stories for connecting target words and the writing assistance for easing their learning workload. However, in the read-cloze-write learning sessions, participants using Storyfier perform worse in recalling and using target words than when learning with a baseline tool without our AI features. We discuss insights into supporting learning tasks with generative models.
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data
Authors: Anup Shakya, Vasile Rus, Deepak Venugopal
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Abstract
Understanding a student's problem-solving strategy can have a significant impact on effective math learning using Intelligent Tutoring Systems (ITSs) and Adaptive Instructional Systems (AISs). For instance, the ITS/AIS can better personalize itself to correct specific misconceptions that are indicated by incorrect strategies, specific problems can be designed to improve strategies and frustration can be minimized by adapting to a student's natural way of thinking rather than trying to fit a standard strategy for all. While it may be possible for human experts to identify strategies manually in classroom settings with sufficient student interaction, it is not possible to scale this up to big data. Therefore, we leverage advances in Machine Learning and AI methods to perform scalable strategy prediction that is also fair to students at all skill levels. Specifically, we develop an embedding called MVec where we learn a representation based on the mastery of students. We then cluster these embeddings with a non-parametric clustering method where we progressively learn clusters such that we group together instances that have approximately symmetrical strategies. The strategy prediction model is trained on instances sampled from these clusters. This ensures that we train the model over diverse strategies and also that strategies from a particular group do not bias the DNN model, thus allowing it to optimize its parameters over all groups. Using real world large-scale student interaction datasets from MATHia, we implement our approach using transformers and Node2Vec for learning the mastery embeddings and LSTMs for predicting strategies. We show that our approach can scale up to achieve high accuracy by training on a small sample of a large dataset and also has predictive equality, i.e., it can predict strategies equally well for learners at diverse skill levels.
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation
Authors: Zhu Liu, Jinyuan Liu, Benzhuang Zhang, Long Ma, Xin Fan, Risheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Infrared and visible image fusion is a powerful technique that combines complementary information from different modalities for downstream semantic perception tasks. Existing learning-based methods show remarkable performance but suffer from an inherent vulnerability to adversarial attacks, causing a significant decrease in accuracy. In this work, a perception-aware fusion framework is proposed to promote segmentation robustness in adversarial scenes. We first conduct systematic analyses of the components of image fusion, investigating their correlation with segmentation robustness under adversarial perturbations. Based on these analyses, we propose a harmonized architecture search with a decomposition-based structure to balance standard accuracy and robustness. We also propose an adaptive learning strategy to improve the parameter robustness of image fusion, which can learn effective feature extraction under diverse adversarial perturbations. Thus, the goals of image fusion (\textit{i.e.,} extracting complementary features from source modalities and defending against attacks) can be realized from the perspectives of architecture and learning strategy. Extensive experimental results demonstrate that our scheme substantially enhances robustness, with gains of 15.3% in segmentation mIoU in the adversarial scene, compared with advanced competitors. The source codes are available at https://github.com/LiuZhu-CV/PAIF.
NEOLAF, an LLM-powered neural-symbolic cognitive architecture
Authors: Richard Jiarui Tong, Cassie Chen Cao, Timothy Xueqian Lee, Guodong Zhao, Ray Wan, Feiyue Wang, Xiangen Hu, Robin Schmucker, Jinsheng Pan, Julian Quevedo, Yu Lu
Abstract
This paper presents the Never Ending Open Learning Adaptive Framework (NEOLAF), an integrated neural-symbolic cognitive architecture that models and constructs intelligent agents. The NEOLAF framework is superior to both the pure connectionist and the pure symbolic approaches for constructing intelligent agents, due to its explainability, incremental learning, efficiency, collaborative and distributed learning, human-in-the-loop enablement, and self-improvement. The paper further presents a compelling experiment in which a NEOLAF agent, built as a problem-solving agent, is fed complex math problems from the open-source MATH dataset. The results demonstrate NEOLAF's superior learning capability and its potential to revolutionize the field of cognitive architectures and self-improving adaptive instructional systems.
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
Authors: Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image retrieval aims to find images in a database that are visually similar to the query image. Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications. To better trade off retrieval efficiency and accuracy, some approaches fuse global and local features into a joint representation to perform single-stage image retrieval. However, these remain challenging due to the various situations to tackle, e.g., background, occlusion and viewpoint. In this work, we design a Coarse-to-Fine framework to learn Compact Discriminative representations (CFCD) for end-to-end single-stage image retrieval, requiring only image-level labels. Specifically, we first design a novel adaptive softmax-based loss which dynamically tunes its scale and margin within each mini-batch and increases them progressively to strengthen supervision and intra-class compactness during training. Furthermore, we propose a mechanism that attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation via a hard negative sampling strategy to optimize inter-class distinctiveness at a global scale. Extensive experimental results demonstrate the effectiveness of our method, which achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris. Code is available at https://github.com/bassyess/CFCD.
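A hedged sketch of an additive-angular-margin softmax of the kind the adaptive loss builds on; CFCD's per-mini-batch tuning and progressive schedule for `scale` and `margin` are the paper's contribution and are not reproduced here.

```python
import numpy as np

def margin_softmax_loss(cos_sims, label, scale, margin):
    """Additive-angular-margin softmax on cosine similarities: the target
    class angle is penalized by `margin`, and logits are sharpened by
    `scale`, encouraging intra-class compactness."""
    theta = np.arccos(np.clip(cos_sims[label], -1.0, 1.0))
    logits = scale * cos_sims.copy()
    logits[label] = scale * np.cos(theta + margin)   # penalize target angle
    logits -= logits.max()                           # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])

cos_sims = np.array([0.8, 0.3, 0.1])   # similarity to each class center
print(margin_softmax_loss(cos_sims, label=0, scale=30.0, margin=0.2))
```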
A space-time high-order implicit shock tracking method for shock-dominated unsteady flows
Authors: Charles J. Naudet, Matthew J. Zahr
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Optimization and Control (math.OC)
Abstract
High-order implicit shock tracking (fitting) is a class of high-order, optimization-based numerical methods that approximate solutions of conservation laws with non-smooth features by aligning elements of the computational mesh with those features. This ensures the non-smooth features are perfectly represented by inter-element jumps while high-order basis functions approximate smooth regions of the solution without nonlinear stabilization, which leads to accurate approximations on traditionally coarse meshes. In this work, we extend implicit shock tracking to time-dependent problems using a slab-based space-time approach. This is achieved by reformulating a time-dependent conservation law as a steady conservation law in one higher dimension and applying existing implicit shock tracking techniques. To avoid computations over the entire time domain and unstructured mesh generation in higher dimensions, we introduce a general procedure to generate conforming, simplex-only meshes of space-time slabs in a way that preserves features (e.g., curved elements, refinement regions) from previous time slabs. The use of space-time slabs also simplifies the shock tracking problem by reducing its temporal complexity. Several practical adaptations of the implicit shock tracking solvers are developed for the space-time setting, including 1) a self-adjusting temporal boundary, 2) nondimensionalization of a space-time slab, 3) adaptive mesh refinement, and 4) shock boundary conditions, which lead to accurate solutions on coarse space-time grids, even for problems with complex flow features such as curved shocks, shock formation, shock-shock and shock-boundary interaction, and triple points.
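The reformulation step can be made explicit. Writing the space-time gradient as $\bar{\nabla} = (\nabla_x, \partial_t)$, a conservation law in $d$ spatial dimensions becomes a steady conservation law in $d+1$ dimensions:
\[
\frac{\partial u}{\partial t} + \nabla_x \cdot F(u) = 0
\quad\Longleftrightarrow\quad
\bar{\nabla} \cdot \bar{F}(u) = 0,
\qquad
\bar{F}(u) := \begin{bmatrix} F(u) & u \end{bmatrix},
\]
so a moving shock in $(x, t)$ appears as a steady discontinuity in the space-time domain, and the existing steady tracking machinery applies slab by slab.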
Learning Specialized Activation Functions for Physics-informed Neural Networks
Authors: Honghui Wang, Lu Lu, Shiji Song, Gao Huang
Abstract
Physics-informed neural networks (PINNs) are known to suffer from optimization difficulty. In this work, we reveal the connection between the optimization difficulty of PINNs and activation functions. Specifically, we show that PINNs exhibit high sensitivity to activation functions when solving PDEs with distinct properties. Existing works usually choose activation functions by inefficient trial-and-error. To avoid this inefficient manual selection and to alleviate the optimization difficulty of PINNs, we introduce adaptive activation functions that search for the optimal function when solving different problems. We compare different adaptive activation functions and discuss their limitations in the context of PINNs. Furthermore, we propose to tailor the idea of learning combinations of candidate activation functions to PINN optimization, which imposes higher requirements on the smoothness and diversity of the learned functions. This is achieved by removing activation functions that cannot provide higher-order derivatives from the candidate set and incorporating elementary functions with different properties according to our prior knowledge about the PDE at hand. We further enhance the search space with adaptive slopes. The proposed adaptive activation function can be used to solve different PDE systems in an interpretable way. Its effectiveness is demonstrated on a series of benchmarks. Code is available at https://github.com/LeapLabTHU/AdaAFforPINNs.
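A minimal sketch of the learned-combination idea follows, assuming PyTorch and a small candidate set restricted to smooth functions (so the higher-order derivatives needed by PINN residuals exist); the candidate choice and parameterization are illustrative, not the paper's exact search space:

# act(x) = sum_k softmax(alpha)_k * f_k(a * x); every f_k is infinitely differentiable.
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    def __init__(self):
        super().__init__()
        # Candidates restricted to functions with higher-order derivatives,
        # as required when the PDE residual differentiates the network.
        self.candidates = [torch.tanh, torch.sin, torch.sigmoid]
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))  # mixing logits
        self.a = nn.Parameter(torch.ones(1))                          # adaptive slope

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wk * f(self.a * x) for wk, f in zip(w, self.candidates))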
Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients
Abstract
Federated optimization, an emerging paradigm with wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically share only their local gradients. However, gradient information is not available in many applications of federated optimization, which gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates, which are able to use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over existing approaches, which is supported by real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.
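To make the query-efficiency argument concrete, the sketch below contrasts a standard two-point zeroth-order estimator (which spends fresh queries at every step) with a surrogate gradient fit by least squares to the query history (which spends none); the linear surrogate here is a generic stand-in, not the specific FZooS construction:

# Generic zeroth-order estimators for intuition only.
import numpy as np

def two_point_estimator(f, x, num_dirs=20, mu=1e-3):
    """Costs 2 * num_dirs fresh function queries per gradient estimate."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(*x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

def trajectory_surrogate_gradient(history_x, history_f, x):
    """Reuse past queries: fit a local linear model f(z) ~ c + g.(z - x)
    by least squares over the trajectory, costing zero new queries.
    Needs at least dim(x) + 1 points in the history to be well posed."""
    X = np.hstack([np.ones((len(history_x), 1)), np.asarray(history_x) - x])
    coef, *_ = np.linalg.lstsq(X, np.asarray(history_f), rcond=None)
    return coef[1:]  # the fitted slope serves as the gradient surrogate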
Understanding and Modeling Passive-Negative Feedback for Short-video Sequential Recommendation
Authors: Yunzhu Pan, Chen Gao, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Depeng Jin, Yong Li
Abstract
Sequential recommendation is one of the most important tasks in recommender systems; it aims to recommend the next item a user will interact with, given historical behaviors as input. Traditional sequential recommendation mainly considers collected positive feedback such as clicks, purchases, etc. However, on short-video platforms such as TikTok, video viewing behavior may not always represent positive feedback. Specifically, videos are played automatically, and users passively receive the recommended videos. In this new scenario, users passively express negative feedback by skipping over videos they do not like, which provides valuable information about their preferences. Different from the negative feedback studied in traditional recommender systems, this passive-negative feedback can reflect users' interests and serve as an important supervision signal in extracting users' preferences. Therefore, it is essential to carefully design for and utilize it in this novel recommendation scenario. In this work, we first conduct analyses on a large-scale real-world short-video behavior dataset and illustrate the significance of leveraging passive feedback. We then propose a novel method that deploys a sub-interest encoder, which incorporates positive feedback and passive-negative feedback as supervision signals to learn the user's current active sub-interest. Moreover, we introduce an adaptive fusion layer to integrate various sub-interests effectively. To enhance the robustness of our model, we also introduce a multi-task learning module to simultaneously optimize two kinds of feedback: passive-negative feedback and traditional randomly-sampled negative feedback. Experiments on two large-scale datasets verify that the proposed method significantly outperforms state-of-the-art approaches. The code is released at https://github.com/tsinghua-fib-lab/RecSys2023-SINE.
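A hedged sketch of the two-branch supervision follows, assuming pairwise (BPR-style) losses in PyTorch; the tensors, weighting, and pairing scheme are placeholders rather than the released model's interface:

# Positive items should out-score both skipped (passive-negative) items and
# conventionally sampled negatives; the two objectives are optimized jointly.
import torch
import torch.nn.functional as F

def multitask_feedback_loss(pos_scores, skip_scores, rand_neg_scores, beta=0.5):
    passive = F.binary_cross_entropy_with_logits(
        pos_scores - skip_scores, torch.ones_like(pos_scores))
    sampled = F.binary_cross_entropy_with_logits(
        pos_scores - rand_neg_scores, torch.ones_like(pos_scores))
    return passive + beta * sampled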
Flexible Distributed Flocking Control for Multi-agent Unicycle Systems
Abstract
Currently, the general aim of flocking and formation control laws for multi-agent systems is to form and maintain a rigid configuration, such as the alpha-lattices in flocking control methods, where the desired distance between each pair of connected agents is fixed. This introduces a scalability issue for large-scale deployment of agents due to unrealizable geometric constraints and the constant need for a centralized orchestrator to ensure rigidity of the formation graph. This paper presents a flexible distributed flocking cohesion algorithm for nonholonomic multi-agent systems. The desired geometric configuration between each pair of agents is adaptive and flexible. The distributed flocking goal is achieved using limited information exchange (i.e., the local field gradient) between connected neighbor agents, and it does not rely on any other motion-variable measurements, such as (relative) position, velocity, or acceleration. Additionally, a flexible flocking scheme with safety is considered so that agents with limited sensing capability are able to maintain the connectedness of the communication topology at all times and avoid inter-agent collisions. The stability analysis of the proposed methods is presented along with numerical simulation results showing their effectiveness.
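As a toy illustration of flexible (rather than rigid) spacing, the following numpy sketch uses a pairwise potential that is flat over a whole interval of acceptable distances and maps its gradient to unicycle inputs; note this generic sketch computes the gradient from relative positions for simplicity, unlike the paper's gradient-exchange scheme, and the gains are arbitrary assumptions:

# Generic potential-gradient flocking sketch, not the paper's control law.
import numpy as np

def pair_potential_grad(xi, xj, d_min=1.0, d_max=3.0):
    """Flexible cohesion: zero force anywhere inside [d_min, d_max],
    repulsive below d_min, attractive above d_max."""
    r = xj - xi
    d = np.linalg.norm(r)
    if d < d_min:
        return -(d_min - d) * r / d   # push apart
    if d > d_max:
        return (d - d_max) * r / d    # pull together
    return np.zeros(2)                # any spacing in between is acceptable

def unicycle_input(grad, theta, k_v=1.0, k_w=2.0):
    """Project the field gradient onto the heading for v; steer with omega."""
    desired = np.arctan2(grad[1], grad[0])
    v = k_v * max(grad @ np.array([np.cos(theta), np.sin(theta)]), 0.0)
    w = k_w * np.arctan2(np.sin(desired - theta), np.cos(desired - theta))
    return v, w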
Optimizing Adaptive Video Streaming with Human Feedback
Authors: Tianchi Huang, Rui-Xiao Zhang, Chenglei Wu, Lifeng Sun
Abstract
Quality of Experience~(QoE)-driven adaptive bitrate~(ABR) algorithms are typically optimized using QoE models based on the mean opinion score~(MOS), but such principles may not account for user heterogeneity in rating scales, resulting in unexpected behaviors. In this paper, we propose \texttt{Jade}, which leverages reinforcement learning with human feedback~(RLHF) to better align with users' opinion scores. \texttt{Jade}'s rank-based QoE model considers the relative values of user ratings to interpret the subjective perception of video sessions. We implement linear-based and Deep Neural Network (DNN)-based architectures to satisfy both accuracy and generalization requirements. We further propose entropy-aware reinforced mechanisms for training policies with the integration of the proposed QoE models. Experimental results demonstrate that \texttt{Jade} performs favorably on conventional metrics, such as quality and stall ratio, and improves QoE by 8.09\%-38.13\% under different network conditions, emphasizing the importance of user heterogeneity in QoE modeling and the potential of combining linear-based and DNN-based models for performance improvement.
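To see why a rank-based model sidesteps rating-scale heterogeneity, consider a generic pairwise (Bradley-Terry style) objective: the model is trained only on the ordering of one user's ratings, so any per-user offset cancels in the score difference. The sketch below is a generic form under that assumption, not necessarily \texttt{Jade}'s exact loss:

# Pairwise rank loss: P(a preferred over b) = sigmoid(score_a - score_b).
import torch
import torch.nn.functional as F

def rank_qoe_loss(score_a, score_b, a_preferred):
    """`a_preferred` is 1.0 where the user rated session a above session b."""
    return F.binary_cross_entropy_with_logits(score_a - score_b, a_preferred)

# Usage: pairs come from the same user, so rating-scale offsets cancel.
sa, sb = torch.randn(8, requires_grad=True), torch.randn(8)
loss = rank_qoe_loss(sa, sb, torch.ones(8))
loss.backward()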
Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention
Authors: Huilin Zhang, Sumei Li, Yongli Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Abstract
Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing methods for SIQA based on binocular properties and attention have achieved promising performance. However, these bottom-up approaches are inadequate in exploiting the inherent characteristics of the human visual system (HVS). This paper presents a novel network for SIQA via stereo attention, employing a top-down perspective to guide the quality assessment process. Our proposed method realizes guidance from high-level binocular signals down to low-level monocular signals, while the binocular and monocular information are calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. This block utilizes the fusion-generated attention map as a high-level binocular modulator, influencing the representation of two low-level monocular features. Additionally, we introduce an Energy Coefficient (EC) to account for recent findings indicating that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. The adaptive EC can tune the magnitude of the binocular response flexibly, thus enhancing the formation of robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in simulating the properties of visual perception and advancing the state of the art in the SIQA field. The code of this work is available at https://github.com/Fanning-Zhang/SATNet.
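A condensed sketch of the block's data flow under stated assumptions (channel sizes, which branch receives min- versus max-pooling, and the exact modulation form are guesses for illustration, not the released SATNet layers):

# Fused binocular attention, scaled by an energy coefficient, modulates the two
# monocular streams; dual pooling reads out the sum/difference branches.
import torch
import torch.nn as nn

class StereoAttentionSketch(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c, c, 1)          # high-level binocular signal
        self.ec = nn.Parameter(torch.tensor(0.5))   # adaptive energy coefficient

    def forward(self, left, right):
        attn = torch.sigmoid(self.fuse(torch.cat([left, right], dim=1)))
        # Top-down modulation of the low-level monocular features.
        left, right = left * (self.ec * attn), right * (self.ec * attn)
        add, sub = left + right, left - right
        # Dual pooling: max-pool one branch, min-pool the other.
        feat = torch.cat([add.amax(dim=(2, 3)), sub.amin(dim=(2, 3))], dim=1)
        return feat  # fed to a quality regressor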
Real-Time Progressive Learning: Mutually Reinforcing Learning and Control with Neural-Network-Based Selective Memory
Authors: Yiming Fei, Jiangang Li, Yanan Li
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE)
Abstract
Memory, as the basis of learning, determines the storage, updating, and forgetting of knowledge, and thereby the efficiency of learning. Built around a memory mechanism, a radial basis function neural network (RBFNN)-based learning control scheme named real-time progressive learning (RTPL) is proposed to learn the unknown dynamics of the system with guaranteed stability and closed-loop performance. Instead of the stochastic gradient descent (SGD) update law used in adaptive neural control (ANC), RTPL adopts the selective memory recursive least squares (SMRLS) algorithm to update the weights of the RBFNN. Through SMRLS, the approximation capabilities of the RBFNN are uniformly distributed over the feature space, and the passive knowledge-forgetting phenomenon of the SGD method is thus suppressed. Consequently, RTPL achieves the following merits over classical ANC: 1) guaranteed learning capability under low-level persistent excitation (PE), 2) improved learning performance (learning speed, accuracy, and generalization capability), and 3) a low gain requirement that ensures robustness of RTPL in practical applications. Moreover, RTPL-based learning and control gradually reinforce each other during task execution, making the scheme appropriate for long-term learning control tasks. As an example, RTPL is used to address the tracking control problem of a class of nonlinear systems with the RBFNN acting as an adaptive feedforward controller. Corresponding theoretical analysis and simulation studies demonstrate the effectiveness of RTPL.
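For reference, the classical recursive least squares step that SMRLS builds on can be sketched in a few lines; the selective-memory weighting over the feature space and the stability guarantees are the paper's contribution and are not reproduced here:

# Vanilla RLS weight update for an RBF network (numpy sketch).
import numpy as np

def rbf_features(x, centers, width=1.0):
    d2 = ((x - centers) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * width ** 2))

def rls_update(w, P, phi, y, lam=1.0):
    """One RLS step; lam < 1 adds exponential forgetting."""
    k = P @ phi / (lam + phi @ P @ phi)     # gain vector
    w = w + k * (y - phi @ w)               # innovation-driven correction
    P = (P - np.outer(k, phi @ P)) / lam    # covariance update
    return w, P

# Usage: phi = rbf_features(x, centers); w, P = rls_update(w, P, phi, y)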
AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation
Authors: Amir M. Mansourian, Rozhan Ahmadi, Shohreh Kasaei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, deep neural networks have achieved remarkable accuracy in computer vision tasks. With inference time being a crucial factor, particularly in dense prediction tasks such as semantic segmentation, knowledge distillation has emerged as a successful technique for improving the accuracy of lightweight student networks. Existing methods, however, often neglect the information carried in channels and among different classes. To overcome these limitations, this paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for knowledge distillation. The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from the network outputs. This is followed by calculating inter-class similarity matrices for distillation using the KL divergence between the distributions of each pair of classes. To further improve the effectiveness of the proposed method, an Adaptive Loss Weighting (ALW) training strategy is proposed. Unlike existing methods, the ALW strategy gradually reduces the influence of the teacher network towards the end of the training process to account for errors in the teacher's predictions. Extensive experiments conducted on two well-known semantic segmentation datasets, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method in terms of mIoU and pixel accuracy. The proposed method outperforms most existing knowledge distillation methods, as demonstrated by both quantitative and qualitative evaluations. Code is available at: https://github.com/AmirMansurian/AICSD
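A hedged sketch of the distillation term follows, assuming per-class vectors are read directly off the output maps and matched via KL over a row-normalized similarity matrix; the shapes and the linear ALW decay are illustrative assumptions, not the released implementation:

# Inter-class similarity distillation with a decaying teacher weight (sketch).
import torch
import torch.nn.functional as F

def class_similarity(logits):
    """logits: (B, C, H, W) -> (C, C) row-normalized log-similarity matrix."""
    B, C, H, W = logits.shape
    per_class = logits.permute(1, 0, 2, 3).reshape(C, -1)  # one vector per class
    sim = per_class @ per_class.t() / per_class.size(1)
    return F.log_softmax(sim, dim=1)

def icsd_style_loss(student_logits, teacher_logits, epoch, total_epochs):
    kl = F.kl_div(class_similarity(student_logits),
                  class_similarity(teacher_logits).exp(), reduction='batchmean')
    alw = max(0.0, 1.0 - epoch / total_epochs)  # fade the teacher's influence
    return alw * kl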
Lossy and Lossless (L$^2$) Post-training Model Size Compression
Abstract
Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high compression ratios efficiently. This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. We first propose a unified parametric weight transformation, which ensures different lossy compression methods can be performed jointly in a post-training manner. Then, a dedicated differentiable counter is introduced to guide the optimization of lossy compression to arrive at a more suitable point for later lossless compression. Additionally, our method can easily control a desired global compression ratio and allocate adaptive ratios for different layers. Finally, our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time. Our code is available at https://github.com/ModelTC/L2_Compression .
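A toy numpy illustration of why the lossy and lossless stages interact: coarser quantization (the lossy step) lowers the entropy of the weight symbols, which lower-bounds the bits any lossless coder must spend, so the two stages are best optimized jointly. This only measures the effect; it is not the paper's differentiable counter:

import numpy as np

def quantize(w, step):
    return np.round(w / step).astype(np.int64)

def entropy_bits_per_weight(q):
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

w = np.random.randn(10000).astype(np.float32)
for step in (0.05, 0.1, 0.2):
    q = quantize(w, step)
    print(step, entropy_bits_per_weight(q))  # coarser step -> fewer bits, more error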
Keyword: quantization
EFaR 2023: Efficient Face Recognition Competition
Authors: Jan Niklas Kolf, Fadi Boutros, Jurek Elliesen, Markus Theuerkauf, Naser Damer, Mohamad Alansari, Oussama Abdul Hay, Sara Alansari, Sajid Javed, Naoufel Werghi, Klemen Grm, Vitomir Štruc, Fernando Alonso-Fernandez, Kevin Hernandez Diaz, Josef Bigun, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, Sébastien Marcel, Iurii Medvedev, Bo Jin, Diogo Nunes, Ahmad Hassanpour, Pankaj Khatiwada, Aafan Ahmad Toor, Bian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions were ranked based on a weighted score of the verification accuracies achieved on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and the model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost, and some apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.
Keyword: efficient
Applications of Machine Learning to Modelling and Analysing Dynamical Systems
Improved Neural Radiance Fields Using Pseudo-depth and Fusion
Goodness-of-Fit of Attributed Probabilistic Graph Generative Models
Exploring IoT for real-time CO2 monitoring and analysis
Nucleotide String Indexing using Range Matching
CECM: A continuous empirical cubature method with application to the dimensional hyperreduction of parameterized finite element models
A staggered-in-time and non-conforming-in-space numerical framework for realistic cardiac electrophysiology outputs
New Bounds on Quotient Polynomials with Applications to Exact Divisibility and Divisibility Testing of Sparse Polynomials
ForensiBlock: A Provenance-Driven Blockchain Framework for Data Forensics and Auditability
Optimizing the switching operation in monoclonal antibody production: Economic MPC and reinforcement learning
Deterministic Neural Illumination Mapping for Efficient Auto-White Balance Correction
A Benchmarking Study of Matching Algorithms for Knowledge Graph Entity Alignment
An Approximate Dynamic Programming Approach to Vehicle Platooning Coordination in Networks
CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification
Optimal partitioning of directed acyclic graphs with dependent costs between clusters
Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures
SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool
Reasonable mechanical model on shallow tunnel excavation to eliminate displacement singularity caused by unbalanced resultant
Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks
Explicit Topology Optimization of Conforming Voronoi Foams
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Top K Relevant Passage Retrieval for Biomedical Question Answering
An Empirical Analysis of Range for 3D Object Detection
Learning Specialized Activation Functions for Physics-informed Neural Networks
Boundary-preserving Lamperti-splitting scheme for some Stochastic Differential Equations
Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients
Application for White Spot Syndrome Virus (WSSV) Monitoring using Edge Machine Learning
Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention
S&Reg: End-to-End Learning-Based Model for Multi-Goal Path Planning Problem
EFaR 2023: Efficient Face Recognition Competition
Core interface optimization for multi-core neuromorphic processors
Communication-Efficient Cooperative Multi-Agent PPO via Regulated Segment Mixture in Internet of Vehicles
Robust retrieval of material chemical states in X-ray microspectroscopy
A Differential Datalog Interpreter
AquaSAM: Underwater Image Foreground Segmentation
Flexible and rigorous numerical modelling of multiphysics processes in fractured porous media using PorePy
CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages
Novel Area-Efficient and Flexible Architectures for Optimal Ate Pairing on FPGA
BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning
Lossy and Lossless (L$^2$) Post-training Model Size Compression
Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
Keyword: faster
Towards Integrated Traffic Control with Operating Decentralized Autonomous Organization
Nucleotide String Indexing using Range Matching
Intelligent Assistant Language Understanding On Device
Deterministic Neural Illumination Mapping for Efficient Auto-White Balance Correction
Caching-based Multicast Message Authentication in Time-critical Industrial Control Systems
3D Gaussian Splatting for Real-Time Radiance Field Rendering
XGBD: Explanation-Guided Graph Backdoor Detection
Keyword: mobile
Mobile Supply: The Last Piece of Jigsaw of Recommender System
Eye-Shield: Real-Time Protection of Mobile Device Screen Information from Shoulder Surfing
Application for White Spot Syndrome Virus (WSSV) Monitoring using Edge Machine Learning
A Lightweight and Accurate Face Detection Algorithm Based on Retinaface
Revolutionizing Wireless Networks with Federated Learning: A Comprehensive Review
Near-field 6G Networks: Why Mobile Terahertz Communications MUST Operate in the Near Field
Keyword: pruning
0-1 Knapsack in Nearly Quadratic Time
Keyword: diffusion
XFlow: Benchmarking Flow Behaviors over Graphs
A staggered-in-time and non-conforming-in-space numerical framework for realistic cardiac electrophysiology outputs
Synthetic Augmentation with Large-scale Unconditional Pre-training
Boundary-preserving Lamperti-splitting scheme for some Stochastic Differential Equations
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion
FLIRT: Feedback Loop In-context Red Teaming
MCDAN: a Multi-scale Context-enhanced Dynamic Attention Network for Diffusion Prediction
Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
Keyword: adaptive
AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning
Storyfier: Exploring Vocabulary Learning Support with Text Generation Models
Scalable and Equitable Math Problem Solving Strategy Prediction in Big Educational Data
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation
NEOLAF, an LLM-powered neural-symbolic cognitive architecture
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
A space-time high-order implicit shock tracking method for shock-dominated unsteady flows
Learning Specialized Activation Functions for Physics-informed Neural Networks
Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients
Understanding and Modeling Passive-Negative Feedback for Short-video Sequential Recommendation
Flexible Distributed Flocking Control for Multi-agent Unicycle Systems
Optimizing Adaptive Video Streaming with Human Feedback
Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention
Real-Time Progressive Learning: Mutually Reinforcing Learning and Control with Neural-Network-Based Selective Memory
AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation
Lossy and Lossless (L$^2$) Post-training Model Size Compression
Keyword: quantization
EFaR 2023: Efficient Face Recognition Competition