Abstract
Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For the achievement of state-of-the-art performances in diverse domains, the deep learning used different architectures and these architectures used activation functions to perform various computations between hidden and output layers of any architecture. This paper presents a survey on the existing studies of deep learning in handwriting recognition field. Even though the recent progress indicates that the deep learning methods has provided valuable means for speeding up or proving accurate results in handwriting recognition, but following from the extensive literature survey, the present study finds that the deep learning has yet to revolutionize more and has to resolve many of the most pressing challenges in this field, but promising advances have been made on the prior state of the art. Additionally, an inadequate availability of labelled data to train presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside with the potential to transform several domains as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.
Keyword: medical visualization
There is no result
Keyword: interactive volume
There is no result
Keyword: rendering
Evaluation Framework for Quantum Security Risk Assessment: A Comprehensive Study for Quantum-Safe Migration
Authors: Yaser Baseri, Vikas Chouhan, Ali Ghorbani, Aaron Chow
Abstract
The rise of large-scale quantum computing poses a significant threat to traditional cryptographic security measures. Quantum attacks undermine current asymmetric cryptographic algorithms, rendering them ineffective. Even symmetric key cryptography is vulnerable, albeit to a lesser extent, suggesting longer keys or extended hash functions for security. Thus, current cryptographic solutions are inadequate against emerging quantum threats. Organizations must transition to quantum-safe environments with robust continuity plans and meticulous risk management. This study explores the challenges of migrating to quantum-safe cryptographic states, introducing a comprehensive security risk assessment framework. We propose a security risk assessment framework that examines vulnerabilities across algorithms, certificates, and protocols throughout the migration process (pre-migration, during migration, post-migration). We link these vulnerabilities to the STRIDE threat model to assess their impact and likelihood. Then, we discuss practical mitigation strategies for critical components like algorithms, public key infrastructures, and protocols. Our study not only identifies potential attacks and vulnerabilities at each layer and migration stage but also suggests possible countermeasures and alternatives to enhance system resilience, empowering organizations to construct a secure infrastructure for the quantum era. Through these efforts, we establish the foundation for enduring security in networked systems amid the challenges of the quantum era.
GPN: Generative Point-based NeRF
Authors: Haipeng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Scanning real-life scenes with modern registration devices typically gives incomplete point cloud representations, primarily due to the limitations of partial scanning, 3D occlusions, and dynamic light conditions. Recent works on processing incomplete point clouds have always focused on point cloud completion. However, these approaches do not ensure consistency between the completed point cloud and the captured images regarding color and geometry. We propose using Generative Point-based NeRF (GPN) to reconstruct and repair a partial cloud by fully utilizing the scanning images and the corresponding reconstructed cloud. The repaired point cloud can achieve multi-view consistency with the captured images at high spatial resolution. For the finetunes of a single scene, we optimize the global latent condition by incorporating an Auto-Decoder architecture while retaining multi-view consistency. As a result, the generated point clouds are smooth, plausible, and geometrically consistent with the partial scanning images. Extensive experiments on ShapeNet demonstrate that our works achieve competitive performances to the other state-of-the-art point cloud-based neural scene rendering and editing performances.
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
Abstract
Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experiment results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. Case study shows that FlowWalker diminishes random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training.
OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering
Authors: Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Rendering dynamic 3D human from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the people is in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. Previous method utilizing NeRF for surface rendering to recover the occluded areas, but it requiring more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature query at occluded regions, the aggregated pixel-align feature is extracted to compensate for the missing information. Then we use Gaussian Feature MLP to further process the feature along with the occlusion-aware loss functions to better perceive the occluded area. Extensive experiments both in simulated and real-world occlusions, demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method. And we improving training and inference speeds by 250x and 800x, respectively. Our code will be available for research purposes.
Keyword: cinematic rendering
There is no result
Keyword: volume data
There is no result
Keyword: remote visualization
There is no result
Keyword: direct volume rendering
There is no result
Keyword: mobile device
There is no result
Keyword: transfer function
There is no result
Keyword: retrieval
Overview of the TREC 2023 NeuCLIR Track
Authors: Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
Abstract
The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections, large collections of Chinese, Persian, and Russian newswire and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the three languages, using English topics. Results for a multilingual task, also with English topics but with documents from all three newswire collections, are also reported. New in this second year of the track is a pilot technical documents CLIR task for ranked retrieval of Chinese technical documents using English topics. A total of 220 runs across all tasks were submitted by six participating teams and, as baselines, by track coordinators. Task descriptions and results are presented.
Extending Translate-Train for ColBERT-X to African Language CLIR
Authors: Eugene Yang, Dawn J. Lawrie, Paul McNamee, James Mayfield
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Abstract
This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.
Generative Information Retrieval Evaluation
Authors: Marwah Alaofi, Negar Arabzadeh, Charles L. A. Clarke, Mark Sanderson
Abstract
In this chapter, we consider generative information retrieval evaluation from two distinct but interrelated perspectives. First, large language models (LLMs) themselves are rapidly becoming tools for evaluation, with current research indicating that LLMs may be superior to crowdsource workers and other paid assessors on basic relevance judgement tasks. We review past and ongoing related research, including speculation on the future of shared task initiatives, such as TREC, and a discussion on the continuing need for human assessments. Second, we consider the evaluation of emerging LLM-based generative information retrieval (GenIR) systems, including retrieval augmented generation (RAG) systems. We consider approaches that focus both on the end-to-end evaluation of GenIR systems and on the evaluation of a retrieval component as an element in a RAG system. Going forward, we expect the evaluation of GenIR systems to be at least partially based on LLM-based assessment, creating an apparent circularity, with a system seemingly evaluating its own output. We resolve this apparent circularity in two ways: 1) by viewing LLM-based assessment as a form of "slow search", where a slower IR system is used for evaluation and training of a faster production IR system; and 2) by recognizing a continuing need to ground evaluation in human assessment, even if the characteristics of that human assessment must change.
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Authors: Patrice Béchard, Orlando Marquez Ayala
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLM) have taken the world by storm, without eliminating or at least reducing hallucinations, real-world GenAI systems may face challenges in user adoption. In the process of deploying an enterprise application that produces workflows based on natural language requirements, we devised a system leveraging Retrieval Augmented Generation (RAG) to greatly improve the quality of the structured output that represents such workflows. Thanks to our implementation of RAG, our proposed system significantly reduces hallucinations in the output and improves the generalization of our LLM in out-of-domain settings. In addition, we show that using a small, well-trained retriever encoder can reduce the size of the accompanying LLM, thereby making deployments of LLM-based systems less resource-intensive.
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Authors: Juraj Vladika, Florian Matthes
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract
In today's digital world, seeking answers to health questions on the Internet is a common practice. However, existing question answering (QA) systems often rely on using pre-selected and annotated evidence documents, thus making them inadequate for addressing novel questions. Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases. By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets. We modify different retrieval settings to observe their influence on the QA pipeline's performance, including the number of retrieved documents, sentence selection process, the publication year of articles, and their number of citations. Our results reveal that cutting down on the amount of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score up to 10%. We discuss the results, highlight interesting examples, and outline challenges for future research, like managing evidence disagreement and crafting user-friendly explanations.
Lightweight Multi-System Multivariate Interconnection and Divergence Discovery
Authors: Mulugeta Weldezgina Asres, Christian Walter Omlin, Jay Dittmann, Pavel Parygin, Joshua Hiltbrand, Seth I. Cooper, Grace Cummings, David Yu
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Identifying outlier behavior among sensors and subsystems is essential for discovering faults and facilitating diagnostics in large systems. At the same time, exploring large systems with numerous multivariate data sets is challenging. This study presents a lightweight interconnection and divergence discovery mechanism (LIDD) to identify abnormal behavior in multi-system environments. The approach employs a multivariate analysis technique that first estimates the similarity heatmaps among the sensors for each system and then applies information retrieval algorithms to provide relevant multi-level interconnection and discrepancy details. Our experiment on the readout systems of the Hadron Calorimeter of the Compact Muon Solenoid (CMS) experiment at CERN demonstrates the effectiveness of the proposed method. Our approach clusters readout systems and their sensors consistent with the expected calorimeter interconnection configurations, while capturing unusual behavior in divergent clusters and estimating their root causes.
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Authors: Tianyu Zhu, Myong Chol Jung, Jesse Clark
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular contrastive frameworks typically learn from binary relevance, making them ineffective at incorporating direct fine-grained rankings. In this paper, we curate a large-scale dataset featuring detailed relevance scores for each query-document pair to facilitate future research and evaluation. Subsequently, we propose Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking (GCL), which is designed to learn from fine-grained rankings beyond binary relevance scores. Our results show that GCL achieves a 94.5% increase in NDCG@10 for in-domain and 26.3 to 48.8% increases for cold-start evaluations, all relative to the CLIP baseline and involving ground truth rankings.
Accessibility in Information Retrieval
Authors: Leif Azzopardi, Vishwa Vinay
Subjects: Information Retrieval (cs.IR); Digital Libraries (cs.DL)
Abstract
This paper introduces the concept of accessibility from the field of transportation planning and adopts it within the context of Information Retrieval (IR). An analogy is drawn between the fields, which motivates the development of document accessibility measures for IR systems. Considering the accessibility of documents within a collection given an IR System provides a different perspective on the analysis and evaluation of such systems which could be used to inform the design, tuning and management of current and future IR systems.
Keyword: video retrieval
There is no result
Keyword: mobile
Learning Efficient and Fair Policies for Uncertainty-Aware Collaborative Human-Robot Order Picking
Authors: Igor G. Smit, Zaharah Bukhsh, Mykola Pechenizkiy, Kostas Alogariastos, Kasper Hendriks, Yingqian Zhang
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
In collaborative human-robot order picking systems, human pickers and Autonomous Mobile Robots (AMRs) travel independently through a warehouse and meet at pick locations where pickers load items onto the AMRs. In this paper, we consider an optimization problem in such systems where we allocate pickers to AMRs in a stochastic environment. We propose a novel multi-objective Deep Reinforcement Learning (DRL) approach to learn effective allocation policies to maximize pick efficiency while also aiming to improve workload fairness amongst human pickers. In our approach, we model the warehouse states using a graph, and define a neural network architecture that captures regional information and effectively extracts representations related to efficiency and workload. We develop a discrete-event simulation model, which we use to train and evaluate the proposed DRL approach. In the experiments, we demonstrate that our approach can find non-dominated policy sets that outline good trade-offs between fairness and efficiency objectives. The trained policies outperform the benchmarks in terms of both efficiency and fairness. Moreover, they show good transferability properties when tested on scenarios with different warehouse sizes. The implementation of the simulation model, proposed approach, and experiments are published.
Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems
Authors: Chetna Singhal, Yashuo Wu, Francesco Malandrino, Marco Levorato, Carla Fabiana Chiasserini
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
The increasing pervasiveness of intelligent mobile applications requires to exploit the full range of resources offered by the mobile-edge-cloud network for the execution of inference tasks. However, due to the heterogeneity of such multi-tiered networks, it is essential to make the applications' demand amenable to the available resources while minimizing energy consumption. Modern dynamic deep neural networks (DNN) achieve this goal by designing multi-branched architectures where early exits enable sample-based adaptation of the model depth. In this paper, we tackle the problem of allocating sections of DNNs with early exits to the nodes of the mobile-edge-cloud system. By envisioning a 3-stage graph-modeling approach, we represent the possible options for splitting the DNN and deploying the DNN blocks on the multi-tiered network, embedding both the system constraints and the application requirements in a convenient and efficient way. Our framework -- named Feasible Inference Graph (FIN) -- can identify the solution that minimizes the overall inference energy consumption while enabling distributed inference over the multi-tiered network with the target quality and latency. Our results, obtained for DNNs with different levels of complexity, show that FIN matches the optimum and yields over 65% energy savings relative to a state-of-the-art technique for cost minimization.
Loco-Manipulation with Nonimpulsive Contact-Implicit Planning in a Slithering Robot
Abstract
Object manipulation has been extensively studied in the context of fixed base and mobile manipulators. However, the overactuated locomotion modality employed by snake robots allows for a unique blend of object manipulation through locomotion, referred to as loco-manipulation. The following work presents an optimization approach to solving the loco-manipulation problem based on non-impulsive implicit contact path planning for our snake robot COBRA. We present the mathematical framework and show high-fidelity simulation results and experiments to demonstrate the effectiveness of our approach.
QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy
Authors: Wenhao Yuan, Xuehe Wang
Subjects: Computer Science and Game Theory (cs.GT)
Abstract
Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm, aiming to coordinate multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate zealously in FL model training procedures, while mitigating the privacy leakage risks during wireless transmission, remains comparatively unexplored so far. In this paper, we propose a novel approach, named QI-DPFL (Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy), to address the aforementioned intractable issue. To select clients with high-quality datasets, we first propose a quality-aware client selection mechanism based on the Earth Mover's Distance (EMD) metric. Furthermore, to attract high-quality data contributors, we design an incentive-boosted mechanism that constructs the interactions between the central server and the selected clients as a two-stage Stackelberg game, where the central server designs the time-dependent reward to minimize its cost by considering the trade-off between accuracy loss and total reward allocated, and each selected client decides the privacy budget to maximize its utility. The Nash Equilibrium of the Stackelberg game is derived to find the optimal solution in each global iteration. The extensive experimental results on different real-world datasets demonstrate the effectiveness of our proposed FL framework, by realizing the goal of privacy protection and incentive compatibility.
Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty
Authors: Peijie Sun, Yifan Wang, Min Zhang, Chuhan Wu, Yan Fang, Hong Zhu, Yuan Fang, Meng Wang
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract
With the surge in mobile gaming, accurately predicting user spending on newly downloaded games has become paramount for maximizing revenue. However, the inherently unpredictable nature of user behavior poses significant challenges in this endeavor. To address this, we propose a robust model training and evaluation framework aimed at standardizing spending data to mitigate label variance and extremes, ensuring stability in the modeling process. Within this framework, we introduce a collaborative-enhanced model designed to predict user game spending without relying on user IDs, thus ensuring user privacy and enabling seamless online training. Our model adopts a unique approach by separately representing user preferences and game features before merging them as input to the spending prediction module. Through rigorous experimentation, our approach demonstrates notable improvements over production models, achieving a remarkable \textbf{17.11}\% enhancement on offline data and an impressive \textbf{50.65}\% boost in an online A/B test. In summary, our contributions underscore the importance of stable model training frameworks and the efficacy of collaborative-enhanced models in predicting user spending behavior in mobile gaming.
Efficient Sensors Selection for Traffic Flow Monitoring: An Overview of Model-Based Techniques leveraging Network Observability
Authors: Marco Fabris, Riccardo Ceccato, Andrea Zanella
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Abstract
The emergence of 6G-enabled Internet of Vehicles (IoV) promises to revolutionize mobility and connectivity, integrating vehicles into a mobile Internet-of-Things (IoT)-oriented wireless sensor network (WSN). 5G technologies and mobile edge computing further support this vision by facilitating real-time connectivity and empowering massive access to the Internet. In this context, IoT-oriented WSNs play a crucial role in intelligent transportation systems, offering affordable alternatives for traffic monitoring and management. This paper's contribution is twofold: (i) surveying state-of-the-art model-based techniques for efficient sensor selection in traffic flow monitoring, emphasizing challenges of sensor placement; and (ii) advocating for data-driven methodologies to enhance sensor deployment efficacy and traffic modeling accuracy. Further considerations underscore the importance of data-driven approaches for adaptive transportation systems aligned with the IoV paradigm.
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Abstract
Monocular egocentric 3D human motion capture is a challenging and actively researched problem. Existing methods use synchronously operating visual sensors (e.g. RGB cameras) and often fail under low lighting and fast motions, which can be restricting in many applications involving head-mounted devices. In response to the existing limitations, this paper 1) introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens, and 2) proposes the first approach to it called EventEgo3D (EE3D). Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination. The proposed EE3D framework is specifically tailored for learning with event streams in the LNES representation, enabling high 3D reconstruction accuracy. We also design a prototype of a mobile head-mounted device with an event camera and record a real dataset with event observations and the ground-truth 3D human poses (in addition to the synthetic dataset). Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions across various challenging experiments while supporting real-time 3D pose update rates of 140Hz.
Keyword: volume render
There is no result
Keyword: volumetric render
There is no result
Keyword: remote render
There is no result
Keyword: hybrid render
There is no result
Keyword: raycast
There is no result
Keyword: medical imaging
An inclusive review on deep learning techniques and their scope in handwriting recognition
Keyword: medical visualization
There is no result
Keyword: interactive volume
There is no result
Keyword: rendering
Evaluation Framework for Quantum Security Risk Assessment: A Comprehensive Study for Quantum-Safe Migration
GPN: Generative Point-based NeRF
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering
Keyword: cinematic rendering
There is no result
Keyword: volume data
There is no result
Keyword: remote visualization
There is no result
Keyword: direct volume rendering
There is no result
Keyword: mobile device
There is no result
Keyword: transfer function
There is no result
Keyword: retrieval
Overview of the TREC 2023 NeuCLIR Track
Extending Translate-Train for ColBERT-X to African Language CLIR
Generative Information Retrieval Evaluation
Reducing hallucination in structured outputs via Retrieval-Augmented Generation
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Lightweight Multi-System Multivariate Interconnection and Divergence Discovery
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Accessibility in Information Retrieval
Keyword: video retrieval
There is no result
Keyword: mobile
Learning Efficient and Fair Policies for Uncertainty-Aware Collaborative Human-Robot Order Picking
Resource-aware Deployment of Dynamic DNNs over Multi-tiered Interconnected Systems
Loco-Manipulation with Nonimpulsive Contact-Implicit Planning in a Slithering Robot
QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy
Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty
Efficient Sensors Selection for Traffic Flow Monitoring: An Overview of Model-Based Techniques leveraging Network Observability
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Keyword: smartphone
There is no result
Keyword: medical volume data
There is no result