Abstract
Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published.
Learning Transferable Negative Prompts for Out-of-Distribution Detection
Abstract
Existing prompt learning methods have shown certain capabilities in Out-of-Distribution (OOD) detection, but the lack of OOD images in the target dataset in their training can lead to mismatches between OOD images and In-Distribution (ID) categories, resulting in a high false positive rate. To address this issue, we introduce a novel OOD detection method, named 'NegPrompt', to learn a set of negative prompts, each representing a negative connotation of a given class label, for delineating the boundaries between ID and OOD images. It learns such negative prompts with ID data only, without any reliance on external outlier data. Further, current methods assume the availability of samples of all ID classes, rendering them ineffective in open-vocabulary learning scenarios where the inference stage can contain novel ID classes not present during training. In contrast, our learned negative prompts are transferable to novel class labels. Experiments on various ImageNet benchmarks show that NegPrompt surpasses state-of-the-art prompt-learning-based OOD detection methods and maintains a consistent lead in hard OOD detection in closed- and open-vocabulary classification scenarios. Code is available at https://github.com/mala-lab/negprompt.
RADIUM: Predicting and Repairing End-to-End Robot Failures using Gradient-Accelerated Sampling
Authors: Charles Dawson, Anjali Parashar, Chuchu Fan
Abstract
Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design and control policy to preemptively mitigate those failures. Existing tools for failure prediction struggle to search over high-dimensional environmental parameters, cannot efficiently handle end-to-end testing for systems with vision in the loop, and provide little guidance on how to mitigate failures once they are discovered. We approach this problem through the lens of approximate Bayesian inference and use differentiable simulation and rendering for efficient failure case prediction and repair. For cases where a differentiable simulator is not available, we provide a gradient-free version of our algorithm, and we include a theoretical and empirical evaluation of the trade-offs between gradient-based and gradient-free methods. We apply our approach on a range of robotics and control problems, including optimizing search patterns for robot swarms, UAV formation control, and robust network control. Compared to optimization-based falsification methods, our method predicts a more diverse, representative set of failure modes, and we find that our use of differentiable simulation yields solutions that have up to 10x lower cost and requires up to 2x fewer iterations to converge relative to gradient-free techniques. In hardware experiments, we find that repairing control policies using our method leads to a 5x robustness improvement. Accompanying code and video can be found at https://mit-realm.github.io/radium/
Abstract
NeRF (Neural Radiance Fields) has demonstrated tremendous potential in novel view synthesis and 3D reconstruction, but its performance is sensitive to input image quality, which struggles to achieve high-fidelity rendering when provided with low-quality sparse input viewpoints. Previous methods for NeRF restoration are tailored for specific degradation type, ignoring the generality of restoration. To overcome this limitation, we propose a generic radiance fields restoration pipeline, named RaFE, which applies to various types of degradations, such as low resolution, blurriness, noise, compression artifacts, or their combinations. Our approach leverages the success of off-the-shelf 2D restoration methods to recover the multi-view images individually. Instead of reconstructing a blurred NeRF by averaging inconsistencies, we introduce a novel approach using Generative Adversarial Networks (GANs) for NeRF generation to better accommodate the geometric and appearance inconsistencies present in the multi-view images. Specifically, we adopt a two-level tri-plane architecture, where the coarse level remains fixed to represent the low-quality NeRF, and a fine-level residual tri-plane to be added to the coarse level is modeled as a distribution with GAN to capture potential variations in restoration. We validate RaFE on both synthetic and real cases for various restoration tasks, demonstrating superior performance in both quantitative and qualitative evaluations, surpassing other 3D restoration methods specific to single task. Please see our project website https://zkaiwu.github.io/RaFE-Project/.
Keyword: cinematic rendering
There is no result
Keyword: volume data
There is no result
Keyword: remote visualization
There is no result
Keyword: direct volume rendering
There is no result
Keyword: mobile device
There is no result
Keyword: transfer function
There is no result
Keyword: retrieval
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
Authors: Jiawei Zhang, Chejian Xu, Yu Gai, Freddy Lecue, Dawn Song, Bo Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
This paper introduces KnowHalu, a novel approach for detecting hallucinations in text generated by large language models (LLMs), utilizing step-wise reasoning, multi-formulation query, multi-form knowledge for factual checking, and fusion-based detection mechanism. As LLMs are increasingly applied across various domains, ensuring that their outputs are not hallucinated is critical. Recognizing the limitations of existing approaches that either rely on the self-consistency check of LLMs or perform post-hoc fact-checking without considering the complexity of queries or the form of knowledge, KnowHalu proposes a two-phase process for hallucination detection. In the first phase, it identifies non-fabrication hallucinations--responses that, while factually correct, are irrelevant or non-specific to the query. The second phase, multi-form based factual checking, contains five key steps: reasoning and query decomposition, knowledge retrieval, knowledge optimization, judgment generation, and judgment aggregation. Our extensive evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks, e.g., improving by 15.65% in QA tasks and 5.50% in summarization tasks, highlighting its effectiveness and versatility in detecting hallucinations in LLM-generated content.
Towards Standards-Compliant Assistive Technology Product Specifications via LLMs
Authors: Chetan Arora, John Grundy, Louise Puli, Natasha Layton
Abstract
In the rapidly evolving field of assistive technology (AT), ensuring that products meet national and international standards is essential for user safety, efficacy, and accessibility. In this vision paper, we introduce CompliAT, a pioneering framework designed to streamline the compliance process of AT product specifications with these standards through the innovative use of Large Language Models (LLMs). CompliAT addresses three critical tasks: checking terminology consistency, classifying products according to standards, and tracing key product specifications to standard requirements. We tackle the challenge of terminology consistency to ensure that the language used in product specifications aligns with relevant standards, reducing misunderstandings and non-compliance risks. We propose a novel approach for product classification, leveraging a retrieval-augmented generation model to accurately categorize AT products aligning to international standards, despite the sparse availability of training data. Finally, CompliAT implements a traceability and compliance mechanism from key product specifications to standard requirements, ensuring all aspects of an AT product are thoroughly vetted against the corresponding standards. By semi-automating these processes, CompliAT aims to significantly reduce the time and effort required for AT product standards compliance and uphold quality and safety standards. We outline our planned implementation and evaluation plan for CompliAT.
Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
Abstract
The integration of Large Language Models (LLMs) in information retrieval has raised a critical reevaluation of fairness in the text-ranking models. LLMs, such as GPT models and Llama2, have shown effectiveness in natural language understanding tasks, and prior works (e.g., RankGPT) have also demonstrated that the LLMs exhibit better performance than the traditional ranking models in the ranking task. However, their fairness remains largely unexplored. This paper presents an empirical study evaluating these LLMs using the TREC Fair Ranking dataset, focusing on the representation of binary protected attributes such as gender and geographic location, which are historically underrepresented in search outcomes. Our analysis delves into how these LLMs handle queries and documents related to these attributes, aiming to uncover biases in their ranking algorithms. We assess fairness from both user and content perspectives, contributing an empirical benchmark for evaluating LLMs as the fair ranker.
How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
Abstract
By leveraging the retrieval of information from external knowledge databases, Large Language Models (LLMs) exhibit enhanced capabilities for accomplishing many knowledge-intensive tasks. However, due to the inherent flaws of current retrieval systems, there might exist irrelevant information within those retrieving top-ranked passages. In this work, we present a comprehensive investigation into the robustness of LLMs to different types of irrelevant information under various conditions. We initially introduce a framework to construct high-quality irrelevant information that ranges from semantically unrelated, partially related, and related to questions. Furthermore, our analysis demonstrates that the constructed irrelevant information not only scores highly on similarity metrics, being highly retrieved by existing systems, but also bears semantic connections to the context. Our investigation reveals that current LLMs still face challenges in discriminating highly semantically related information and can be easily distracted by these irrelevant yet misleading contents. Besides, we also find that current solutions for handling irrelevant information have limitations in improving the robustness of LLMs to such distractions. Resources are available at https://github.com/Di-viner/LLM-Robustness-to-Irrelevant-Information.
Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach
Abstract
Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks. Despite their great success, the knowledge provided by the retrieval process is not always useful for improving the model prediction, since in some samples LLMs may already be quite knowledgeable and thus be able to answer the question correctly without retrieval. Aiming to save the cost of retrieval, previous work has proposed to determine when to do/skip the retrieval in a data-aware manner by analyzing the LLMs' pretraining data. However, these data-aware methods pose privacy risks and memory limitations, especially when requiring access to sensitive or extensive pretraining data. Moreover, these methods offer limited adaptability under fine-tuning or continual learning settings. We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data. Moreover, it alleviates the need to retain all the data utilized during model pre-training, necessitating only the upkeep of the token embeddings. Extensive experiments and in-depth analyses demonstrate the superiority of our model-aware approach.
BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering
Authors: Azmine Toushik Wasi, Taki Hasan Rafi, Raima Islam, Dong-Kyu Chae
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Social and Information Networks (cs.SI)
Abstract
Knowledge Graphs (KGs) have proven essential in information processing and reasoning applications because they link related entities and give context-rich information, supporting efficient information retrieval and knowledge discovery; presenting information flow in a very effective manner. Despite being widely used globally, Bangla is relatively underrepresented in KGs due to a lack of comprehensive datasets, encoders, NER (named entity recognition) models, POS (part-of-speech) taggers, and lemmatizers, hindering efficient information processing and reasoning applications in the language. Addressing the KG scarcity in Bengali, we propose BanglaAutoKG, a pioneering framework that is able to automatically construct Bengali KGs from any Bangla text. We utilize multilingual LLMs to understand various languages and correlate entities and relations universally. By employing a translation dictionary to identify English equivalents and extracting word features from pre-trained BERT models, we construct the foundational KG. To reduce noise and align word embeddings with our goal, we employ graph-based polynomial filters. Lastly, we implement a GNN-based semantic filter, which elevates contextual understanding and trims unnecessary edges, culminating in the formation of the definitive KG. Empirical findings and case studies demonstrate the universal effectiveness of our model, capable of autonomously constructing semantically enriched KGs from any text.
Keyword: video retrieval
There is no result
Keyword: mobile
Towards Collaborative Family-Centered Design for Online Safety, Privacy and Security
Abstract
Traditional online safety technologies often overly restrict teens and invade their privacy, while parents often lack knowledge regarding their digital privacy. As such, prior researchers have called for more collaborative approaches on adolescent online safety and networked privacy. In this paper, we propose family-centered approaches to foster parent-teen collaboration in ensuring their mobile privacy and online safety while respecting individual privacy, to enhance open discussion and teens' self-regulation. However, challenges such as power imbalances and conflicts with family values arise when implementing such approaches, making parent-teen collaboration difficult. Therefore, attending the family-centered design workshop will provide an invaluable opportunity for us to discuss these challenges and identify best research practices for the future of collaborative online safety and privacy within families.
Real-time Noise Source Estimation of a Camera System from an Image and Metadata
Authors: Maik Wischow, Patrick Irmisch, Anko Boerner, Guillermo Gallego
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Abstract
Autonomous machines must self-maintain proper functionality to ensure the safety of humans and themselves. This pertains particularly to its cameras as predominant sensors to perceive the environment and support actions. A fundamental camera problem addressed in this study is noise. Solutions often focus on denoising images a posteriori, that is, fighting symptoms rather than root causes. However, tackling root causes requires identifying the noise sources, considering the limitations of mobile platforms. This work investigates a real-time, memory-efficient and reliable noise source estimator that combines data- and physically-based models. To this end, a DNN that examines an image with camera metadata for major camera noise sources is built and trained. In addition, it quantifies unexpected factors that impact image noise or metadata. This study investigates seven different estimators on six datasets that include synthetic noise, real-world noise from two camera systems, and real field campaigns. For these, only the model with most metadata is capable to accurately and robustly quantify all individual noise contributions. This method outperforms total image noise estimators and can be plug-and-play deployed. It also serves as a basis to include more advanced noise sources, or as part of an automatic countermeasure feedback-loop to approach fully reliable machines.
Fusion of Mixture of Experts and Generative Artificial Intelligence in Mobile Edge Metaverse
Authors: Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Shiwen Mao, Dong In Kim
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
In the digital transformation era, Metaverse offers a fusion of virtual reality (VR), augmented reality (AR), and web technologies to create immersive digital experiences. However, the evolution of the Metaverse is slowed down by the challenges of content creation, scalability, and dynamic user interaction. Our study investigates an integration of Mixture of Experts (MoE) models with Generative Artificial Intelligence (GAI) for mobile edge computing to revolutionize content creation and interaction in the Metaverse. Specifically, we harness an MoE model's ability to efficiently manage complex data and complex tasks by dynamically selecting the most relevant experts running various sub-models to enhance the capabilities of GAI. We then present a novel framework that improves video content generation quality and consistency, and demonstrate its application through case studies. Our findings underscore the efficacy of MoE and GAI integration to redefine virtual experiences by offering a scalable, efficient pathway to harvest the Metaverse's full potential.
NMF-Based Analysis of Mobile Eye-Tracking Data
Authors: Daniel Klötzl, Tim Krake, Frank Heyen, Michael Becher, Maurice Koch, Daniel Weiskopf, Kuno Kurzhals
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The depiction of scanpaths from mobile eye-tracking recordings by thumbnails from the stimulus allows the application of visual computing to detect areas of interest in an unsupervised way. We suggest using nonnegative matrix factorization (NMF) to identify such areas in stimuli. For a user-defined integer k, NMF produces an explainable decomposition into k components, each consisting of a spatial representation associated with a temporal indicator. In the context of multiple eye-tracking recordings, this leads to k spatial representations, where the temporal indicator highlights the appearance within recordings. The choice of k provides an opportunity to control the refinement of the decomposition, i.e., the number of areas to detect. We combine our NMF-based approach with visualization techniques to enable an exploratory analysis of multiple recordings. Finally, we demonstrate the usefulness of our approach with mobile eye-tracking data of an art gallery.
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Authors: Adam Pardyl, Michał Wronka, Maciej Wołczyk, Kamil Adamczewski, Tomasz Trzciński, Bartosz Zieliński
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Active Visual Exploration (AVE) is a task that involves dynamically selecting observations (glimpses), which is critical to facilitate comprehension and navigation within an environment. While modern AVE methods have demonstrated impressive performance, they are constrained to fixed-scale glimpses from rigid grids. In contrast, existing mobile platforms equipped with optical zoom capabilities can capture glimpses of arbitrary positions and scales. To address this gap between software and hardware capabilities, we introduce AdaGlimpse. It uses Soft Actor-Critic, a reinforcement learning algorithm tailored for exploration tasks, to select glimpses of arbitrary position and scale. This approach enables our model to rapidly establish a general awareness of the environment before zooming in for detailed analysis. Experimental results demonstrate that AdaGlimpse surpasses previous methods across various visual tasks while maintaining greater applicability in realistic AVE scenarios.
Keyword: volume render
There is no result
Keyword: volumetric render
There is no result
Keyword: remote render
There is no result
Keyword: hybrid render
There is no result
Keyword: raycast
There is no result
Keyword: medical imaging
There is no result
Keyword: medical visualization
There is no result
Keyword: interactive volume
There is no result
Keyword: rendering
OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images
Learning Transferable Negative Prompts for Out-of-Distribution Detection
RADIUM: Predicting and Repairing End-to-End Robot Failures using Gradient-Accelerated Sampling
RaFE: Generative Radiance Fields Restoration
Keyword: cinematic rendering
There is no result
Keyword: volume data
There is no result
Keyword: remote visualization
There is no result
Keyword: direct volume rendering
There is no result
Keyword: mobile device
There is no result
Keyword: transfer function
There is no result
Keyword: retrieval
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
Towards Standards-Compliant Assistive Technology Product Specifications via LLMs
Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach
BanglaAutoKG: Automatic Bangla Knowledge Graph Construction with Semantic Neural Graph Filtering
Keyword: video retrieval
There is no result
Keyword: mobile
Towards Collaborative Family-Centered Design for Online Safety, Privacy and Security
Real-time Noise Source Estimation of a Camera System from an Image and Metadata
Fusion of Mixture of Experts and Generative Artificial Intelligence in Mobile Edge Metaverse
NMF-Based Analysis of Mobile Eye-Tracking Data
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Keyword: smartphone
There is no result
Keyword: medical volume data
There is no result