Abstract
We present a deep learning method, based on stochastic optimization, for approximating solutions to wave kinetic equations. To build confidence in our approach, we apply the method to a Smoluchowski coagulation equation with a multiplicative kernel, for which an analytic solution exists. Our deep learning approach is then used to approximate the non-stationary solution to a 3-wave kinetic equation corresponding to acoustic wave systems. To validate the neural network approximation, we compare the decay rate of the total energy with previously obtained theoretical results. A finite volume solution is presented and compared with the present method.
Improving Image Clustering through Sample Ranking and Its Application to remote-sensing images
Abstract
Image clustering is a useful technique that is widely applied in many areas, including remote sensing. Recently, visual representations learned through self-supervision have greatly improved the performance of image clustering. To further improve well-trained clustering models, this paper proposes a novel method that first ranks the samples within each cluster by the confidence that they belong to that cluster and then uses the ranking to formulate a weighted cross-entropy loss for training the model. For ranking the samples, we develop a method for computing the likelihood that a sample belongs to its current cluster based on whether it is situated in a densely populated neighborhood, while for training the model, we give a strategy for weighting the ranked samples. We present extensive experimental results demonstrating that the new technique can improve state-of-the-art image clustering models, achieving accuracy gains ranging from $2.1\%$ to $15.9\%$. Applying our method to a variety of remote-sensing datasets, we show that it can be effectively applied to remote-sensing images.
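The abstract does not spell out the weighting strategy, so the following is only a minimal sketch of the general idea of turning a within-cluster ranking into a weighted cross-entropy loss. The function names `rank_weights` and `weighted_cross_entropy`, the linear decay, and the `floor` parameter are all assumptions made for illustration, not the authors' method.

```python
import math

def rank_weights(confidences, floor=0.1):
    """Map per-sample confidences in one cluster to weights in [floor, 1].

    Hypothetical scheme: the most confident sample gets weight 1.0 and the
    weight decays linearly with rank down to `floor` for the least confident.
    """
    n = len(confidences)
    # Indices sorted from most to least confident.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    weights = [0.0] * n
    for rank, idx in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.0
        weights[idx] = 1.0 - (1.0 - floor) * frac
    return weights

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(label) over the samples."""
    total = sum(w * -math.log(p[y]) for p, y, w in zip(probs, labels, weights))
    return total / sum(weights)

if __name__ == "__main__":
    w = rank_weights([0.9, 0.2, 0.5])
    probs = [[0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]  # predicted class probabilities
    print(w, weighted_cross_entropy(probs, [0, 0, 0], w))
```

Low-ranked samples still contribute, but with a smaller weight, so likely mislabeled samples pull less on the model than confident ones.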
Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization
Authors: Jingyang Lin, Yu Wang, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Outlier detection plays a critical role in AI safety, yet the task remains highly challenging. Observations show that deep neural network classifiers tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence. Existing works attempt to solve the problem by explicitly imposing uncertainty on classifiers when OOD inputs are exposed to the classifier during training. In this paper, we propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for OOD detection tasks. In particular, we impose statistical independence between inlier and outlier data during training, in order to ensure that inlier data reveals little information about OOD data to the deep estimator during training. Specifically, we estimate the statistical dependence between inlier and outlier data through the Hilbert-Schmidt Independence Criterion (HSIC), and we penalize this metric during training. We also pair our approach with a novel statistical test at inference time that follows from our principled motivation. Empirical results show that our method is effective and robust for OOD detection on various benchmarks. In comparison to SOTA models, our approach achieves significant improvement in the FPR95, AUROC, and AUPR metrics. Code is available at https://github.com/jylins/hood.
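For readers unfamiliar with HSIC, here is a minimal sketch of the standard biased empirical estimator, trace(K H L H) / (n - 1)^2, with Gaussian kernels. It illustrates only the dependence measure named in the abstract. The kernel choice, the bandwidth `sigma`, and the function names are assumptions, and how the authors apply the penalty to inlier and outlier features is not shown here.

```python
import math

def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel over a list of vectors."""
    def k(a, b):
        sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return [[k(a, b) for b in xs] for a in xs]

def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between paired samples xs and ys."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # trace(K H L H) equals the elementwise sum of (H K H) * L for symmetric L.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2

if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [3.0]]
    print(hsic(xs, xs))              # strongly dependent: clearly positive
    print(hsic(xs, [[5.0]] * 4))     # constant ys: approximately zero
```

A value near zero indicates little measurable dependence under the chosen kernels, which is why the quantity can serve as a training penalty.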
Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps
Abstract
Multi-agent collaborative perception could significantly upgrade the perception performance by enabling agents to share complementary information with each other through communication. It inevitably results in a fundamental trade-off between perception performance and communication bandwidth. To tackle this bottleneck issue, we propose a spatial confidence map, which reflects the spatial heterogeneity of perceptual information. It empowers agents to only share spatially sparse, yet perceptually critical information, contributing to where to communicate. Based on this novel spatial confidence map, we propose Where2comm, a communication-efficient collaborative perception framework. Where2comm has two distinct advantages: i) it considers pragmatic compression and uses less communication to achieve higher perception performance by focusing on perceptually critical areas; and ii) it can handle varying communication bandwidth by dynamically adjusting spatial areas involved in communication. To evaluate Where2comm, we consider 3D object detection in both real-world and simulation scenarios with two modalities (camera/LiDAR) and two agent types (cars/drones) on four datasets: OPV2V, V2X-Sim, DAIR-V2X, and our original CoPerception-UAVs. Where2comm consistently outperforms previous methods; for example, it achieves more than $100,000 \times$ lower communication volume and still outperforms DiscoNet and V2X-ViT on OPV2V. Our code is available at https://github.com/MediaBrain-SJTU/where2comm.
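The idea of sharing only spatially sparse, high-confidence regions under a bandwidth budget can be sketched as a top-k selection over a confidence map. This is an illustrative simplification under assumed names (`select_cells`, `budget`), not the Where2comm implementation, which is available in the linked repository.

```python
def select_cells(confidence, budget):
    """Return a binary mask marking the `budget` highest-confidence cells.

    `confidence` is a 2D list of per-cell scores; `budget` is the number of
    cells the agent is allowed to transmit (a stand-in for bandwidth).
    """
    h, w = len(confidence), len(confidence[0])
    cells = [(confidence[r][c], r, c) for r in range(h) for c in range(w)]
    # Highest confidence first; ties broken by position for determinism.
    cells.sort(key=lambda t: (-t[0], t[1], t[2]))
    mask = [[0] * w for _ in range(h)]
    for _, r, c in cells[:max(0, budget)]:
        mask[r][c] = 1
    return mask

if __name__ == "__main__":
    conf_map = [[0.1, 0.9, 0.3],
                [0.8, 0.2, 0.7]]
    for row in select_cells(conf_map, budget=3):
        print(row)
```

Changing `budget` at run time changes how many cells are shared, which mirrors the abstract's point about adapting to varying communication bandwidth.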
Keyword: scaling
WordStream Maker: A Lightweight End-to-end Visualization Platform for Qualitative Time-series Data
Authors: Huyen N. Nguyen, Tommy Dang, Kathleen A. Bowe
Abstract
Whether it is in the form of transcribed conversations, blog posts, or tweets, qualitative data provides a reader with rich insight into both the overarching trends as well as the diversity of human ideas expressed through text. Handling and analyzing large amounts of qualitative data, however, is difficult, often requiring multiple time-intensive perusals in order to identify patterns. This difficulty is multiplied with each additional question or time point present in a data set. A primary challenge then is creating visualizations that support the interpretation of qualitative data by making it easier to identify and explore trends of interest. By combining the affordances of both text and visualizations, WordStream has previously enabled ease of information retrieval and processing of time-series text data, but the data-wrangling necessary to produce a WordStream remains a significant barrier for non-technical users. In response, this paper presents WordStream Maker: an end-to-end platform with a pipeline that utilizes natural language processing (NLP) to help non-technical users process raw text data and generate a customizable visualization without programming practice. Lessons learned from integrating NLP into visualization and scaling to large data sets are discussed, along with use cases to demonstrate the usefulness of the platform.
A Simple Strategy to Provable Invariance via Orbit Mapping
Authors: Kanchana Vaishnavi Gandikota, Jonas Geiping, Zorah Lähner, Adam Czapliński, Michael Moeller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Many applications require robustness, or ideally invariance, of neural networks to certain transformations of input data. Most commonly, this requirement is addressed by training data augmentation, using adversarial training, or defining network architectures that include the desired invariance by design. In this work, we propose a method to make network architectures provably invariant with respect to group actions by choosing one element from a (possibly continuous) orbit based on a fixed criterion. In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network. Further, we empirically analyze the properties of different approaches which incorporate invariance via training or architecture, and demonstrate the advantages of our method in terms of robustness and computational efficiency. In particular, we investigate the robustness with respect to rotations of images (which can hold up to discretization artifacts) as well as the provable orientation and scaling invariance of 3D point cloud classification.
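As a minimal sketch of choosing one orbit element by a fixed criterion, consider positive scalings of a 3D point cloud: every cloud can be mapped to the unique rescaled copy with unit root-mean-square norm before it reaches the network. The unit-RMS criterion and the name `canonical_scale` are assumptions chosen for illustration. The abstract does not state which criterion the authors use.

```python
import math

def canonical_scale(points):
    """Map a 3D point cloud to the element of its scaling orbit with unit RMS norm."""
    rms = math.sqrt(sum(x * x + y * y + z * z for x, y, z in points) / len(points))
    if rms == 0.0:
        # An all-zero cloud is fixed by every scaling, so it is its own representative.
        return [tuple(p) for p in points]
    return [(x / rms, y / rms, z / rms) for x, y, z in points]

if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
    scaled = [(3.7 * x, 3.7 * y, 3.7 * z) for x, y, z in cloud]
    print(canonical_scale(cloud))
    print(canonical_scale(scaled))  # same representative up to rounding
```

Because any network applied after `canonical_scale` sees the same input for every rescaled copy, its output is invariant to scaling by construction rather than by training.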
Climate Impact Modelling Framework
Authors: Blair Edwards, Paolo Fraccaro, Nikola Stoyanov, Nelson Bore, Julian Kuehnert, Tommy Weldemariam, Anne Jones
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Abstract
The application of models to assess the risk of the physical impacts of weather and climate and their subsequent consequences for society and business is of the utmost importance in our changing climate. The operation of such models is historically bespoke and constrained to specific compute infrastructure, driving datasets and predefined configurations. These constraints introduce challenges with scaling model runs and putting the models in the hands of interested users. Here we present a cloud-based modular framework for the deployment and operation of geospatial models, initially applied to climate impacts. The Climate Impact Modelling Framework (CIMF) enables the deployment of modular workflows in a dynamic and flexible manner. Users can specify workflow components in a streamlined manner; these components can then be easily organised into different configurations to assess risk in different ways and at different scales. This also enables different models (physical simulation or machine learning models) and workflows to be connected to produce a combined risk assessment. Flood modelling is used as an end-to-end example to demonstrate the operation of CIMF.
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts
Abstract
Previous work has shown that there exists a scaling law between the size of Language Models (LMs) and their zero-shot performance on different downstream NLP tasks. In this work, we show that this phenomenon does not hold when evaluating large LMs on tasks with negated prompts; instead, these tasks exhibit an inverse scaling law. We evaluate 9 different tasks with negated prompts on (1) pretrained LMs (OPT & GPT-3) of varying sizes (125M - 175B), (2) LMs further pretrained to generalize to novel prompts (InstructGPT), (3) LMs provided with few-shot examples, and (4) LMs fine-tuned specifically on negated prompts; all LM types perform worse on negated prompts as they scale and show a huge gap relative to human performance when comparing the average score on both original and negated prompts. By highlighting a critical limitation of existing LMs and methods, we urge the community to develop new approaches to developing LMs that actually follow the given instructions. We provide the code and the datasets to explore negated prompts at https://github.com/joeljang/negated-prompts-for-llms
Learning to Learn with Generative Models of Neural Network Checkpoints
Authors: William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired metric. At test time, it can optimize neural networks with unseen parameters for downstream tasks in just one update. We find that our approach successfully generates parameters for a wide range of loss prompts. Moreover, it can sample multimodal parameter solutions and has favorable scaling properties. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
Keyword: calibration
Overcoming Bias: Equivariant Filter Design for Biased Attitude Estimation with Online Calibration
Authors: Alessandro Fornasier, Yonhon Ng, Christian Brommer, Christoph Böhm, Robert Mahony, Stephan Weiss
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Stochastic filters for on-line state estimation are a core technology for autonomous systems. The performance of such filters is one of the key limiting factors to a system's capability. Both asymptotic behavior (e.g., for regular operation) and transient response (e.g., for fast initialization and reset) of such filters are of crucial importance in guaranteeing robust operation of autonomous systems. This paper introduces a new generic formulation for a gyroscope-aided attitude estimator using N direction measurements including both body-frame and reference-frame direction type measurements. The approach is based on an integrated state formulation that incorporates navigation, extrinsic calibration for all direction sensors, and gyroscope bias states in a single equivariant geometric structure. This newly proposed symmetry allows modular addition of different direction measurements and their extrinsic calibration while maintaining the ability to include bias states in the same symmetry. The subsequently proposed filter-based estimator using this symmetry noticeably improves the transient response, and the asymptotic bias and extrinsic calibration estimation compared to state-of-the-art approaches. The estimator is verified in statistically representative simulations and is tested in real-world experiments.
From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
Abstract
LiDAR and cameras are two complementary sensors for 3D perception in autonomous driving. LiDAR point clouds have accurate spatial and geometry information, while RGB images provide textural and color data for context reasoning. To exploit LiDAR and cameras jointly, existing fusion methods tend to align each 3D point to only one projected image pixel based on calibration, namely one-to-one mapping. However, the performance of these approaches relies heavily on the calibration quality, which is sensitive to the temporal and spatial synchronization of sensors. Therefore, we propose a Dynamic Cross Attention (DCA) module with a novel one-to-many cross-modality mapping that learns multiple offsets from the initial projection towards the neighborhood and thus develops tolerance to calibration error. Moreover, a dynamic query enhancement is proposed to perceive the model-independent calibration, which further strengthens DCA's tolerance to the initial misalignment. The whole fusion architecture, named Dynamic Cross Attention Network (DCAN), exploits multi-level image features and adapts to multiple representations of point clouds, which allows DCA to serve as a plug-in fusion module. Extensive experiments on nuScenes and KITTI demonstrate DCA's effectiveness. The proposed DCAN outperforms state-of-the-art methods on the nuScenes detection challenge.
An optimization-based IMU/Lidar/Camera Co-calibration method
Abstract
Recently, multi-sensor fusion has achieved significant progress in the field of automobility, improving navigation and positioning performance. As a prerequisite of the fusion algorithm, extrinsic calibration of multiple sensors is in growing demand. To calculate the extrinsic parameters, much research has been dedicated to the two-step method, which combines separate pairwise calibrations. This approach is inefficient and lacks compactness because it loses sight of the constraints among all sensors. To remove this burden, this paper proposes an optimization-based IMU/Lidar/Camera co-calibration method. First, the IMU/camera and IMU/lidar online calibrations are conducted separately. Then, the corner and surface feature points on the chessboard are associated using the coarse result, and the camera/lidar constraint is constructed. Finally, a co-calibration optimization is constructed to refine all extrinsic parameters. We evaluate the performance of the proposed scheme in simulation, and the results demonstrate that our method outperforms the two-step method.
Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings
Authors: Dávid Sztahó, Attila Fejes
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract
In forensic voice comparison, speaker embeddings have become widely popular in the last 10 years. Most pretrained speaker embeddings are trained on English corpora because these are easily accessible. Thus, language dependency can be an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different. There are numerous commercial systems available, but their models are mainly trained on a language (mostly English) other than the target language. In the case of a low-resource language, developing a corpus for forensic purposes containing enough speakers to train deep learning models is costly. This study investigates whether a model pre-trained on an English corpus can be used on a low-resource target language (here, Hungarian) different from the one the model was trained on. Also, multiple samples are often not available from the offender (unknown speaker). Therefore, samples are compared pairwise with and without speaker enrollment for suspect (known) speakers. Two corpora developed especially for forensic purposes are used, along with a third meant for traditional speaker verification. Two deep learning based speaker embedding vector extraction methods are used: the x-vector and ECAPA-TDNN. Speaker verification was evaluated in the likelihood-ratio framework. A comparison is made between the language combinations (modeling, LR calibration, evaluation). The results were evaluated using the minCllr and EER metrics. It was found that a model pre-trained on a different language but on a corpus with a very large number of speakers performs well on samples with language mismatch. The effects of sample duration and speaking style were also examined. It was found that the longer the sample in question, the better the performance. Also, there is no real difference when various speaking styles are used.
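For reference, the equal error rate (EER) mentioned in the abstract is the operating point where the false acceptance rate equals the false rejection rate. The sketch below is a simple threshold sweep over comparison scores. The function name `equal_error_rate` and the convention of averaging the two rates at the closest crossing are assumptions, and it is not the evaluation code used in the study.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. Target scores come
    from same-speaker pairs, non-target scores from different-speaker pairs.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker pairs scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker pairs scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

if __name__ == "__main__":
    print(equal_error_rate([0.8, 0.9, 0.7], [0.1, 0.2, 0.75]))
```

With finitely many trials the two rates rarely cross exactly, so the sketch reports their mean at the threshold where they are closest.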
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
A Deep Learning Approximation of Non-Stationary Solutions to Wave Kinetic Equations
Improving Image Clustering through Sample Ranking and Its Application to remote-sensing images
Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization
Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps
Keyword: scaling
WordStream Maker: A Lightweight End-to-end Visualization Platform for Qualitative Time-series Data
A Simple Strategy to Provable Invariance via Orbit Mapping
Climate Impact Modelling Framework
Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts
Learning to Learn with Generative Models of Neural Network Checkpoints
Keyword: calibration
Overcoming Bias: Equivariant Filter Design for Biased Attitude Estimation with Online Calibration
From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
An optimization-based IMU/Lidar/Camera Co-calibration method
Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings