Abstract
Significance testing -- especially the paired-permutation test -- has played a vital role in developing NLP systems to provide confidence that the difference in performance between two systems (i.e., the test statistic) is not due to luck. However, practitioners rely on Monte Carlo approximation to perform this test due to a lack of a suitable exact algorithm. In this paper, we provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics. Our algorithm runs in $\mathcal{O}(GN (\log GN )(\log N ))$ time where $N$ is the dataset size and $G$ is the range of the test statistic. We found that our exact algorithm was $10$x faster than the Monte Carlo approximation with $20000$ samples on a common dataset.
Semantic Diversity in Dialogue with Natural Language Inference
Abstract
Generating diverse, interesting responses to chitchat conversations is a problem for neural conversational agents. This paper makes two substantial contributions to improving diversity in dialogue generation. First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation. We evaluate this metric using an established framework (Tevet and Berant, 2021) and find strong evidence indicating NLI Diversity is correlated with semantic diversity. Specifically, we show that the contradiction relation is more useful than the neutral relation for measuring this diversity and that incorporating the NLI model's confidence achieves state-of-the-art results. Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation, which results in an average 137% increase in NLI Diversity compared to standard generation procedures.
Privacy Amplification via Random Participation in Federated Learning
Authors: Burak Hasircioglu, Deniz Gunduz
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Running a randomized algorithm on a subsampled dataset instead of the entire dataset amplifies differential privacy guarantees. In this work, in a federated setting, we consider random participation of the clients in addition to subsampling their local datasets. Since such random participation of the clients creates correlation among the samples of the same client in their subsampling, we analyze the corresponding privacy amplification via non-uniform subsampling. We show that when the size of the local datasets is small, the privacy guarantees via random participation is close to those of the centralized setting, in which the entire dataset is located in a single host and subsampled. On the other hand, when the local datasets are large, observing the output of the algorithm may disclose the identities of the sampled clients with high confidence. Our analysis reveals that, even in this case, privacy guarantees via random participation outperform those via only local subsampling.
Keyword: scaling
Cost-Aware Comparison of LiDAR-based 3D Object Detectors
Authors: Xiaofang Wang, Kris M. Kitani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Considerable research efforts have been devoted to LiDAR-based 3D object detection and its empirical performance has been significantly improved. While the progress has been encouraging, we observe an overlooked issue: it is not yet common practice to compare different 3D detectors under the same cost, e.g., inference latency. This makes it difficult to quantify the true performance gain brought by recently proposed architecture designs. The goal of this work is to conduct a fair comparison of LiDAR-based 3D object detectors. Specifically, we focus on SECOND, a simple grid-based one-stage detector, and analyze its performance under different costs by scaling its original architecture. Then we compare the family of scaled SECOND with recent 3D detection methods, such as Voxel R-CNN and PV-RCNN++. The results are surprising. We find that, if allowed to use the same latency, SECOND can match the performance of PV-RCNN++, the current state-of-the-art method on the Waymo Open Dataset. Scaled SECOND also easily outperforms many recent 3D detection methods published during the past year. We recommend future research control the inference cost in their empirical comparison and include the family of scaled SECOND as a strong baseline when presenting novel 3D detection methods.
Keyword: calibration
Cross-View Cross-Scene Multi-View Crowd Counting
Authors: Qi Zhang, Wei Lin, Antoni B. Chan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-view crowd counting has been previously proposed to utilize multi-cameras to extend the field-of-view of a single camera, capturing more people in the scene, and improve counting performance for occluded people or those in low resolution. However, the current multi-view paradigm trains and tests on the same single scene and camera-views, which limits its practical application. In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts. To dynamically handle the challenge of optimal view fusion under scene and camera layout change and non-correspondence noise due to camera calibration errors or erroneous features, we propose a CVCS model that attentively selects and fuses multiple views together using camera layout geometry, and a noise view regularization method to train the model to handle non-correspondence errors. We also generate a large synthetic multi-camera crowd counting dataset with a large number of scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset. We then test our trained CVCS model on real multi-view counting datasets, by using unsupervised domain transfer. The proposed CVCS model trained on synthetic data outperforms the same model trained only on real data, and achieves promising performance compared to fully supervised methods that train and test on the same single scene.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Exact Paired-Permutation Testing for Structured Test Statistics
Semantic Diversity in Dialogue with Natural Language Inference
Privacy Amplification via Random Participation in Federated Learning
Keyword: scaling
Cost-Aware Comparison of LiDAR-based 3D Object Detectors
Keyword: calibration
Cross-View Cross-Scene Multi-View Crowd Counting