New submissions for Wed, 4 May 22

Keyword: out of distribution detection

There is no result

Keyword: out-of-distribution detection

There is no result

Keyword: expected calibration error

There is no result

Keyword: overconfident

There is no result

Keyword: overconfidence

There is no result

Keyword: confidence

Exact Paired-Permutation Testing for Structured Test Statistics

Authors: Ran Zmigrod, Tim Vieira, Ryan Cotterell
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2205.01416
Pdf link: https://arxiv.org/pdf/2205.01416
Abstract: Significance testing -- especially the paired-permutation test -- has played a vital role in developing NLP systems to provide confidence that the difference in performance between two systems (i.e., the test statistic) is not due to luck. However, practitioners rely on Monte Carlo approximation to perform this test due to a lack of a suitable exact algorithm. In this paper, we provide an efficient exact algorithm for the paired-permutation test for a family of structured test statistics. Our algorithm runs in $\mathcal{O}(GN (\log GN)(\log N))$ time where $N$ is the dataset size and $G$ is the range of the test statistic. We found that our exact algorithm was $10$x faster than the Monte Carlo approximation with $20000$ samples on a common dataset.

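For readers unfamiliar with the test, the sketch below shows what an exact paired-permutation test computes, assuming the test statistic is the absolute difference between the two systems' summed integer per-example scores (e.g. correct tokens per sentence). Under the null hypothesis the two systems' outputs on each example may be swapped, which flips the sign of that example's score difference, so the exact p-value is the fraction of all $2^N$ swap patterns whose statistic is at least as extreme as the observed one. The function names are hypothetical, and the dynamic program over reachable sums is a simple illustration, not the paper's $\mathcal{O}(GN (\log GN)(\log N))$ algorithm; a Monte Carlo estimator is included for comparison.

```python
import random
from collections import defaultdict
from fractions import Fraction
from typing import Sequence


def exact_paired_permutation_pvalue(a: Sequence[int], b: Sequence[int]) -> Fraction:
    """Exact two-sided p-value for the statistic |sum(a_i - b_i)| (integer scores only)."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    # counts[s] = number of swap patterns whose summed difference equals s
    counts = {0: 1}
    for d in diffs:
        nxt = defaultdict(int)
        for s, c in counts.items():
            nxt[s + d] += c  # keep this pair as observed
            nxt[s - d] += c  # swap the two systems' outputs on this example
        counts = nxt
    extreme = sum(c for s, c in counts.items() if abs(s) >= observed)
    return Fraction(extreme, 2 ** len(diffs))


def monte_carlo_pvalue(a: Sequence[int], b: Sequence[int],
                       samples: int = 20000, seed: int = 0) -> float:
    """Approximate the same p-value by sampling random swap patterns."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(samples):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    return hits / samples
```

Because the per-example differences are integers, the reachable sums always lie in $[-\sum_i |d_i|, \sum_i |d_i|]$, so the table stays bounded by the range of the statistic; this is what makes an exact computation tractable where enumerating all $2^N$ swaps would not be.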
Semantic Diversity in Dialogue with Natural Language Inference
Authors: Katherine Stasaski, Marti A. Hearst
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2205.01497
Pdf link: https://arxiv.org/pdf/2205.01497
Abstract: Generating diverse, interesting responses to chitchat conversations is a problem for neural conversational agents. This paper makes two substantial contributions to improving diversity in dialogue generation. First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation. We evaluate this metric using an established framework (Tevet and Berant, 2021) and find strong evidence indicating NLI Diversity is correlated with semantic diversity. Specifically, we show that the contradiction relation is more useful than the neutral relation for measuring this diversity and that incorporating the NLI model's confidence achieves state-of-the-art results. Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation, which results in an average 137% increase in NLI Diversity compared to standard generation procedures.

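As a rough illustration of an NLI-based diversity score, the sketch below runs an NLI classifier on every ordered pair of responses and sums per-label weights. The weights (contradiction +1, neutral 0, entailment -1), the use of ordered pairs, and the `nli` callable are assumptions made for illustration, not the paper's exact definition; the `use_confidence` branch shows one possible way to fold in the classifier's confidence, by weighting each label by its predicted probability.

```python
from itertools import permutations
from typing import Callable, Dict, List

# Hypothetical NLI interface: (premise, hypothesis) -> probability per label.
NLIFn = Callable[[str, str], Dict[str, float]]

# Assumed label weights: contradiction raises diversity, entailment lowers it.
WEIGHTS = {"contradiction": 1.0, "neutral": 0.0, "entailment": -1.0}


def nli_diversity(responses: List[str], nli: NLIFn, use_confidence: bool = False) -> float:
    """Sum label weights over all ordered pairs of distinct positions in `responses`."""
    score = 0.0
    for premise, hypothesis in permutations(responses, 2):
        probs = nli(premise, hypothesis)
        if use_confidence:
            # Expected weight under the classifier's label distribution.
            score += sum(WEIGHTS[label] * p for label, p in probs.items())
        else:
            # Count only the most likely label.
            score += WEIGHTS[max(probs, key=probs.get)]
    return score
```

In practice `nli` would wrap a pretrained NLI model; under these assumed weights, a higher score means the set contains more mutually contradictory, and therefore more semantically distinct, responses.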
Privacy Amplification via Random Participation in Federated Learning
Authors: Burak Hasircioglu, Deniz Gunduz
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2205.01556
Pdf link: https://arxiv.org/pdf/2205.01556
Abstract: Running a randomized algorithm on a subsampled dataset instead of the entire dataset amplifies differential privacy guarantees. In this work, in a federated setting, we consider random participation of the clients in addition to subsampling their local datasets. Since such random participation of the clients creates correlation among the samples of the same client in their subsampling, we analyze the corresponding privacy amplification via non-uniform subsampling. We show that when the size of the local datasets is small, the privacy guarantees via random participation are close to those of the centralized setting, in which the entire dataset is located in a single host and subsampled. On the other hand, when the local datasets are large, observing the output of the algorithm may disclose the identities of the sampled clients with high confidence. Our analysis reveals that, even in this case, privacy guarantees via random participation outperform those via only local subsampling.

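The amplification effect in the first sentence can be made concrete with the standard bound for Poisson subsampling: if a mechanism is $(\varepsilon, \delta)$-DP and each record is included independently with probability $q$, the subsampled mechanism is $(\log(1 + q(e^{\varepsilon} - 1)), q\delta)$-DP with respect to adding or removing one record. The sketch below only evaluates this textbook bound; it does not implement the paper's non-uniform analysis, and the function names are hypothetical.

```python
import math
from typing import Tuple


def amplified_privacy(epsilon: float, delta: float, q: float) -> Tuple[float, float]:
    """Standard (epsilon, delta) bound after Poisson subsampling with rate q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("sampling rate q must lie in [0, 1]")
    # log1p/expm1 keep the result accurate when q or epsilon is small.
    eps_prime = math.log1p(q * math.expm1(epsilon))
    return eps_prime, q * delta
```

The bound assumes each record enters the sample independently. Under random client participation, records of the same client are included or excluded together, which is the correlation the paper analyzes, so plugging a marginal inclusion probability into this formula is not a substitute for that analysis.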
Keyword: scaling

Cost-Aware Comparison of LiDAR-based 3D Object Detectors
Authors: Xiaofang Wang, Kris M. Kitani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2205.01142
Pdf link: https://arxiv.org/pdf/2205.01142
Abstract: Considerable research efforts have been devoted to LiDAR-based 3D object detection and its empirical performance has been significantly improved. While the progress has been encouraging, we observe an overlooked issue: it is not yet common practice to compare different 3D detectors under the same cost, e.g., inference latency. This makes it difficult to quantify the true performance gain brought by recently proposed architecture designs. The goal of this work is to conduct a fair comparison of LiDAR-based 3D object detectors. Specifically, we focus on SECOND, a simple grid-based one-stage detector, and analyze its performance under different costs by scaling its original architecture. Then we compare the family of scaled SECOND with recent 3D detection methods, such as Voxel R-CNN and PV-RCNN++. The results are surprising. We find that, if allowed to use the same latency, SECOND can match the performance of PV-RCNN++, the current state-of-the-art method on the Waymo Open Dataset. Scaled SECOND also easily outperforms many recent 3D detection methods published during the past year. We recommend that future research control the inference cost in their empirical comparison and include the family of scaled SECOND as a strong baseline when presenting novel 3D detection methods.

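The comparison protocol argued for here, judging detectors at matched inference cost, can be sketched as two small helpers: one returns the latency/accuracy Pareto frontier of a set of models, and one picks the most accurate model within a latency budget. The `Detector` record, its field names, and the use of a single accuracy number are assumptions for illustration; no measured values from the paper are included.

```python
from typing import Iterable, List, NamedTuple, Optional


class Detector(NamedTuple):
    name: str
    latency_ms: float
    accuracy: float  # a single detection metric where higher is better (assumed)


def pareto_frontier(models: Iterable[Detector]) -> List[Detector]:
    """Models that no other model beats on both latency and accuracy."""
    frontier: List[Detector] = []
    best_acc = float("-inf")
    # Fastest first; among equal latency, most accurate first.
    for m in sorted(models, key=lambda m: (m.latency_ms, -m.accuracy)):
        # Keep a model only if it is more accurate than every faster one
        # (exact duplicates keep a single representative).
        if m.accuracy > best_acc:
            frontier.append(m)
            best_acc = m.accuracy
    return frontier


def best_under_budget(models: Iterable[Detector], budget_ms: float) -> Optional[Detector]:
    """Most accurate model whose latency fits the budget, or None."""
    feasible = [m for m in models if m.latency_ms <= budget_ms]
    return max(feasible, key=lambda m: m.accuracy, default=None)
```

Comparing a new method against the frontier of a scaled baseline family, rather than against a single baseline configuration, is one way to apply the same-latency comparison the authors recommend.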
Keyword: calibration

Cross-View Cross-Scene Multi-View Crowd Counting
Authors: Qi Zhang, Wei Lin, Antoni B. Chan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2205.01551
Pdf link: https://arxiv.org/pdf/2205.01551
Abstract: Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field-of-view of a single camera, capturing more people in the scene, and improve counting performance for occluded people or those in low resolution. However, the current multi-view paradigm trains and tests on the same single scene and camera-views, which limits its practical application. In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts. To dynamically handle the challenge of optimal view fusion under scene and camera layout change and non-correspondence noise due to camera calibration errors or erroneous features, we propose a CVCS model that attentively selects and fuses multiple views together using camera layout geometry, and a noise view regularization method to train the model to handle non-correspondence errors. We also generate a large synthetic multi-camera crowd counting dataset with a large number of scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset. We then test our trained CVCS model on real multi-view counting datasets, by using unsupervised domain transfer. The proposed CVCS model trained on synthetic data outperforms the same model trained only on real data, and achieves promising performance compared to fully supervised methods that train and test on the same single scene.
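The abstract does not spell out the fusion mechanism, so the sketch below only shows the generic idea behind attentively fusing a variable number of views: each view contributes a feature vector for the same ground-plane location, per-view scores are turned into weights with a softmax over views, and the fused feature is the weighted sum. In the paper the scores would come from a learned module that uses camera layout geometry; here they are passed in directly, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the available views."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_features: Sequence[Sequence[float]],
               view_scores: Sequence[float]) -> List[float]:
    """Attention-weighted sum of per-view features for one ground-plane cell."""
    if not view_features or len(view_features) != len(view_scores):
        raise ValueError("need one score per view and at least one view")
    weights = softmax(view_scores)
    dim = len(view_features[0])
    return [sum(w * f[k] for w, f in zip(weights, view_features)) for k in range(dim)]
```

Because the softmax is taken over whatever views are present, the same function applies unchanged to scenes with different numbers of cameras.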
