Abstract
Visual dialog (VisDial) is the task of answering a sequence of questions grounded in an image, using the dialog history as context. Prior work has trained dialog agents solely on VisDial data via supervised learning or has leveraged pre-training on related vision-and-language datasets. This paper presents a semi-supervised learning approach for visually grounded dialog, called Generative Self-Training (GST), that leverages unlabeled images on the Web. Specifically, GST first retrieves in-domain images through out-of-distribution detection and generates synthetic dialogs about the images via multimodal conditional text generation. GST then trains a dialog agent on the synthetic and the original VisDial data. As a result, GST scales the training data to roughly an order of magnitude more than VisDial (from 1.2M to 12.9M QA pairs). For robust training on the generated dialogs, we also propose perplexity-based data selection and multimodal consistency regularization. Evaluation on the VisDial v1.0 and v0.9 datasets shows that GST achieves new state-of-the-art results on both. We further observe strong performance gains in the low-data regime (up to 9.35 absolute points on NDCG).
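Perplexity-based data selection can be pictured as scoring each generated dialog by its perplexity under some language model and keeping only the most fluent fraction. The sketch below is a minimal illustration under stated assumptions: per-token log-probabilities are already available from a scoring model, and the keep_ratio threshold is a hypothetical hyperparameter. GST's exact selection criterion may differ.

```python
import math
from typing import List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one sequence: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_by_perplexity(
    dialogs: List[Tuple[str, Sequence[float]]], keep_ratio: float
) -> List[str]:
    """Keep the lowest-perplexity fraction of generated dialogs.

    Each item is (dialog_id, per-token log-probabilities from a scoring model).
    keep_ratio is a hypothetical hyperparameter, not a value from the paper.
    """
    ranked = sorted(dialogs, key=lambda item: perplexity(item[1]))
    n_keep = max(1, int(len(ranked) * keep_ratio))  # always keep at least one dialog
    return [dialog_id for dialog_id, _ in ranked[:n_keep]]


# Hypothetical generated dialogs with per-token log-probabilities.
dialogs = [("a", [-0.1, -0.2]), ("b", [-2.0, -3.0]), ("c", [-0.5, -0.4])]
print(select_by_perplexity(dialogs, keep_ratio=0.67))  # ['a', 'c']
```

Ranking by a ratio rather than a fixed perplexity cutoff keeps the amount of retained data predictable; a fixed threshold is an equally plausible design.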
Keyword: expected calibration error
Revisiting Calibration for Question Answering
Authors: Chenglei Si, Chen Zhao, Sewon Min, Jordan Boyd-Graber
Abstract
Model calibration aims to adjust (calibrate) models' confidence so that it matches expected accuracy. We argue that the traditional evaluation of calibration (expected calibration error; ECE) does not reflect the usefulness of the model confidence. For example, after conventional temperature scaling, confidence scores become similar for all predictions, which makes it hard for users to distinguish correct predictions from wrong ones, even though the calibrated model achieves low ECE. Building on these observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct predictions. We examine various conventional calibration methods, including temperature scaling, feature-based classifiers, neural answer reranking, and label smoothing, none of which brings significant gains under our new MacroCE metric. Toward more effective calibration, we propose a new calibration method based on the model's prediction consistency along the training trajectory. This new method, which we call consistency calibration, shows promise for better calibration.
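To make the contrast between ECE and MacroCE concrete, the sketch below computes ECE with equal-width confidence bins and a MacroCE-style score. It assumes MacroCE averages two instance-level errors: the mean of (1 - confidence) over correct predictions and the mean confidence over wrong predictions; consult the paper for the exact definition. The two toy predictors have the same accuracy and both get ECE 0, but only the one that separates correct from wrong answers gets a low MacroCE. The flat predictor mimics the collapsed confidence described above for temperature scaling.

```python
from typing import List, Sequence


def ece(conf: Sequence[float], correct: Sequence[bool], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width confidence bins on [0, 1]."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((c, y))
    total = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(1 for _, y in b if y) / len(b)
            total += len(b) / len(conf) * abs(acc - avg_conf)  # bin weight * gap
    return total


def macro_ce(conf: Sequence[float], correct: Sequence[bool]) -> float:
    """Assumed MacroCE: mean of the errors on correct and on wrong predictions."""
    pos = [1.0 - c for c, y in zip(conf, correct) if y]  # correct: confidence should be high
    neg = [c for c, y in zip(conf, correct) if not y]    # wrong: confidence should be low
    ice_pos = sum(pos) / len(pos) if pos else 0.0
    ice_neg = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (ice_pos + ice_neg)


correct = [True, True, True, False]  # 75% accuracy for both predictors
flat = [0.75, 0.75, 0.75, 0.75]      # same confidence everywhere
sharp = [1.0, 1.0, 1.0, 0.0]         # separates correct from wrong
print(ece(flat, correct), macro_ce(flat, correct))    # 0.0 0.5
print(ece(sharp, correct), macro_ce(sharp, correct))  # 0.0 0.0
```

Because MacroCE weights the correct and wrong groups equally, a model cannot score well on it simply by matching average confidence to average accuracy.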
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Revisiting Calibration for Question Answering
LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification
Abstract
Weakly supervised text classification methods typically train a deep neural classifier on pseudo-labels. The quality of the pseudo-labels is crucial to final performance, but they are inevitably noisy because of their heuristic nature, so selecting the correct ones could substantially boost performance. One straightforward solution is to select samples based on the softmax probability that the neural classifier assigns to their pseudo-labels. However, our experiments show that such solutions are ineffective and unstable because poorly calibrated models make erroneous high-confidence predictions. Recent studies on the memorization effects of deep neural models suggest that these models first memorize training samples with clean labels and then those with noisy labels. Inspired by this observation, we propose LOPS, a novel pseudo-label selection method that takes the learning order of samples into consideration. We hypothesize that the learning order reflects the probability of wrong annotation in terms of ranking, and therefore propose to select the samples that are learnt earlier. LOPS can serve as a strong performance-boosting plug-in for most existing weakly supervised text classification methods, as confirmed by extensive experiments on four real-world datasets.
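One plausible way to operationalize "learnt earlier" is to record, for each pseudo-labeled sample, the first epoch after which the classifier's prediction stays equal to the pseudo-label, and then keep the earliest-learnt fraction within each pseudo-class. This is a minimal sketch: the stability criterion, the per-class split, and the ratio parameter are assumptions for illustration, not necessarily LOPS's exact rule.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def learning_epoch(preds: Sequence[int], pseudo_label: int) -> int:
    """First epoch from which predictions stay equal to the pseudo-label.

    Returns len(preds) if the last prediction disagrees (never stably learnt).
    """
    epoch = len(preds)
    for e in range(len(preds) - 1, -1, -1):  # walk backwards from the last epoch
        if preds[e] != pseudo_label:
            break
        epoch = e
    return epoch


def select_early_learnt(
    history: List[Sequence[int]], pseudo_labels: Sequence[int], ratio: float
) -> List[int]:
    """Per pseudo-class, keep the `ratio` fraction of samples learnt earliest."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(pseudo_labels):
        by_class[y].append(i)
    selected: List[int] = []
    for idxs in by_class.values():
        idxs.sort(key=lambda i: learning_epoch(history[i], pseudo_labels[i]))
        selected.extend(idxs[: max(1, int(len(idxs) * ratio))])
    return sorted(selected)


# history[i][e] = predicted class of sample i after epoch e (hypothetical values)
history = [[0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
pseudo_labels = [0, 0, 1, 1]
print(select_early_learnt(history, pseudo_labels, ratio=0.5))  # [0, 2]
```

Selecting per class keeps the selected set from being dominated by whichever class the model happens to learn fastest.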
Deep Aesthetic Assessment and Retrieval of Breast Cancer Treatment Outcomes
Authors: Wilson Silva, Maria Carvalho, Carlos Mavioso, Maria J. Cardoso, Jaime S. Cardoso
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Treatments for breast cancer have continued to evolve and improve in recent years, resulting in a substantial increase in survival rates, with a 10-year survival rate of approximately 80%. Given the serious impact that breast cancer treatments can have on a patient's body image, and consequently on her self-confidence and sexual and intimate relationships, it is paramount to ensure that women receive the treatment that optimizes both survival and aesthetic outcomes. Currently, there is no gold standard for evaluating the aesthetic outcome of breast cancer treatment. In addition, there is no standard way to show patients the potential outcome of surgery. Presenting similar cases from the past would be extremely important for managing women's expectations of the possible outcome. In this work, we propose a deep neural network to perform the aesthetic evaluation. As a proof of concept, we focus on a binary aesthetic evaluation. Besides its use for classification, this deep neural network can also be used to find the most similar past cases by searching for nearest neighbours in the highly semantic space before classification. We performed the experiments on a dataset of 143 photos of women after conservative treatment for breast cancer. In terms of accuracy and balanced accuracy, our proposed model outperformed the state of the art in aesthetic evaluation of breast cancer treatments. In addition, the model showed a good ability to retrieve similar previous cases, with the retrieved cases having the same or an adjacent class (in the 4-class setting) and similar types of asymmetry. Finally, a qualitative interpretability assessment was also performed to analyse the robustness and trustworthiness of the model.
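Retrieving similar past cases amounts to a nearest-neighbour search over the embeddings the network produces before its classification layer. A minimal sketch, assuming the embeddings have already been extracted and using cosine similarity as the (assumed) similarity measure; the paper's exact metric and embedding layer may differ, and the case IDs and vectors below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(
    query: Sequence[float], gallery: List[Tuple[str, Sequence[float]]], k: int
) -> List[str]:
    """Return the IDs of the k past cases whose embeddings are most similar to the query."""
    ranked = sorted(gallery, key=lambda item: cosine(query, item[1]), reverse=True)
    return [case_id for case_id, _ in ranked[:k]]


# Hypothetical 2-D embeddings; real ones would come from the trained network.
gallery = [("case_a", [1.0, 0.0]), ("case_b", [0.0, 1.0]), ("case_c", [0.9, 0.1])]
print(retrieve([1.0, 0.0], gallery, k=2))  # ['case_a', 'case_c']
```

A linear scan is fine for a gallery of this size (the dataset above has 143 photos); larger archives would typically use an approximate nearest-neighbour index.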
Keyword: scaling
El-WaveHoltz: A Time-Domain Iterative Solver for Time-Harmonic Elastic Waves
Authors: Daniel Appelö, Fortino Garcia, Allen Alvarez Loya, Olof Runborg
Abstract
We consider the application of the WaveHoltz iteration to time-harmonic elastic wave equations with energy-conserving boundary conditions. The original WaveHoltz iteration for acoustic Helmholtz problems is a fixed-point iteration that filters the solution of the wave equation with time-harmonic forcing and boundary data. As in the original WaveHoltz method, we reformulate the fixed-point iteration as a positive definite linear system of equations that is iteratively solved by a Krylov method. We present two time-stepping schemes, one explicit and one (novel) implicit, which completely remove time discretization error from the WaveHoltz solution through a simple modification of the initial data and the time-stepping scheme. Numerical experiments indicate an iteration scaling similar to that of the original WaveHoltz method, and that the convergence rate is dictated by the shortest (shear) wave speed of the problem. We additionally show that the implicit scheme can be advantageous in practice for meshes with disparate element sizes.
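For orientation, here is the filter of the original (acoustic) WaveHoltz method, as we understand it, worked through on a single scalar mode. This is a simplified sketch, not the elastic formulation of this paper, and the sign conventions are our own. Let u solve u_tt = -λ²u + f cos(ωt) with u(0) = v, u_t(0) = 0, period T = 2π/ω, and λ ≠ ω.

```latex
\begin{aligned}
\Pi v &= \frac{2}{T}\int_0^T \Big(\cos(\omega t) - \tfrac{1}{4}\Big)\, u(t)\, dt,
  \qquad u(t) = (v - \hat u)\cos(\lambda t) + \hat u\cos(\omega t),
  \qquad \hat u = \frac{f}{\lambda^2 - \omega^2},\\
\Pi v &= \beta(\lambda)\,(v - \hat u) + \hat u,
  \qquad \beta(\lambda) = \frac{2}{T}\int_0^T \Big(\cos(\omega t) - \tfrac{1}{4}\Big)\cos(\lambda t)\, dt,\\
v_{n+1} - \hat u &= \beta(\lambda)\,(v_n - \hat u),
  \qquad\text{equivalently}\qquad \big(1 - \beta(\lambda)\big)\, v = \Pi 0 = \big(1 - \beta(\lambda)\big)\,\hat u .
\end{aligned}
```

The filter maps the time-harmonic part û cos(ωt) to û exactly, since (2/T)∫(cos²(ωt) - cos(ωt)/4) dt = 1, while the transient part is damped by β(λ). The iteration therefore converges for this mode when |β(λ)| < 1, with the time-harmonic solution û as its fixed point; the last line is the scalar analogue of rewriting the fixed-point iteration as a linear system for a Krylov solver.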
FLUTE: Figurative Language Understanding and Textual Explanations
Abstract
In spite of the prevalence of figurative language, transformer-based models struggle to demonstrate an understanding of it. Meanwhile, even classical natural language inference (NLI) tasks have been plagued by spurious correlations and annotation artifacts. Datasets like eSNLI have been released, making it possible to probe whether language models are right for the right reasons. Yet no such data exists for figurative language, making it harder to assess genuine understanding of such expressions. In light of the above, we release FLUTE, a dataset of 8,000 figurative NLI instances with explanations, spanning three categories: Sarcasm, Simile, and Metaphor. We collect the data through a Human-AI collaboration framework based on GPT-3, crowdworkers, and expert annotation. We show how utilizing GPT-3 in conjunction with human experts can aid in scaling up the creation of datasets even for such complex linguistic phenomena as figurative language. Baseline performance of the T5 model shows that our dataset is a challenging testbed for figurative language understanding.
A Multi-domain Magneto Tunnel Junction for Racetrack Nanowire Strips
Authors: Prayash Dutta (1), Albert Lee (2), Kang L. Wang (2), Alex K. Jones (3), Sanjukta Bhanja (1) ((1) University of South Florida, (2) UCLA, (3) University of Pittsburgh)
Abstract
Domain-wall memory (DWM) has SRAM-class access performance, low energy, high endurance, high density, and CMOS compatibility. Recently, shift-reliability and processing-using-memory (PuM) proposals have created a need to count the number of parallel or anti-parallel domains in a portion of the DWM nanowire. In this paper, we propose a multi-domain magneto-tunnel junction (MTJ) that can detect different resistance levels as a function of the number of parallel or anti-parallel domains. Using detailed micromagnetic simulation with the Landau-Lifshitz-Gilbert (LLG) equation, we demonstrate the multi-domain MTJ, study the benefit of its macro size for resilience to process variation, and present a macro-model for scaling the size of the multi-domain MTJ. Our results indicate scalability to seven domains while maintaining a 16.3 mV sense margin.
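To see why a single resistance reading can encode a domain count, consider a simplified toy model (an assumption for illustration, not the paper's LLG-derived macro-model): the junction spans N equal-area domains, and each domain acts as a parallel conduction path whose resistance is N times the full-junction parallel (R_P) or anti-parallel (R_AP) resistance. The junction then has N + 1 distinct resistance levels, and the sense margin is set by the closest pair of adjacent levels. R_P, R_AP, and the read current below are hypothetical values.

```python
from typing import List


def mtj_resistance_levels(n_domains: int, r_p: float, r_ap: float) -> List[float]:
    """Junction resistance when k = 0..n_domains of the domains are parallel.

    Toy assumption: each domain is an equal-area segment of the junction, so a
    segment has resistance n_domains * r_p (or * r_ap) and segments add in parallel.
    """
    levels = []
    for k in range(n_domains + 1):
        conductance = k / (n_domains * r_p) + (n_domains - k) / (n_domains * r_ap)
        levels.append(1.0 / conductance)
    return levels


def min_sense_margin(levels: List[float], i_read: float) -> float:
    """Smallest voltage gap between adjacent levels at a constant read current."""
    return min(abs(a - b) * i_read for a, b in zip(levels, levels[1:]))


# Hypothetical values: R_P = 1 kOhm, R_AP = 2 kOhm, 10 uA read current.
for n in (2, 4, 7):
    levels = mtj_resistance_levels(n, r_p=1000.0, r_ap=2000.0)
    print(n, "domains:", round(min_sense_margin(levels, i_read=10e-6) * 1e3, 3), "mV")
```

In this toy model the conductance step between adjacent levels is (1/R_P - 1/R_AP)/N, so the levels crowd together as N grows, which is why the number of countable domains is limited by the available sense margin.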
Revisiting Calibration for Question Answering
Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
Authors: Kevin Heffernan, Onur Çelebi, Holger Schwenk
Abstract
Scaling multilingual representation learning beyond the hundred most frequent languages is challenging, in particular to cover the long tail of low-resource languages. A promising approach has been to train one-for-all multilingual models capable of cross-lingual transfer, but these models often suffer from insufficient capacity and interference between unrelated languages. Instead, we move away from this approach and train multiple language- (or language-family-) specific representations, while crucially still encoding all languages in the same representational space. To achieve this, we focus on teacher-student training, which makes all encoders mutually compatible for bitext mining and enables fast learning of new languages. We introduce a new teacher-student training scheme that combines supervised and self-supervised training, allowing encoders to take advantage of monolingual training data, which is valuable in the low-resource setting. Our approach significantly outperforms the original LASER encoder. We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model. For these languages, we train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.
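The abstract does not spell out the mining step. With LASER-style encoders, bitexts are commonly mined by ratio-margin scoring over the shared embedding space (Artetxe and Schwenk, 2019): the cosine similarity of a candidate pair is divided by the average similarity of each sentence to its k nearest neighbours, which penalizes "hub" sentences that are close to everything. A minimal sketch assuming that criterion; k, the threshold, and the toy embeddings are hypothetical, and real systems use approximate nearest-neighbour search over millions of sentences.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_mean(x: Vec, others: List[Vec], k: int) -> float:
    """Mean cosine similarity of x to its k nearest neighbours in `others`."""
    sims = sorted((cosine(x, z) for z in others), reverse=True)[:k]
    return sum(sims) / len(sims)


def mine(src: List[Vec], tgt: List[Vec], k: int, threshold: float) -> List[Tuple[int, int, float]]:
    """For each source sentence, keep its best-scoring target if the margin clears the threshold."""
    fwd = [knn_mean(x, tgt, k) for x in src]  # neighbourhood density around each source
    bwd = [knn_mean(y, src, k) for y in tgt]  # neighbourhood density around each target
    pairs = []
    for i, x in enumerate(src):
        scores = [cosine(x, y) / ((fwd[i] + bwd[j]) / 2) for j, y in enumerate(tgt)]
        j = max(range(len(tgt)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Hypothetical sentence embeddings for two languages in a shared space.
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.9, 0.1], [0.1, 0.9]]
print([(i, j) for i, j, _ in mine(src, tgt, k=1, threshold=0.9)])  # [(0, 0), (1, 1)]
```

Because every encoder maps into the same space, the same mining routine works for any language pair whose encoders were trained against the shared teacher.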
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems
Authors: Andrea Gesmundo, Jeff Dean
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Abstract
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. However, state-of-the-art ML models rely on heavy customization for each task and leverage model size and data scale rather than scaling the number of tasks. Also, continual learning, which adds the temporal aspect to multitask learning, often focuses on common pitfalls such as catastrophic forgetting instead of being studied at a large scale as a critical component for building the next generation of artificial intelligence. We propose an evolutionary method that can generate a large-scale multitask model and can support the dynamic and continuous addition of new tasks. The generated multitask model is sparsely activated and integrates task-based routing that guarantees bounded compute cost and fewer added parameters per task as the model expands. The proposed method relies on a knowledge compartmentalization technique to achieve immunity against catastrophic forgetting and other common pitfalls such as gradient interference and negative transfer. We empirically show that the proposed method can jointly solve and achieve competitive results on 69 image classification tasks, for example achieving the best test accuracy reported for a model trained only on public data for competitive tasks such as CIFAR-10 (99.43%).
Highly efficient energy-conserving moment method for the multi-dimensional Vlasov-Maxwell system
Abstract
We present an energy-conserving numerical scheme to solve the Vlasov-Maxwell (VM) system based on the regularized moment method proposed in [Z. Cai, Y. Fan, and R. Li. CPAM, 2014]. The globally hyperbolic moment system is derived for the multi-dimensional VM system in the framework of Hermite expansions, where the expansion center and the scaling factor are set to the macroscopic velocity and local temperature, respectively. Thus, the effect of the Lorentz force term can be reduced to several ODEs for the macroscopic velocity and the higher-order moment coefficients, which can significantly reduce the computational cost of the whole system. An energy-conserving numerical scheme is proposed to solve the moment equations and the Maxwell equations, in which only a linear system of equations needs to be solved. Several numerical examples, such as the two-stream instability, the Weibel instability, and the two-dimensional Orszag-Tang vortex problem, are studied to validate the efficiency and excellent energy-preserving property of the numerical scheme.
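The Hermite expansion with a moving center and scaling factor can be sketched as follows, with u the macroscopic velocity and T the local temperature, both functions of (t, x). The proportionality constant is left unspecified on purpose: the exact normalization of the basis functions is defined in [Z. Cai, Y. Fan, and R. Li. CPAM, 2014] and is not reproduced here.

```latex
f(t,\mathbf{x},\mathbf{v}) \approx \sum_{|\alpha| \le M} f_{\alpha}(t,\mathbf{x})\,
  \mathcal{H}_{\alpha}^{[\mathbf{u},\mathcal{T}]}(\mathbf{v}),
\qquad
\mathcal{H}_{\alpha}^{[\mathbf{u},\mathcal{T}]}(\mathbf{v}) \propto
  \mathrm{He}_{\alpha}(\boldsymbol{\xi})\,\exp\!\Big(-\frac{|\boldsymbol{\xi}|^{2}}{2}\Big),
\qquad
\boldsymbol{\xi} = \frac{\mathbf{v} - \mathbf{u}}{\sqrt{\mathcal{T}}}
```

Here He_α(ξ) is a tensor product of probabilists' Hermite polynomials, one per velocity dimension, and M is the truncation order. Centering the basis at the mean velocity makes the first-order coefficients vanish, because by Hermite orthogonality they are proportional to ∫(v - u) f dv, which is zero when u is the mean velocity.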
Keyword: calibration
Jointly Optimizing Color Rendition and In-Camera Backgrounds in an RGB Virtual Production Stage
Authors: Chloe LeGendre, Lukas Lepicovsky, Paul Debevec
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
While the LED panels used in virtual production systems can display vibrant imagery with a wide color gamut, they produce problematic color shifts when used as lighting due to their peaky spectral output from narrow-band red, green, and blue LEDs. In this work, we present an improved color calibration process for virtual production stages which ameliorates this color rendition problem while also passing through accurate in-camera background colors. We do this by optimizing linear color correction transformations for 1) the LED panel pixels visible in the field of view of the camera, 2) the pixels outside the field of view of the camera illuminating the subjects, and, as a post-process, 3) the pixel values recorded by the camera. The result is that footage shot in an RGB LED panel virtual production stage can exhibit more accurate skin tones and costume colors while still reproducing the desired colors of the in-camera background.
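Each of the three optimizations fits a linear color correction. In its simplest form, that is a 3x3 matrix fitted by least squares to pairs of source and target RGB values, one independent regression per output channel. A minimal sketch under that assumption; the paper's actual objective, which jointly trades off rendition and in-camera background accuracy, and how its target colors are measured, are not reproduced here. Pure Python is used so the example has no dependencies.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve3(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_color_matrix(src: Sequence[Sequence[float]], dst: Sequence[Sequence[float]]) -> Matrix:
    """Least-squares 3x3 matrix M such that M @ src[i] approximates dst[i].

    Each output channel is solved via the normal equations
    (sum of s s^T) m = sum of s * d[channel].
    """
    ata = [[sum(s[i] * s[j] for s in src) for j in range(3)] for i in range(3)]
    matrix = []
    for ch in range(3):
        atb = [sum(s[i] * d[ch] for s, d in zip(src, dst)) for i in range(3)]
        matrix.append(solve3(ata, atb))
    return matrix


# Hypothetical RGB pairs generated from a known matrix, which the fit should recover.
true_m = [[1.2, -0.1, 0.0], [0.05, 0.9, 0.05], [0.0, -0.2, 1.1]]
src = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0.5], [0.2, 0.7, 0.1]]
dst = [[sum(row[c] * s[c] for c in range(3)) for row in true_m] for s in src]
fitted = fit_color_matrix(src, dst)
```

In practice `numpy.linalg.lstsq` would replace the hand-written solver; the normal equations are spelled out here only to show what is being minimized.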
Revisiting Calibration for Question Answering
Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection
Abstract
In this paper, we present a robotic device for mouse tail vein injection. We propose a mouse holding mechanism, consisting of a tourniquet, a vacuum port, and an adaptive tail-end fixture, that enables vein injection without anesthetizing the mouse. The position of the target vein in 3D space is reconstructed using high-resolution stereo vision. The vein is detected by a simple but robust vein line detector. Thanks to the proposed two-stage calibration process, the total time for the injection process is limited to 1.5 minutes, even though the positions of the needle and tail vein vary from trial to trial. We performed an injection experiment targeting 40 mice and successfully injected saline into 37 of them, a 92.5% success rate.
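The abstract does not detail the 3D reconstruction. For a rectified stereo pair under a pinhole model, the textbook relation is depth Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the horizontal disparity of the same vein point in the two images. A minimal sketch under those assumptions; the parameter values are hypothetical, and the paper's two-stage calibration is not modelled.

```python
from typing import Tuple


def triangulate_rectified(
    xl: float, yl: float, xr: float, f: float, baseline: float, cx: float, cy: float
) -> Tuple[float, float, float]:
    """3D point in the left-camera frame from a rectified stereo correspondence.

    (xl, yl) and xr are pixel coordinates in the left and right images, f is the
    focal length in pixels, (cx, cy) the principal point; the result has the
    units of `baseline`.
    """
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f * baseline / disparity
    x = (xl - cx) * z / f
    y = (yl - cy) * z / f
    return x, y, z


# Hypothetical setup: f = 1000 px, 50 mm baseline, principal point (640, 480).
print(triangulate_rectified(740, 480, 640, f=1000, baseline=50, cx=640, cy=480))
# (50.0, 0.0, 500.0)
```

Since depth error grows as disparity shrinks, a high-resolution camera pair helps keep the needle-to-vein localization error small at a fixed working distance.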
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Keyword: expected calibration error
Revisiting Calibration for Question Answering
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Revisiting Calibration for Question Answering
LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification
Deep Aesthetic Assessment and Retrieval of Breast Cancer Treatment Outcomes
Keyword: scaling
El-WaveHoltz: A Time-Domain Iterative Solver for Time-Harmonic Elastic Waves
FLUTE: Figurative Language Understanding and Textual Explanations
A Multi-domain Magneto Tunnel Junction for Racetrack Nanowire Strips
Revisiting Calibration for Question Answering
Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems
Highly efficient energy-conserving moment method for the multi-dimensional Vlasov-Maxwell system
Keyword: calibration
Jointly Optimizing Color Rendition and In-Camera Backgrounds in an RGB Virtual Production Stage
Revisiting Calibration for Question Answering
Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection