Abstract
It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes. To alleviate this shortcoming, we propose to monitor the model's relevancy signal and manipulate it such that the model focuses on the foreground object. This is done as a finetuning step, involving relatively few samples consisting of pairs of images and their associated foreground masks. Specifically, we encourage (i) the model's relevancy map to assign lower relevance to background regions, (ii) the relevancy map to consider as much information as possible from the foreground, and (iii) the model's decisions to have high confidence. When applied to Vision Transformer (ViT) models, a marked improvement in robustness to domain shifts is observed. Moreover, the foreground masks can be obtained automatically, from a self-supervised variant of the ViT model itself; therefore no additional supervision is required.
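The three objectives listed in the abstract can be illustrated with a small sketch. This is not the authors' loss: the function name `relevance_loss`, the weights `w_bg`, `w_fg`, `w_conf`, the assumption that relevance values lie in [0, 1], and the exact form of each term are all hypothetical choices made only to show how such terms might be combined.

```python
import math

def relevance_loss(relevance, fg_mask, logits, w_bg=1.0, w_fg=1.0, w_conf=1.0):
    """Hypothetical three-term loss; relevance is assumed to lie in [0, 1]."""
    # Flatten the 2D relevance map and the binary foreground mask.
    rel = [float(r) for row in relevance for r in row]
    fg = [float(m) for row in fg_mask for m in row]
    n_fg = sum(fg)
    n_bg = len(fg) - n_fg

    # (i) Penalize relevance assigned to background pixels.
    loss_bg = sum(r * (1.0 - m) for r, m in zip(rel, fg)) / max(n_bg, 1.0)
    # (ii) Penalize foreground pixels that receive little relevance.
    loss_fg = sum((1.0 - r) * m for r, m in zip(rel, fg)) / max(n_fg, 1.0)
    # (iii) Encourage confidence: negative log-probability of the top class,
    # computed with a numerically stable log-sum-exp.
    top = max(logits)
    loss_conf = math.log(sum(math.exp(z - top) for z in logits))

    return w_bg * loss_bg + w_fg * loss_fg + w_conf * loss_conf
```

Under these assumptions, a relevance map that coincides with the foreground mask and a sharply peaked prediction give a loss close to zero, while a map concentrated on the background is penalized by both the first and the second term.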
Keyword: scaling
Cascaded Video Generation for Videos In-the-Wild
Authors: Lluis Castrejon, Nicolas Ballas, Aaron Courville
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Videos can be created by first outlining a global view of the scene and then adding local details. Inspired by this idea, we propose a cascaded model for video generation which follows a coarse-to-fine approach. First, our model generates a low-resolution video, establishing the global scene structure, which is then refined by subsequent cascade levels operating at larger resolutions. We train each cascade level sequentially on partial views of the videos, which reduces the computational complexity of our model and makes it scalable to high-resolution videos with many frames. We empirically validate our approach on UCF101 and Kinetics-600, for which our model is competitive with the state-of-the-art. We further demonstrate the scaling capabilities of our model and train a three-level model on the BDD100K dataset which generates 256x256-pixel videos with 48 frames.
Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness
Authors: Christoph Auer (1), Michele Dolfi (1), André Carvalho (2), Cesar Berrospi Ramis (1), Peter W. J. Staar (1) ((1) IBM Research, (2) SoftINSA Lda.)
Subjects: Digital Libraries (cs.DL); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Document understanding is a key business process in the data-driven economy since documents are central to knowledge discovery and business insights. Converting documents into a machine-processable format is a particular challenge here due to their huge variability in formats and complex structure. Accordingly, many algorithms and machine-learning methods have emerged to solve particular tasks such as Optical Character Recognition (OCR), layout analysis, table-structure recovery, and figure understanding. We observe the adoption of such methods in document understanding solutions offered by all major cloud providers. Yet, publications outlining how such services are designed and optimized to scale in the cloud are scarce. In this paper, we focus on the case of document conversion to illustrate the particular challenges of scaling a complex data processing pipeline with a strong reliance on machine-learning methods on cloud infrastructure. Our key objective is to achieve high scalability and responsiveness for different workload profiles within a well-defined resource budget. We outline the requirements, design, and implementation choices of our document conversion service and reflect on the challenges we faced. Evidence for the scaling behavior and resource efficiency is provided for two alternative workload distribution strategies and deployment configurations. Our best-performing method achieves sustained throughput of over one million PDF pages per hour on 3072 CPU cores across 192 nodes.
A Serverless Engine for High Energy Physics Distributed Analysis
Authors: Jacek Kuśnierz, Vincenzo Eduardo Padulano, Maciej Malawski, Kamil Burkiewicz, Enric Tejedor Saavedra, Pedro Alonso-Jordá, Michael Pitt, Valentina Avati
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The Large Hadron Collider (LHC) at CERN has generated an unprecedented volume of data for the High-Energy Physics (HEP) field in the last decade. Scientific collaborations interested in analysing such data very often require computing power beyond a single machine. This issue has traditionally been tackled by running analyses in distributed environments using stateful, managed batch computing systems. While this approach has been effective so far, current estimates for future computing needs of the field present large scaling challenges. Such a managed approach may not be the only viable way to tackle them, and an interesting alternative could be provided by serverless architectures, to enable an even larger scaling potential. This work describes a novel approach to running real HEP scientific applications through a distributed serverless computing engine. The engine is built upon ROOT, a well-established HEP data analysis software, and distributes its computations to a large pool of concurrent executions on Amazon Web Services Lambda Serverless Platform. Thanks to the developed tool, physicists are able to access datasets stored at CERN (also those that are under restricted access policies) and process them on remote infrastructures outside of their typical environment. The analysis of the serverless functions is monitored at runtime to gather performance metrics, both for data- and computation-intensive workloads.
Weakly Supervised Representation Learning with Sparse Perturbations
Authors: Kartik Ahuja, Jason Hartford, Yoshua Bengio
Abstract
The theory of representation learning aims to build methods that provably invert the data generating process with minimal domain knowledge or any source of supervision. Most prior approaches require strong distributional assumptions on the latent variables and weak supervision (auxiliary information such as timestamps) to provide provable identification guarantees. In this work, we show that if one has weak supervision from observations generated by sparse perturbations of the latent variables--e.g. images in a reinforcement learning environment where actions move individual sprites--identification is achievable under unknown continuous latent distributions. We show that if the perturbations are applied only on mutually exclusive blocks of latents, we identify the latents up to those blocks. We also show that if these perturbation blocks overlap, we identify latents up to the smallest blocks shared across perturbations. Consequently, if there are blocks that intersect in one latent variable only, then such latents are identified up to permutation and scaling. We propose a natural estimation procedure based on this theory and illustrate it on low-dimensional synthetic and image-based experiments.
Keyword: calibration
A Log-Linear Time Sequential Optimal Calibration Algorithm for Quantized Isotonic L2 Regression
Authors: Kaan Gokcesu, Hakan Gokcesu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
We study the sequential calibration of estimations in a quantized isotonic L2 regression setting. We start by showing that the optimal calibrated quantized estimations can be acquired from the traditional isotonic L2 regression solution. We modify the traditional PAVA algorithm to create calibrators for both batch and sequential optimization of the quantized isotonic regression problem. Our algorithm can update the optimal quantized monotone mapping for the samples observed so far in linear space and logarithmic time per new unordered sample.
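For readers unfamiliar with the traditional PAVA (pool adjacent violators algorithm) mentioned in the abstract, here is a minimal batch sketch for plain isotonic L2 regression. It is the classic textbook procedure, not the authors' quantized or sequential variant; the function name `pava` and the optional weights argument `w` are illustrative choices.

```python
def pava(y, w=None):
    """Return the non-decreasing sequence closest to y in (weighted) L2."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of samples].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand every block back to one fitted value per input sample.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For instance, `pava([1, 3, 2, 4])` pools the violating pair 3, 2 into their mean and returns `[1.0, 2.5, 2.5, 4.0]`. This batch version assumes the samples are already sorted by input value; handling new unordered samples one at a time is the setting the abstract addresses.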
Feature Space Particle Inference for Neural Network Ensembles
Authors: Shingo Yashima, Teppei Suzuki, Kohta Ishikawa, Ikuro Sato, Rei Kawakami
Abstract
Ensembles of deep neural networks demonstrate improved performance over single models. For enhancing the diversity of ensemble members while keeping their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior is inefficient due to over-parameterization, while seeking samples directly from the function-space posterior often results in serious underfitting. In this study, we propose optimizing particles in the feature space, where the activation of a specific intermediate layer lies, to address the above-mentioned difficulties. Our method encourages each member to capture distinct features, which is expected to improve ensemble prediction robustness. Extensive evaluation on real-world datasets shows that our model significantly outperforms the gold-standard Deep Ensembles on various metrics, including accuracy, calibration, and robustness. Code is available at https://github.com/DensoITLab/featurePI.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Optimizing Relevance Maps of Vision Transformers Improves Robustness
Keyword: scaling
Cascaded Video Generation for Videos In-the-Wild
Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness
A Serverless Engine for High Energy Physics Distributed Analysis
Weakly Supervised Representation Learning with Sparse Perturbations
Keyword: calibration
A Log-Linear Time Sequential Optimal Calibration Algorithm for Quantized Isotonic L2 Regression
Feature Space Particle Inference for Neural Network Ensembles