Abstract
We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just 4 years from 2018 to 2022. Vision models grew at a more constant pace, totaling 7 orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it. The explanations we favor are: (a) increasing model size beyond 20B parameters requires adopting different parallelism techniques, which makes mid-sized models less cost-effective, (b) GPT-3 was one order of magnitude larger than previous language models, and researchers afterwards primarily experimented with bigger models to outperform it. While these dynamics likely exist, and we believe they play some role in generating the gap, we don't have high confidence that there are no other, more important dynamics at play.
Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling
Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract
This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: The connection of query-answer pairs to items in the knowledge base. We filter training examples via a threshold of confidence on the relevance labels, whether a pair is answerable by the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves competitive baselines on two strongly imbalanced tasks; and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results in five out of seven KILT tasks.
A Methodology to Support Automatic Cyber Risk Assessment Review
Authors: Marco Angelini, Silvia Bonomi, Alessandro Palma
Abstract
Cyber risk assessment is a fundamental activity for enhancing the protection of an organization, identifying and evaluating the exposure to cyber threats. Currently, this activity is carried out mainly manually and the identification and correct quantification of risks deeply depend on the experience and confidence of the human assessor. As a consequence, the process is not completely objective and two parallel assessments of the same situation may lead to different results. This paper takes a step in the direction of reducing the degree of subjectivity by proposing a methodology to support risk assessors with an automatic review of the produced assessment. Our methodology starts from a controls-based assessment performed using well-known cybersecurity frameworks (e.g., ISO 27001, NIST) and maps security controls over infrastructural aspects that can be assessed automatically (e.g., ICT devices, organization policies). Exploiting this mapping, the methodology suggests how to identify controls needing revision. The approach has been validated through a case study from the healthcare domain and a set of statistical analyses.
Calibrate to Interpret
Authors: Gregory Scafarto, Nicolas Posocco, Antoine Bonnefoy
Abstract
Trustworthy machine learning is driving a large number of ML community works in order to improve ML acceptance and adoption. The main aspect of trustworthy machine learning are the followings: fairness, uncertainty, robustness, explainability and formal guaranties. Each of these individual domains gains the ML community interest, visible by the number of related publications. However few works tackle the interconnection between these fields. In this paper we show a first link between uncertainty and explainability, by studying the relation between calibration and interpretation. As the calibration of a given model changes the way it scores samples, and interpretation approaches often rely on these scores, it seems safe to assume that the confidence-calibration of a model interacts with our ability to interpret such model. In this paper, we show, in the context of networks trained on image classification tasks, to what extent interpretations are sensitive to confidence-calibration. It leads us to suggest a simple practice to improve the interpretation outcomes: Calibrate to Interpret.
Keyword: scaling
The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation
Authors: Ruihong Wang, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, Walid G. Aref
Abstract
Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lower cost of ownership. This paper makes the case that MD can fuel the next wave of innovation on database systems. We observe that MD revives the great debate of "shared what" in the database community. We envision that distributed shared-memory databases (DSM-DB, for short) - that have not received much attention before - can be promising in the future with MD. We present a list of challenges and opportunities that can inspire next steps in system design making the case for DSM-DB.
Keyword: calibration
Calibrate to Interpret
Authors: Gregory Scafarto, Nicolas Posocco, Antoine Bonnefoy
Abstract
Trustworthy machine learning is driving a large number of ML community works in order to improve ML acceptance and adoption. The main aspect of trustworthy machine learning are the followings: fairness, uncertainty, robustness, explainability and formal guaranties. Each of these individual domains gains the ML community interest, visible by the number of related publications. However few works tackle the interconnection between these fields. In this paper we show a first link between uncertainty and explainability, by studying the relation between calibration and interpretation. As the calibration of a given model changes the way it scores samples, and interpretation approaches often rely on these scores, it seems safe to assume that the confidence-calibration of a model interacts with our ability to interpret such model. In this paper, we show, in the context of networks trained on image classification tasks, to what extent interpretations are sensitive to confidence-calibration. It leads us to suggest a simple practice to improve the interpretation outcomes: Calibrate to Interpret.
Keyword: out of distribution detection
There is no result
Keyword: out-of-distribution detection
There is no result
Keyword: expected calibration error
There is no result
Keyword: overconfident
There is no result
Keyword: overconfidence
There is no result
Keyword: confidence
Machine Learning Model Sizes and the Parameter Gap
Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling
A Methodology to Support Automatic Cyber Risk Assessment Review
Calibrate to Interpret
Keyword: scaling
The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation
Keyword: calibration
Calibrate to Interpret