UNECE / GSIMRevision


Registers in statistical production: their roles and types, and how to model them in GSIM #17

Closed FlavioRizzolo closed 1 year ago

FlavioRizzolo commented 2 years ago

We need to define what types of registers are in scope in GSIM. Registers were modeled in the past as types of Exchange Channels, but that has changed in the recent revision (see Missing GSIM class – specification of Exchange Channel).

It can be argued that all types of registers share the same characteristics and that there is therefore no need to distinguish between them in GSIM. However, some definitions characterize them differently, which might justify capturing the distinction.

Some sources to consider:

A Statistical register is a Register created for statistical purposes normally by statisticians. They are typically created by transforming data from Registers and/or other Administrative data sources.

A register is a written and complete record containing regular entries of items and details on particular set of objects.

A statistical register is a register that is constructed and maintained for statistical purposes, according to statistical concepts and definitions, and under the control of statisticians. Administrative registers can therefore be used as sources for statistical registers, but the reverse would normally be seen as contradicting the principle of the “one-way flow” of data

An administrative register is maintained to store records on all objects to be administered, and the administrative process requires that all objects can be identified. The following definition is valid for administrative and statistical registers: A register aims to be a complete list of the objects in a specific group of objects or population. However, data on some objects can be missing due to quality deficiencies. Data on an object’s identity should be available so that the register can be updated and expanded with new variable values for each object.

Registers also have their own lifecycle management processes, which might require some tweaking of the Business Group in GSIM to be able to better capture them.

dgillman4909 commented 2 years ago

I think we need to be careful to distinguish between uses of a register and the essential characteristics defining what one is. The underlying issue is whether the registers statistical agencies maintain for statistical purposes have one or more characteristics that are different from all other kinds of registers. The uses of a register - their usage within a statistical program, how they are maintained, etc. - are not characteristics. They are mostly program specific, so they are properties of each register. Here, I distinguish characteristics - features of a class - and properties - features of an object, which is an instance of a class. I believe the essential characteristics of registers are the same, independent of purpose. I will change my mind if somebody can identify a characteristic of statistical registers that makes them different. Is there a characteristic all statistical registers share that other registers do not?
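The distinction drawn above between characteristics (features of a class) and properties (features of an instance) can be illustrated with a minimal Python sketch. The class and field names (`Register`, `maintainer`, `purpose`, `units`) are hypothetical and are not taken from GSIM; the only point is that purpose and maintenance vary per instance while the class stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """Hypothetical single class: every register shares the same characteristics."""
    name: str
    maintainer: str   # property: who maintains this particular register
    purpose: str      # property: why it is maintained (varies per instance)
    units: dict = field(default_factory=dict)  # identifier -> attribute values


# Two instances that differ only in their properties, not in their class.
business_register = Register(
    name="Business register",
    maintainer="statistical office",
    purpose="statistical",
)
tax_register = Register(
    name="Tax register",
    maintainer="tax authority",
    purpose="administrative",
)

same_class = type(business_register) is type(tax_register)
print(same_class)  # True
```

If a characteristic were found that only statistical registers have, the sketch would need a subclass; as long as none is identified, one class is enough.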

zoltanvereczkei commented 2 years ago

I think that before we start discussing the GSIM objects potentially involved, the first question is to clarify which registers we are talking about and what the statistical processes/functions around them are. From a statistical point of view, I think there are at least two:

  1. Registers that we build and maintain in the statistical organisation for statistical purposes. These are the ones we could perhaps call "statistical registers".

  2. Registers that are maintained by somebody else outside of the statistical organisation (usually administrative data owners). Statistical organisations have nothing to do whatsoever with their maintenance, but they can use data from this data source for the development, production and dissemination of statistics. These registers may be called "administrative registers".

Statistical role/functions of statistical registers:

In this sense, I think that statistical registers play a few important roles in statistics. Maybe the most important ones are:

_Bonus question: administrative registers_

I think that they are basically a type of data source for official statistics. From a quality perspective, I think that those outside data sources that are maintained and managed as registers have a more important role than simpler lists: they manage historical data, the validity of their information is easier to check, they have better metadata, etc. So from a statistical point of view it might be important to distinguish administrative registers among the data sources used for the development, production and dissemination of official statistics.

Anyway, a proper register definition is also needed. I prefer the one we used in the MEMOBUST project. Input from its glossary:

Statistical register is a continuously updated set of objects for a given population containing information on identification, accessibility of population units and other attributes, supporting the surveying process of the population. The register contains the current and historical statuses of the population and the causes, effects and sources of alterations in the population. Register data of population units are stored in a structured database. Link: https://ec.europa.eu/eurostat/cros/content/glossary_en

InKyungChoi commented 2 years ago

Meeting notes from August

JALinnerud commented 2 years ago

So many good definitions! It might be time to remind ourselves of 2 GSIM Design Principles:

Principle 8: Common language

Statement: GSIM provides a basis for a common understanding of information objects.

Rationale:
• GSIM provides a common language to describe information objects that support statistical production. One purpose of GSIM is to improve communication between different disciplines involved in statistical production, within and across statistical organizations; and between users, producers and providers of official statistics.
• A common language leads to common understanding.
• GSIM provides a reference framework that existing terms can be mapped to, encouraging greater interoperability between systems.
• Improving communication will result in a more efficient exchange of data and metadata within and between statistical organizations, and also with external clients and suppliers.

Implications:
• Terms used within GSIM must be clearly defined and should be as intuitive as possible.
• Where the same term is used with a different meaning in another common standard then the way the two definitions relate to each other should be clarified in GSIM.
• Where an equivalent concept exists in another common standard, but is given a different name, the name should be identified as a synonym in GSIM.
• Consideration should be given to how the key terms selected will translate to common languages other than English.

Principle 13: Optimal reuse

Statement: GSIM makes optimal reuse of existing terms and definitions.

Rationale: One purpose of GSIM is to generate economies of scale by enabling intra- and inter-organization collaboration, especially through reuse of information, methods or technology. Using existing terms and definitions to describe information is intended to ensure the model aligns with existing practices and is more likely to be adopted.

Implications:
• Reusing previous work relating to information and metadata management will ensure GSIM is developed as efficiently as possible.
• Reaching common agreement on standard statistical terms is difficult when a high degree of variance exists and GSIM may be hampered by historical lack of agreement in some areas.
• The opportunity to change terms that are counter-intuitive but widely accepted should not be lost.

zoltanvereczkei commented 2 years ago

I will organise a separate meeting for this issue.

GiorgiaSimeoni commented 2 years ago

I basically agree with Zoltan's comments. I would just like to stress the Output role of statistical registers. Let me share our experience at Istat to explain what I mean. With modernisation, we are actually building a system of statistical registers that will be the base for statistical production. Each statistical register is the output of a (complex) process that we are trying to represent with GSIM. In this setting, survey results play the role of auxiliary information for assessing the quality of the statistical registers.

InKyungChoi commented 2 years ago

Definitions from Statistical Metadata Glossary

Register

Administrative register (almost similar to GSIM)

Statistical register (almost similar to GSIM)

zoltanvereczkei commented 1 year ago

CONCLUSIONS

Definition

We propose to use the GSIM object "Register" and not to distinguish between "Statistical Register" and "Administrative [or other] Register". Reasoning: even though these are different things from the statistical perspective, both types would have the same or very similar characteristics (attributes), and it is a GSIM principle to define only one object for those elements that can be described with the same attributes. In light of this, we recommend removing the GSIM objects "Administrative Register" and "Statistical Register" and using only the object "Register".

Current GSIM content:

Object: Administrative Register Group: Exchange

Definition: A source of administrative information which is obtained from an external organisation (or sometimes from another department of the same organisation).

Explanatory text: The Administrative Register is a source of administrative information obtained usually from external organisations. The Administrative Register would be provided under a Provision Agreement with the Information Provider. This administrative information is usually collected for an organisation's operational purposes, rather than for statistical purposes.

Object: Statistical Register Group: Exchange

Definition: A Statistical Register is a register that is a regularly updated list of Units and their properties that is designed for statistical purposes.

Explanatory text: A Statistical Register provides an (ideally) complete inventory of the Units within a specific Population, and describes these Units using different characteristics. One example is the statistical business register held within a statistical organization. All the Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information on the Units.

Proposed new GSIM content:

Object: Register Group: Exchange

Definition: Register is a continuously updated set of objects for a given Population containing information on identification, accessibility of Units and other attributes, supporting the surveying process of the Population. The Register contains the current and historical statuses of the Population and the causes, effects and sources of alterations in the Population.

Explanatory text: In official statistics, "statistical registers" and "administrative registers" (registers maintained by other organisations, usually administrative data owners) are usually distinguished. In GSIM, the object "Register" is used to describe both types because their attributes are more or less the same, so from an information management point of view they can be handled as one object. In order to understand how the Register is used in GSIM, the use cases for the different scenarios are explained. These scenarios are:

  1. Register as Information Set maintained and regularly updated by the statistical organisation;
  2. Register as Information Set for survey frames/sample frames;
  3. Register as Information Set for statistical products;
  4. Register as Information Set used as direct or auxiliary information for the production of statistics.

Use of Register in different statistical scenarios:

Register as Information Set maintained and regularly updated by the statistical organisation

The Register plays an output role in this scenario. Statistical organisations build and maintain these Registers for statistical purposes. Many statistical organisations not only build and maintain different Registers but also build a system of Registers that will be the base for statistical production. Registers are built and maintained from different data sources (data collections, outside data sources, other); some of these sources might be Registers maintained by other organisations, usually operating outside the domain of official statistics.

GSIM objects in use: Provision Agreement, Information Provider, Exchange Tool/Instrument, Information Set, Register

Register as Information Set for survey frames/sample frames

The Register plays an output role in this scenario. Technically, a frame (Data Set) is created from a frozen current state of the Register (also considered a Data Set), as the Register records the Units and a wide range of their attributes used for the creation of survey frames. The frame is then used to identify the different Units of the Population.

GSIM objects in use: Register, Data Set, Information Set

Register as Information Set for statistical products

The Register plays an input role in this scenario. The statistical organisation produces a Data Set (tabular and/or microdata) from the frozen current state of the Register. The responsible units will then produce (parts of or the full) statistical products from this information.

GSIM objects in use: Register, Data Set, Information Set, Product

Register as Information Set used as direct or auxiliary information for the production of statistics

The Register plays an input role in this scenario. The organisation produces a Data Set (tabular and/or microdata) from the frozen current state of the Register. Statisticians require information from the Register to use it as direct input or auxiliary information for certain statistical tasks, such as weighting, imputation, macrovalidation, etc.

GSIM objects in use: Register, Data Set, Information Set
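Several of the scenarios above rely on a Data Set taken from a "frozen current state" of the Register. The following is a minimal Python sketch of that idea, assuming a hypothetical `Register` class with `update_unit` and `freeze` methods (none of these names come from GSIM): the frame is a copy taken at one point in time, so later maintenance of the Register does not change it.

```python
import copy
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class Register:
    """Hypothetical register holding Units keyed by identifier."""
    units: dict = field(default_factory=dict)

    def update_unit(self, unit_id, **attributes):
        # Maintenance: add a new Unit or update the attributes of a known one.
        self.units.setdefault(unit_id, {}).update(attributes)

    def freeze(self):
        # Copy the current state; the mapping returned is read-only at the top level.
        snapshot = copy.deepcopy(self.units)
        return MappingProxyType(snapshot)


register = Register()
register.update_unit("U1", activity="retail", employees=12)
register.update_unit("U2", activity="transport", employees=40)

# The frame (a Data Set in GSIM terms) is taken from the frozen state.
frame = register.freeze()

# The Register keeps being maintained after the frame was created.
register.update_unit("U1", employees=15)
register.update_unit("U3", activity="farming", employees=3)

frame_ids = sorted(frame)
frame_u1_employees = frame["U1"]["employees"]
```

Here `frame_ids` still lists only `U1` and `U2`, and `frame_u1_employees` keeps the value recorded at freeze time, while the Register itself already reflects the later updates.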

JALinnerud commented 1 year ago

@zoltanvereczkei, I agree with your conclusion.

FlavioRizzolo commented 1 year ago

In a separate email discussion, @InKyungChoi proposed to use the MEMOBUST definition of "Register" ("A written and complete record containing regular entries of items and details on particular set of objects"), which is also the one used in Using Administrative Data in Statistical Registers.

I think it's a good one, but even if we want a different one I have a couple of concerns about the proposed definition: the use of “Population” makes it sound more statistical in nature, which is not the case in general. Could we use “complete set of objects”? The clause "Supporting the surveying process of a population" is also statistically oriented, but more importantly it seems to be about what the register is used for rather than what it is. I like the addition of “continuously updated” (evergreening), but I think it's part of lifecycle management rather than of the definition. Summing up, I'm wondering whether a simple definition like the one above would be better; we could then put statements about the types of statistical activities it supports, together with the continuous update, in the explanatory text. Just a thought.

dgillman4909 commented 1 year ago

First, I like this direction. Defining "register" rather than kinds of registers - statistical, etc. - is the way to go. I also like Flavio's recommendation - saying what a thing is rather than what it does or what is done to it. That said, I propose a slightly altered definition: (noun) Enumeration of a set of objects, maintained over time

By "enumeration" I mean that the maintained list is as complete as possible at any one point in time. By "maintained" I mean that it is updated, modified, or made current on a continual (note, meaning done over time, not continuous, which means never ceasing) basis. It sounds as though maintenance is about something done to a register, but the fact that it is maintained over time distinguishes it from a one-time census, for example, conducted by a statistical office.

In the Explanatory Text, we can explain that there's usually a purpose or authority for maintaining the list, and each object in the list is described using a pre-defined set of characteristics. Examples include business and population registers as used by statistical offices.
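As a rough illustration of "enumeration of a set of objects, maintained over time", the Python sketch below keeps a dated history per object and can enumerate the register as of any date. All names (`Register`, `record`, `as_of`, and the example characteristics) are hypothetical and only serve this example; they are not GSIM terms.

```python
from datetime import date


class Register:
    """Hypothetical register: an enumeration of objects, maintained over time."""

    def __init__(self, characteristics):
        # Pre-defined set of characteristics used to describe each object.
        self.characteristics = tuple(characteristics)
        self._history = {}  # identifier -> list of (valid_from, values)

    def record(self, identifier, valid_from, **values):
        # Reject values for characteristics that were not pre-defined.
        unknown = set(values) - set(self.characteristics)
        if unknown:
            raise ValueError(f"undefined characteristics: {sorted(unknown)}")
        entries = self._history.setdefault(identifier, [])
        entries.append((valid_from, values))
        entries.sort(key=lambda entry: entry[0])

    def as_of(self, when):
        # Enumerate the objects known at `when`, with their latest values.
        state = {}
        for identifier, entries in self._history.items():
            current, seen = {}, False
            for valid_from, values in entries:
                if valid_from <= when:
                    current.update(values)
                    seen = True
            if seen:
                state[identifier] = current
        return state


reg = Register(["status", "region"])
reg.record("B1", date(2020, 1, 1), status="active", region="north")
reg.record("B1", date(2022, 6, 1), status="closed")
reg.record("B2", date(2021, 3, 1), status="active", region="south")

early = reg.as_of(date(2020, 12, 31))
late = reg.as_of(date(2023, 1, 1))
```

A one-time census would correspond to a single call to `as_of`; what makes this a register in the sense above is that the enumeration can be brought up to date repeatedly while earlier states remain recoverable.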

GiorgiaSimeoni commented 1 year ago

Hi all, I hope I'm still in time to add my personal opinion. Concerning @zoltanvereczkei's proposal for the Register definition, I think that the concept of Register should be explicitly related to the concept of Population, so I support the sentence "Register is a continuously updated set of objects for a given Population". "Continuously updated" can be replaced by "updated on a continual basis", "maintained over time" or "regularly updated" if that sounds better. I am not convinced about the part on "containing information on identification, accessibility of Units and other attributes", because these could be characteristics more specific to a frame than to a register, so maybe we can be a bit more generic, as in Flavio's proposal. Also, the sentence "supporting the surveying process of the Population" can be omitted because it is more connected with the use made of the register.
I agree with the rest of Zoltan's proposal, use case scenarios included. Now I have a doubt: what about administrative data that are collected for statistical purposes but are not Registers?

zoltanvereczkei commented 1 year ago

Thanks for all the comments! Based on this feedback, I finalised the definition and the explanatory text so that the simple definition used by the Using Administrative and Secondary Sources for Official Statistics handbook can be proposed and all other comments can go into the explanatory part. Maybe this is the best solution, also because we adopt an existing definition rather than generating an n+1 version of the same thing.

Object: Register Group: Exchange

Definition: a written and complete record containing regular entries of items and details on particular set of objects. Reference: Using Administrative and Secondary Sources for Official Statistics handbook (https://statswiki.unece.org/display/adso/Using+Administrative+Data+in+Statistical+Registers)

Explanatory text: In official statistics, "statistical registers" and "administrative registers" (registers maintained by other organisations, usually administrative data owners) are usually distinguished. In GSIM, the object "Register" is used to describe both types because their attributes are more or less the same, so from an information management point of view they can be handled as one object.

There is usually a purpose or authority for maintaining the Register, and each object in the Register is described using a pre-defined set of characteristics. Examples include business and population registers as used by statistical offices. Therefore, from a statistical perspective, the Register can be interpreted as a set of objects for a given Population, updated on a regular basis, containing information on identification, accessibility of Units and other attributes. The Register contains the current and historical statuses of the Population and the causes, effects and sources of alterations in the Population.

In order to better understand how the Register is used in GSIM, the use cases for the different scenarios are explained. These scenarios are:

Register as Information Set maintained and regularly updated by the statistical organisation;
Register as Information Set for survey frames/sample frames;
Register as Information Set for statistical products;
Register as Information Set used as direct or auxiliary information for the production of statistics.

Thanks a lot!

JALinnerud commented 1 year ago

Please add a link to the Memobust definition. I tried to find it, but only found "A written and complete record containing regular entries of items and details on particular set of objects. Administrative registers come from administrative sources and become statistical registers after passing through statistical processing in order to make them fit for statistical purposes (production of register based statistics, frame creation, etc.)." https://ec.europa.eu/eurostat/cros/system/files/Memobust%20glossary%20def.pdf The first sentence in the Memobust definition corresponds with the proposed definition, but the second sentence in the Memobust definition is rephrased and part of our explanatory text. I would conclude that our proposed definition is based on, but not identical to, the Memobust definition.

JALinnerud commented 1 year ago

Should we add a use case for an administrative register?

zoltanvereczkei commented 1 year ago

> Please add a link to the Memobust definition. I tried to find it, but only found "A written and complete record containing regular entries of items and details on particular set of objects. Administrative registers come from administrative sources and become statistical registers after passing through statistical processing in order to make them fit for statistical purposes (production of register based statistics, frame creation, etc.)." https://ec.europa.eu/eurostat/cros/system/files/Memobust%20glossary%20def.pdf The first sentence in the Memobust definition corresponds with the proposed definition, but the second sentence in the Memobust definition is rephrased and part of our explanatory text. I would conclude that our proposed definition is based on, but not identical to, the Memobust definition.

Hi Jenny! You are absolutely right, I corrected it. Now it refers to the definition used in the Using Administrative and Secondary Sources for Official Statistics handbook. That is exactly this definition. It is actually not the MEMOBUST definition. Was a mistake on my part...

JALinnerud commented 1 year ago

The challenge there is that the Using Administrative and Secondary Sources for Official Statistics handbook (https://unece.org/fileadmin/DAM/stats/publications/Using_Administrative_Sources_Final_for_web.pdf) refers to a footnote [33] which is a broken link... Tracing good definitions to their source is not easy!

zoltanvereczkei commented 1 year ago

Indeed... I have the strong feeling the definition is originally based on the MEMOBUST one but now the reference is gone...

InKyungChoi commented 1 year ago

Close this issue (see final decision here: https://github.com/UNECE/GSIMRevision/discussions/35)