GSBPM "Process" Phase

Boundary between collect phase and analysis phase

We suggest adding the following to point 77 (p. 20): “It is also desirable that these phases are conducted simultaneously for the benefit of collection activities.” We believe it is important to reinforce the link between collection activities and microdata analysis in this way, and to clarify the interdependence among sub-processes. (Portugal)
Boundary with collect phase (France) -- It could be useful to make a clear distinction between the population collected and the population of interest. In which sub-process are we restricted to the population of interest: 4, 5.1 or 5.5? -- In the case of surveys, when the checks on the questionnaires are carried out during collection, with or without possible feedback to the respondents: do the checks fall under 4.3 or 5.3? Does the correction of anomalies fall under 4.3 or 5.4? -- If these fall under processing, it could be useful to clarify the interactions between the sub-processes of phases 4 and 5.
Boundary with analysis phase (France) -- 6.1: Aggregates from 5.7 can be statistical outputs. It may be worth adding this clarification, because the rest of phase 6 only uses the term "statistical outputs", which is defined in sub-process 6.1. -- In the case where the aggregates are the statistical outputs, is the sub-process still relevant but limited to recording the quality criteria? -- If 6.1 should be skipped because the aggregates are the statistical outputs, where should this clarification be added? -- When the statistical product is micro-data, it is unclear which parts of validation and control fall under phase 5 and which under phase 6. Phase 6

Importance of data integration

‘Integrate data’ being represented as a sub-phase perhaps makes it seem too small in relation to the scale of activities involved. Is there appetite for an ‘Integrate data’ phase? (UK)
If ‘Integrate data’ were made into a phase, then the Process stage should be applied both to individual data sources and the integrated sources. Rather than have two (or more) Process phases, it should be made clear that the GSBPM is not intended to be a strictly linear process. (UK)

Other general comment

GSBPM does not contain anything on training machine learning models. Should this, and similar, model design and build activities be explicitly included or described in any phases? (UK)
We suggest adding a final sentence to point 75 on the Process phase to accommodate geospatial considerations: “Despite similar steps of processing compared to statistical variables, it is recommended that geospatial variables should be processed before other variables (e.g., geocoding), particularly for further data integration (sub-process 5.1, Integrate data).” (Portugal)

Sub-process 5.1 Integrate data

Pseudonymisation (or anonymisation) is not mentioned in the GSBPM. In Finland, the Data Management Act (based on the GDPR) requires that personal data be pseudonymised. Statistics Finland interprets pseudonymisation as part of sub-process 5.1. Pseudonymisation requires units to be combined with the corresponding units of the other datasets (and the base register), so that multiple pseudo-identifiers are not created for the same unit and so that the unit's data from different sources is properly integrated. (Finland)
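To illustrate why pseudonymisation and integration are tied together, here is a minimal sketch in Python. It is not Statistics Finland's method: the base register, the source names, the identifiers and the `Pseudonymiser` class are all hypothetical, chosen only to show that linking each record to its register unit first yields one pseudo-identifier per unit.

```python
import secrets

# Hypothetical base register: maps (source, local identifier) to one register unit.
BASE_REGISTER = {
    ("survey", "S-17"): "unit-A",
    ("tax", "T-904"): "unit-A",   # same unit, different identifier in another source
    ("survey", "S-18"): "unit-B",
}


class Pseudonymiser:
    """Issues exactly one random pseudo-identifier per register unit."""

    def __init__(self):
        self._pseudo_by_unit = {}

    def pseudo_id(self, register_unit):
        # Reuse the existing pseudo-identifier if the unit was seen before.
        if register_unit not in self._pseudo_by_unit:
            self._pseudo_by_unit[register_unit] = secrets.token_hex(8)
        return self._pseudo_by_unit[register_unit]


def integrate(sources, register, pseudonymiser):
    """Link each record to its register unit, then merge on the pseudo-identifier."""
    integrated = {}
    for source_name, records in sources.items():
        for record in records:
            unit = register[(source_name, record["local_id"])]
            pid = pseudonymiser.pseudo_id(unit)
            row = integrated.setdefault(pid, {"pseudo_id": pid})
            # Copy the statistical variables, dropping the direct identifier.
            row.update({k: v for k, v in record.items() if k != "local_id"})
    return list(integrated.values())


sources = {
    "survey": [{"local_id": "S-17", "turnover": 120}, {"local_id": "S-18", "turnover": 75}],
    "tax": [{"local_id": "T-904", "employees": 8}],
}
rows = integrate(sources, BASE_REGISTER, Pseudonymiser())
```

In this sketch the survey record `S-17` and the tax record `T-904` end up in a single row under one pseudo-identifier, because both resolve to the same register unit before the pseudo-identifier is issued. Pseudonymising each source separately, without the register lookup, would instead give the same unit two unrelated pseudo-identifiers.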

Sub-process 5.3 Review and validate

Another issue is that there is still overlap between 5.3 and 5.4: it is not clear where the application of editing rules should be included, and there is also incoherence with GSDEM, where Review is a specific editing function. (Italy)
5.3 Review and validate: renaming suggestion to microvalidation. (Hungary)

Sub-process 5.7 Calculate aggregates

We suggest replacing “geographic classifications” in point 86 with “areal classifications”, to cover more broadly data aggregations by areal codes (ids) based on geographic, statistical, functional or administrative criteria. (Portugal)
We suggest adding the following information to point 86: “This sub-process refers to two main different finalised data files: the first one is a validated, reviewed and improved micro-data file suitable to derive new variables and units as referred in sub-process 5.5 and the second …” (Portugal)
Aggregates can also be used to review or edit metadata. It may be worth clarifying the sub-process by adding that aggregates are not just an output for the Analyse phase. (France)
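The point about “areal classifications” can be illustrated with a small Python sketch, in which the same micro-data are aggregated by areal codes drawn from different criteria. The records, the field names (`municipality`, `functional_area`) and the helper `aggregate_by` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical micro-data: each unit carries areal codes from several classifications.
records = [
    {"municipality": "M01", "functional_area": "F1", "value": 10.0},
    {"municipality": "M02", "functional_area": "F1", "value": 5.0},
    {"municipality": "M03", "functional_area": "F2", "value": 7.5},
]


def aggregate_by(records, areal_key, value_key="value"):
    """Sum a variable over the areal codes of one chosen classification."""
    totals = defaultdict(float)
    for record in records:
        totals[record[areal_key]] += record[value_key]
    return dict(totals)


by_municipality = aggregate_by(records, "municipality")   # administrative criterion
by_functional = aggregate_by(records, "functional_area")  # functional criterion
```

The aggregation step is identical in both calls; only the areal code used for grouping changes, which is the sense in which a broader term than “geographic” may fit.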

Sub-process 5.8 Finalize data files

Renaming to “data set” instead of “data files” might be needed (also: suggestion to apply the term consistently throughout the model). (Hungary)

UNECE / GSBPM_GAMSO_Revision

GSBPM "Process" Phase #12