FranckCo / GSIM-SPAP

Generic Statistical Information Model - Statistical Program and Acquisition Program

Web Robot = Mode, not Instrument? / additional Instruments #12

Open michaeladenk opened 11 years ago

michaeladenk commented 11 years ago

Reviewing Chris' GSBPM Level 1 Objects model, the Web Scraping use case, and the "Collect Output Statistics from other Agencies" use case, I was wondering whether Web Scraping/Internet Robot is just one possible Mode of a more general "Data Harvesting" Instrument. (This is probably not yet the perfect name.) Other possible Modes could be a Web Service, FTP exchange, etc. (Or would we call this rather a Transmission Channel?)

In terms of the Instrument, why not distinguish

As register data and public-use microdata (etc.) are actually subtypes of Products, maybe the first-level distinction is simply between "collect your own/new data" and "harvest existing statistical products".

I know this actually covers two different issues now:

  1. is a robot a Mode, a transmission channel, or an Instrument? and
  2. which Instruments should the model include?

victory1805 commented 11 years ago

The existence of primary and secondary collection is covered in the description of the model (see below). But they could be mentioned explicitly in the model, either as an object (not sure about this) or, more likely, as a relationship role on Data Resource. One program's primary collection can become another program's secondary collection.

The description of the Statistical Program in the model currently says: "In the case of the traditional approach, an agency has received an Information Request and a set of Requirements; and has approved a Business Case. When this happens, a new Statistical Program is initiated. This Statistical Program will identify the Data Resource that it will need (existing or needing to be created). Once designed, the Statistical Program will have one or more iterations of Statistical Project, to investigate a set of characteristics for a given Population in relation to a particular time period. If the identified Data Resource is not sufficient for the purposes of the Statistical Project, an Acquisition Program will be initiated, together with an Acquisition Project (for each instance of the time period) which will add Datasets to the Data Resource. Once this is complete, the Statistical Project will use a particular Dataset from the Data Resource to produce one or more Products or Services."
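The flow in the quoted description can be sketched as a small object model. This is a minimal, non-authoritative sketch: the class names mirror the GSIM objects named in the text, but every field, the `dataset_for` and `run_project` helpers, and the rule that a Data Resource is "not sufficient" when it has no Dataset for the project's time period are hypothetical assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    period: str  # hypothetical: the time period this Dataset covers


@dataclass
class DataResource:
    datasets: list[Dataset] = field(default_factory=list)

    def dataset_for(self, period: str) -> Dataset | None:
        # Hypothetical lookup: return the Dataset for a period, if present
        return next((d for d in self.datasets if d.period == period), None)


@dataclass
class AcquisitionProject:
    period: str

    def execute(self, resource: DataResource) -> None:
        # An Acquisition Project adds a Dataset to the Data Resource
        resource.datasets.append(Dataset(self.period))


@dataclass
class AcquisitionProgram:
    projects: list[AcquisitionProject] = field(default_factory=list)


@dataclass
class StatisticalProject:
    period: str


@dataclass
class StatisticalProgram:
    resource: DataResource
    projects: list[StatisticalProject] = field(default_factory=list)
    acquisition: AcquisitionProgram | None = None

    def run_project(self, project: StatisticalProject) -> str:
        # Each Statistical Project is one iteration of the program
        self.projects.append(project)
        dataset = self.resource.dataset_for(project.period)
        if dataset is None:
            # Assumed sufficiency rule: no Dataset for this period means the
            # Data Resource is insufficient, so initiate an Acquisition
            # Program (once) plus an Acquisition Project for this period
            if self.acquisition is None:
                self.acquisition = AcquisitionProgram()
            acq = AcquisitionProject(project.period)
            self.acquisition.projects.append(acq)
            acq.execute(self.resource)
            dataset = self.resource.dataset_for(project.period)
        # Use the Dataset to produce a Product (a plain string here)
        return f"Product for {dataset.period}"


program = StatisticalProgram(DataResource([Dataset("2013")]))
print(program.run_project(StatisticalProject("2013")))  # uses existing Dataset
print(program.run_project(StatisticalProject("2014")))  # triggers acquisition
```

In this sketch only the second project starts an Acquisition Program, because the Data Resource already holds a Dataset for 2013. The model text itself leaves the sufficiency criterion open.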

adamstatsnz commented 11 years ago

I have commented in another thread about primary/secondary collection, and I do think this should be made as explicit as possible in the model. Getting this right is key to reaching a common understanding across agencies, because this is an area that all agencies do understand.

michaeladenk commented 11 years ago

I agree, and I think this could go in "Data Source", as I commented in another thread.

michaeladenk commented 11 years ago

This seems to be resolved by the current Data Channel typology and Juan's suggestion of a generic Instrument, which I support.