pwalsh commented 7 years ago

Description

Fiscal Data Package v0.3 has a phase concept. There are fixed phase types based on some generic idea of budget cycles, but:

- The fixed types are limited, loosely based on budgeting, and ... fixed
- They don't reflect context-specific fiscal processes (different names, different phases)

The only thing that seems consistent about fiscal phases (budget phases being the clearest example) is that they are a sequence of events.

@akariv has a solution for this: a customisable sequence for Fiscal Data Package v1.0.0. We'll share a draft here.

See prior discussion here

Tasks

[ ] @akariv to share draft implementation / example / wording

pwalsh commented 6 years ago

@akariv

Can you:

- Add a brief description here, with a sample snippet, of the new syntax/handling being proposed for v1.
- Link to the full current text of the v1 spec.

akariv commented 6 years ago

This is the working draft of the v1 FDP spec: https://hackmd.io/BwNgpgrCDGDsBMBaAhtALARkWsPEE5posR8RxgAzffWfDIA=?view

This new spec removes the strict rules on 'phase'. Phase is now a ColumnType, meaning it can be used either to describe an existing column of the data or as a 'constant value' column that adds this information as metadata. Where the data contains multiple measures, one per phase, users can use the denormalization option to describe the phase for each measure.

Users can also omit it completely where it's not applicable. There are no restrictions on phase values; interpreting them based on local context is left to the users of the data.

The relevant part of the spec (although I do recommend reading the whole thing for a better narrative):

Denormalising Measures

In many cases, publishers prefer to publish the Approved, Modified and Executed values of a budget as separate columns, instead of duplicating the same line just to provide 3 figures. This is more readable for humans and more concise (i.e. it produces a smaller file).

In other cases, the budget figures for the current year, the next year and the year after will appear as separate columns instead of separate rows. This lets readers compare budget figures across consecutive years more easily.

In fact, we might even encounter a dataset where both the phase and the year columns were reduced in this way.

This practice is a very common, simple form of denormalization applied to published datasets. However, some data is lost along the way: in our examples, we've lost the 'Budget Phase' column in the former and the 'Fiscal Year' column in the latter.
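A minimal Python sketch of the reduction itself may make the loss concrete. It assumes long-format rows with hypothetical 'Item', 'Budget Phase', 'Fiscal Year' and 'Amount' keys and made-up figures, and pivots them into one wide row per item. After the pivot, the phase and year survive only inside column names such as 'Approved 2015'.

```python
def denormalize(rows):
    """Pivot long-format rows into one wide row per budget item.

    Each (Budget Phase, Fiscal Year) pair becomes a column name such as
    "Approved 2015"; the two source columns themselves disappear.
    """
    wide = {}
    for row in rows:
        item = row["Item"]
        # Fold the phase and year into the column name.
        column = f"{row['Budget Phase'].capitalize()} {row['Fiscal Year']}"
        wide.setdefault(item, {"Item": item})[column] = row["Amount"]
    return list(wide.values())


# Hypothetical input data with made-up figures.
long_rows = [
    {"Item": "Schools", "Budget Phase": "approved", "Fiscal Year": 2015, "Amount": 100},
    {"Item": "Schools", "Budget Phase": "executed", "Fiscal Year": 2015, "Amount": 90},
    {"Item": "Schools", "Budget Phase": "approved", "Fiscal Year": 2016, "Amount": 110},
]
print(denormalize(long_rows))
# [{'Item': 'Schools', 'Approved 2015': 100, 'Executed 2015': 90, 'Approved 2016': 110}]
```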

We want to describe this process so that data consumers can potentially undo it, or at the very least recover the data that was lost in the process.

To do so, we need to:

Add a field definition to the extraFields property for each column that was reduced (budget phase or fiscal year in our scenario), for example:
```
"extraFields": [
  { "name": "Budget Phase", "type": "string", ... },
  { "name": "Fiscal Year", "type": "integer", ... },
  ...
]
```
Add a normalize property to each measure in the schema. Its value maps each 'reduced column' name to that column's value for this measure, for example:

```
...
"schema": {
  "fields": [
    ...
    {
      "name": "Approved 2015",
      "type": "number",
      "normalize": {
        "Budget Phase": "approved",
        "Fiscal Year": 2015
      },
      ...
    },
    {
      "name": "Executed 2015",
      "type": "number",
      "normalize": {
        "Budget Phase": "executed",
        "Fiscal Year": 2015
      },
      ...
    },
    {
      "name": "Approved 2016",
      "type": "number",
      "normalize": {
        "Budget Phase": "approved",
        "Fiscal Year": 2016
      },
      ...
    },
    {
      "name": "Executed 2016",
      "type": "number",
      "normalize": {
        "Budget Phase": "executed",
        "Fiscal Year": 2016
      },
      ...
    },
  ]
}
...
```
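Since every key in a normalize mapping should name a reduced column defined in extraFields, a consumer might sanity-check a descriptor before trying to undo the reduction. This is a minimal sketch, assuming the schema fields and the extraFields list have already been loaded into Python lists of dicts; the function name 'undefined_reduced_columns' is hypothetical.

```python
def undefined_reduced_columns(schema_fields, extra_fields):
    """Return the normalize keys that have no matching extraFields definition."""
    defined = {field["name"] for field in extra_fields}
    used = set()
    for field in schema_fields:
        # Fields without a normalize mapping contribute no keys.
        used.update(field.get("normalize", {}).keys())
    return sorted(used - defined)


# Hypothetical descriptor fragment: 'Fiscal Year' is used but not defined.
fields = [
    {"name": "Approved 2015", "type": "number",
     "normalize": {"Budget Phase": "approved", "Fiscal Year": 2015}},
    {"name": "Executed 2015", "type": "number",
     "normalize": {"Budget Phase": "executed", "Fiscal Year": 2015}},
]
extra_fields = [{"name": "Budget Phase", "type": "string"}]

print(undefined_reduced_columns(fields, extra_fields))  # ['Fiscal Year']
```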

Finally, add a field definition to the extraFields property for the target column that will hold the measures' values, like so:

```
"extraFields": [
  ...
  {
    "name": "Fiscal Amount",
    "type": "number",
    "columnType": "value",
    "normalizationTarget": true
  }
]
```
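With these three pieces in place, a consumer can undo the reduction. The sketch below is one possible reading, not a reference implementation. It assumes the descriptor has been loaded into Python dicts and that schema fields without a normalize mapping (here a hypothetical 'Item' column) are copied unchanged into every output row. Each wide row then yields one long row per measure, with the reduced columns restored from normalize and the value placed in the normalizationTarget column. The function name 'normalize_rows' and the sample data are hypothetical.

```python
def normalize_rows(rows, schema_fields, extra_fields):
    """Undo measure denormalisation: emit one row per (input row, measure)."""
    # The extraFields entry that receives each measure's value.
    target = next(
        (f["name"] for f in extra_fields if f.get("normalizationTarget")), None
    )
    if target is None:
        raise ValueError("no extraFields entry has normalizationTarget set")
    measures = [f for f in schema_fields if "normalize" in f]
    # Assumption: fields without a normalize mapping are copied as-is.
    plain = [f["name"] for f in schema_fields if "normalize" not in f]

    result = []
    for row in rows:
        for measure in measures:
            new_row = {name: row[name] for name in plain}
            new_row.update(measure["normalize"])  # restore the reduced columns
            new_row[target] = row[measure["name"]]
            result.append(new_row)
    return result


# Hypothetical descriptor fragment and data, mirroring the snippets above.
fields = [
    {"name": "Item", "type": "string"},
    {"name": "Approved 2015", "type": "number",
     "normalize": {"Budget Phase": "approved", "Fiscal Year": 2015}},
    {"name": "Executed 2015", "type": "number",
     "normalize": {"Budget Phase": "executed", "Fiscal Year": 2015}},
]
extra_fields = [
    {"name": "Budget Phase", "type": "string"},
    {"name": "Fiscal Year", "type": "integer"},
    {"name": "Fiscal Amount", "type": "number",
     "columnType": "value", "normalizationTarget": True},
]
rows = [{"Item": "Schools", "Approved 2015": 100, "Executed 2015": 90}]

for long_row in normalize_rows(rows, fields, extra_fields):
    print(long_row)
# {'Item': 'Schools', 'Budget Phase': 'approved', 'Fiscal Year': 2015, 'Fiscal Amount': 100}
# {'Item': 'Schools', 'Budget Phase': 'executed', 'Fiscal Year': 2015, 'Fiscal Amount': 90}
```

This sketch keeps empty measure cells as rows with an empty value; a real consumer might prefer to skip them.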

frictionlessdata / datapackage-fiscal

Customisable options for phase #1
