frictionlessdata / datapackage-fiscal

Fiscal Data Package is a lightweight and user-oriented format for publishing and consuming fiscal data. Fiscal data packages are made of simple and universal components. They can be produced from ordinary spreadsheet software and used in any environment.
https://fiscal.datapackage.org/
The Unlicense

Customisable options for phase #1

Open pwalsh opened 7 years ago

pwalsh commented 7 years ago

Description

Fiscal Data Package v0.3 has a phase concept. There are fixed phase types based on some generic idea of budget cycles, but:

The only thing that seems consistent about fiscal phases (budget phases being the clearest example) is that they are a sequence of events.

@akariv has a solution for this as a customisable sequence for Fiscal Data Package v1.0.0. We'll share a draft here.

See prior discussion here


pwalsh commented 6 years ago

@akariv

Can you:

  1. Add here a brief description, and a sample snippet, of the new syntax/handling being proposed for v1.
  2. Additionally, link to the current text of the v1 spec in whole.

akariv commented 6 years ago

This is the working draft of the v1 FDP spec: https://hackmd.io/BwNgpgrCDGDsBMBaAhtALARkWsPEE5posR8RxgAzffWfDIA=?view

This new spec removes the strict rules on 'phase'. Phase is now a ColumnType, meaning it can be used to describe an existing column of the data, or as a 'constant value' column that adds this information as metadata. In cases where the data contains multiple measures, one per phase, users can use the denormalization option to describe the phase for each measure.

It can also be omitted completely in cases where it's not applicable. There are no restrictions on the values of phases; interpreting them is left to the users of the data, based on local context.

The relevant part of the spec (although I do recommend reading the entire thing for a better narrative):

Denormalising Measures

In many cases, publishers will prefer to have Approved, Modified and Executed values of a budget as separate columns, instead of duplicating the same line just to provide 3 figures. It is more readable to humans and more concise (i.e. creates a smaller file size).

In other cases, the budget figures for the current, next and after next years will appear as separate columns instead of in separate rows. This allows readers to more easily compare the budget figures across consecutive years.

In fact, we might even encounter a dataset in which both the phase and year columns were reduced in the same way.

This practice is very common as a simple form of denormalization applied to a published dataset. However, some data is lost along the way - in our examples, we've lost the 'Budget Phase' column in the former, and the 'Fiscal Year' column in the latter.
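
To make the loss concrete, here is a minimal Python sketch of how such a wide table might be produced from row-per-figure data. The 'Item' and 'Amount' column names, the sample figures and the `denormalize` helper are all hypothetical and not taken from the spec; only the 'Budget Phase' and 'Fiscal Year' names and the resulting column titles follow the examples in this text.

```python
from collections import defaultdict

# Hypothetical long-form rows: one row per (item, phase, year) combination.
long_rows = [
    {"Item": "Education", "Budget Phase": "approved", "Fiscal Year": 2015, "Amount": 100},
    {"Item": "Education", "Budget Phase": "executed", "Fiscal Year": 2015, "Amount": 90},
    {"Item": "Education", "Budget Phase": "approved", "Fiscal Year": 2016, "Amount": 120},
    {"Item": "Education", "Budget Phase": "executed", "Fiscal Year": 2016, "Amount": 110},
]


def denormalize(rows):
    """Fold phase and year into column titles, leaving one row per item."""
    wide = defaultdict(dict)
    for row in rows:
        # e.g. "approved" + 2015 -> "Approved 2015"
        column = f"{row['Budget Phase'].capitalize()} {row['Fiscal Year']}"
        wide[row["Item"]][column] = row["Amount"]
    return [{"Item": item, **amounts} for item, amounts in wide.items()]


wide_rows = denormalize(long_rows)
print(wide_rows)
```

After this step the phase and the year survive only inside the column titles, which a machine reader cannot reliably parse without extra metadata.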

We want to describe this process so that data consumers can potentially undo it, or at the very least recover the data that was lost along the way.

In order to do so we need to:

...
"schema": {
  "fields": [
    ...
    {
      "name": "Approved 2015",
      "type": "number",
      "normalize": {
        "Budget Phase": "approved",
        "Fiscal Year": 2015
      },
      ...
    },
    {
      "name": "Executed 2015",
      "type": "number",
      "normalize": {
        "Budget Phase": "executed",
        "Fiscal Year": 2015
      },
      ...
    },
    {
      "name": "Approved 2016",
      "type": "number",
      "normalize": {
        "Budget Phase": "approved",
        "Fiscal Year": 2016
      },
      ...
    },
    {
      "name": "Executed 2016",
      "type": "number",
      "normalize": {
        "Budget Phase": "executed",
        "Fiscal Year": 2016
      },
      ...
    }
  ]
}
...
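
As a rough illustration of how a consumer might use the `normalize` property, here is a minimal Python sketch that turns each wide row back into one row per measure. It is not taken from the spec or from any existing library: the `normalize_rows` helper, the 'Item' column, the sample figures and the 'Amount' name for the unified measure are all assumptions made for this example, and the spec parts elided above may prescribe something different.

```python
# Field descriptors mirroring the schema fragment above.
# "Item" is a hypothetical plain column with no "normalize" property.
fields = [
    {"name": "Item", "type": "string"},
    {"name": "Approved 2015", "type": "number",
     "normalize": {"Budget Phase": "approved", "Fiscal Year": 2015}},
    {"name": "Executed 2015", "type": "number",
     "normalize": {"Budget Phase": "executed", "Fiscal Year": 2015}},
    {"name": "Approved 2016", "type": "number",
     "normalize": {"Budget Phase": "approved", "Fiscal Year": 2016}},
    {"name": "Executed 2016", "type": "number",
     "normalize": {"Budget Phase": "executed", "Fiscal Year": 2016}},
]

# Hypothetical denormalized data: one row per item, one column per measure.
rows = [
    {"Item": "Education", "Approved 2015": 100, "Executed 2015": 90,
     "Approved 2016": 120, "Executed 2016": 110},
]


def normalize_rows(rows, fields, measure_name="Amount"):
    """Emit one output row per (input row, denormalized measure) pair."""
    measures = [f for f in fields if "normalize" in f]
    plain = [f["name"] for f in fields if "normalize" not in f]
    result = []
    for row in rows:
        base = {name: row[name] for name in plain}
        for field in measures:
            # Restore the lost columns from the field's "normalize" mapping
            # and move the figure into a single measure column.
            result.append({**base, **field["normalize"],
                           measure_name: row[field["name"]]})
    return result


normalized = normalize_rows(rows, fields)
for record in normalized:
    print(record)
```

Each output row carries the 'Budget Phase' and 'Fiscal Year' values recorded in the schema, so the columns lost in the published file are recovered without parsing the column titles.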