Open pwalsh opened 7 years ago
@akariv
Can you:
This is the working draft of the v1 FDP spec: https://hackmd.io/BwNgpgrCDGDsBMBaAhtALARkWsPEE5posR8RxgAzffWfDIA=?view
This new spec removes the strict rules on 'phase'. Phase is now a ColumnType, meaning it could be used to describe an existing column of the data, or as a 'constant value' column, adding this information as meta-data. In cases where the data contains multiple measures, one per phase, users could use the denormalization option to describe the phase for each measure.
It could also be omitted completely by the users in cases where it's not applicable. There are no restrictions on the values of phases; these are left for the users of the data to interpret based on local context.
The relevant part from the spec (although I do recommend reading the entire thing for a better narrative):
In many cases, publishers will prefer to have Approved, Modified and Executed values of a budget as separate columns, instead of duplicating the same line just to provide 3 figures. It is more readable to humans and more concise (i.e. creates a smaller file size).
In other cases, the budget figures for the current, next and after next years will appear as separate columns instead of in separate rows. This allows readers to more easily compare the budget figures across consecutive years.
In fact, we might even encounter datasets where both phase and year columns were reduced in the same way.
This practice is very common as a simple form of normalization being done on a published dataset. However, some data is lost along the way - in our examples, we've lost the 'Budget Phase' column in the former, and 'Fiscal Year' column in the latter.
We want to describe this process to allow data consumers to potentially undo it, or at the very least to recover the data that was lost in the process.
In order to do so we need to:
Add to the extraFields property a field definition for each column that was reduced (budget phase or fiscal year in our scenario), for example:
"extraFields": [
{ "name": "Budget Phase", "type": "string", ... },
{ "name": "Fiscal Year", "type": "integer", ... },
...
]
Add a normalize property to each measure in the schema. The value of this property is a mapping from every 'reduced column' name to a value, for example:
"schema": {
"fields": [
...
{
"name": "Approved 2015",
"type": "number",
"normalize": {
"Budget Phase": "approved",
"Fiscal Year": 2015
},
...
},
{
"name": "Executed 2015",
"type": "number",
"normalize": {
"Budget Phase": "executed",
"Fiscal Year": 2015
},
...
},
{
"name": "Approved 2016",
"type": "number",
"normalize": {
"Budget Phase": "approved",
"Fiscal Year": 2016
},
...
},
{
"name": "Executed 2016",
"type": "number",
"normalize": {
"Budget Phase": "executed",
"Fiscal Year": 2016
},
...
}
]
}
...
Add to the extraFields property a field definition for the target column for the measures' values, like so:
"extraFields": [
...
{
"name": "Fiscal Amount",
"type": "number",
"columnType": "value",
"normalizationTarget": true
}
]
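To make the intent of these three steps concrete, here is a minimal sketch of how a data consumer might undo the reduction using only the normalize mappings and the extraFields entry flagged with normalizationTarget. The draft spec does not prescribe an algorithm; the function name expand_measures, the "Department" column and the sample amounts are hypothetical and exist only for illustration.

```python
def expand_measures(rows, schema, extra_fields):
    """Yield one output row per (input row, measure with a 'normalize' mapping).

    Assumption: exactly one extra field carries "normalizationTarget": true.
    """
    # Name of the column that will receive the measure values.
    target = next(f["name"] for f in extra_fields if f.get("normalizationTarget"))
    measures = [f for f in schema["fields"] if "normalize" in f]
    measure_names = {f["name"] for f in measures}
    for row in rows:
        # Columns that are not reduced measures are copied to every output row.
        shared = {k: v for k, v in row.items() if k not in measure_names}
        for field in measures:
            out = dict(shared)
            out.update(field["normalize"])    # restores Budget Phase / Fiscal Year
            out[target] = row[field["name"]]  # moves the figure to the target column
            yield out


# Hypothetical descriptor fragments mirroring the examples above.
schema = {"fields": [
    {"name": "Department", "type": "string"},
    {"name": "Approved 2015", "type": "number",
     "normalize": {"Budget Phase": "approved", "Fiscal Year": 2015}},
    {"name": "Executed 2015", "type": "number",
     "normalize": {"Budget Phase": "executed", "Fiscal Year": 2015}},
]}
extra_fields = [
    {"name": "Budget Phase", "type": "string"},
    {"name": "Fiscal Year", "type": "integer"},
    {"name": "Fiscal Amount", "type": "number",
     "columnType": "value", "normalizationTarget": True},
]
rows = [{"Department": "Health", "Approved 2015": 100, "Executed 2015": 90}]

result = list(expand_measures(rows, schema, extra_fields))
for r in result:
    print(r)
```

Each wide row becomes one row per measure, with the 'Budget Phase' and 'Fiscal Year' values restored from the descriptor rather than parsed out of the column names.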
Description
Fiscal Data Package v0.3 has a phase concept. There are fixed phase types based on some generic idea of budget cycles, but:
The only thing that seems consistent about fiscal phases (budget phases being the clearest example) is that they are a sequence of events.
@akariv has a solution for this as a customisable sequence for Fiscal Data Package v1.0.0. We'll share a draft here.
See prior discussion here
Tasks