frictionlessdata / datapackage-fiscal

Fiscal Data Package is a lightweight and user-oriented format for publishing and consuming fiscal data. Fiscal data packages are made of simple and universal components. They can be produced from ordinary spreadsheet software and used in any environment.
https://fiscal.datapackage.org/
The Unlicense

Fiscal data record identifiers #7

Open pwalsh opened 7 years ago

pwalsh commented 7 years ago

Description

There has long been discussion around fiscal data record identifiers. Recently, we've agreed with GIFT that we can add a transaction identifier concept to potentially support linkage with Open Contracting. Some discussion here, and then on a very recent call, revealed that the original request for transaction identifiers may be misleading, as what is actually required is budget record identifiers.

And then, there is the general fact that a budget record is actually a number of measures and dimensions, each of which potentially needs unique identifiers.

We want to add identifiers as a concept, for linkage.

Tasks

jpmckinney commented 7 years ago

Modeling

In terms of modelling, the Open Contracting Data Standard models a contracting process (from planning through to implementation). It includes some budget information/links, because it is important for many use cases to reconcile contracting processes with their budgetary funding. Because budgetary information is distinct from contracting information, and because even the most granular section of a budget can fund multiple contracting processes, it makes sense to model the budgetary information outside OCDS, and to simply link to it from relevant parts of OCDS.

With respect to transactions, on the other hand, OCDS models these, because a transaction relating to a contracting process is smaller (in an information-hierarchical sense) than the contracting process it relates to, and it is straightforward to model such transactions as part of OCDS. In other words, whether or not OFDP offers identifiers for transactions makes no difference to the use cases for OCDS, because OCDS can model transactions directly, independent of whatever choices OFDP makes.

Serialization

OCDS is serialized as JSON. Its fields (properties) for planning information are under planning in its release schema. Under planning is budget, which (among others) has these two fields that are relevant to the present discussion:

Source data

The Global Open Data Index offers links to many budgetary datasets, any of which can serve as an example of source data. Budgetary data is commonly organized as a hierarchy of programs, subprograms, projects, etc. (with terminology and depth of hierarchy varying across governments), and is commonly serialized in a tabular format. Each row within such tables is what is meant by a 'budget line item' in OCDS (though feel free to refine that definition based on reputable sources).

How budget line items from such source data are mapped into OFDP as measures or dimensions is not something I fully understand, though I would happily read a worked example that clarifies how this works. In fact, such documentation may significantly help users of OFDP and OCDS with how to model and link the two datasets.

As a quick example, this URL is an identifier for a budget line item at the level of 'proyecto' (which may not be the most granular level). It's about equitable social development relating to agriculture, so you can imagine a contracting process that awards funds to a social development agency to support minority-owned, women-owned and emerging small agricultural businesses. The use case is that, starting with the contracting process, you want to see if it was funded by a sensible budget line item (in this case it was).
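
The linkage described above can be sketched from the contracting side in Python. This assumes the link lives in the `id` and `uri` fields of the OCDS 1.1 `planning.budget` object; the ocid, budget line identifier and URL below are hypothetical placeholders, since the original link to the 'proyecto' line item is not reproduced here.

```python
import copy
import json

# Hypothetical placeholders: not the real budget line item referred to above.
BUDGET_LINE_ID = "example-proyecto-0001"
BUDGET_LINE_URI = "https://budget.example.org/proyecto/example-proyecto-0001"


def link_to_budget(release: dict, budget_id: str, budget_uri: str) -> dict:
    """Return a copy of an OCDS-style release whose planning.budget points at a budget line item."""
    linked = copy.deepcopy(release)  # leave the caller's release untouched
    budget = linked.setdefault("planning", {}).setdefault("budget", {})
    budget["id"] = budget_id    # identifier of the funding budget line item
    budget["uri"] = budget_uri  # where the budget line item can be looked up
    return linked


release = {"ocid": "ocds-example-000001", "tag": ["planning"]}  # hypothetical release
linked = link_to_budget(release, BUDGET_LINE_ID, BUDGET_LINE_URI)
print(json.dumps(linked["planning"], indent=2))
```

Going the other way (from a budget line item to the contracting processes it funds) then only requires grouping releases by `planning.budget.id`, which works only if the budget side publishes stable identifiers.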

jpmckinney commented 7 years ago

Tagging @juanpane @transpresupuestaria @timgdavies as this relates to prior conversations about linking to budget data from OCDS.

pwalsh commented 7 years ago

"The Global Open Data Index offers links to many budgetary datasets, any of which can serve as an example of source data"

@jpmckinney I guess you know that I am quite familiar with GODI and budget data in general. I'm simply asking for some example data to demonstrate the possibility of linkage between contracts and budgets; I'm not asking for examples of budget data, of which I literally have thousands.

jpmckinney commented 7 years ago

@pwalsh What I write is a reflection of my understanding, so that others may correct any confusion or misunderstanding. What I write is not some indirect commentary on your understanding.

I provide a fictitious but realistic prose example of linkage between contracts and budgets further down my last comment. I can describe it as JSON if you want it in a data format. If you want real (not just realistic) data, I can ask the people I tagged (who work directly with publishers) for a quick example.

pwalsh commented 7 years ago

@jpmckinney ok, thanks.

It seems there has really been confusion about what was originally requested here, as we discovered on our call yesterday. I'm just trying to get it clear, and I am glad we have the chance to do so together.

This confusion has persisted despite, for example, the discussions from December 2016 (ref. ref.) and the emails I have sent via GIFT about it in the last month, right up until our call yesterday.

So, bear with me, but I am just trying to make sure we all know what we want here.

To be clear, repeating what I've said before:

timgdavies commented 7 years ago

As a general point in standard development: we hadn't seen a field called 'contracting process identifier' in source data when we started designing OCDS, but it is conceptually key to meeting the user need of tracing the full contracting process. Publishers generally have little trouble using fields inside their systems to express this latent concept. GIFT is a normative initiative, and so has a normative role in working out what a modern, joined-up data infrastructure for public spending should look like, and in providing the frameworks for people to get there from here.

In terms of real-world use cases and data (and the acknowledged need to get a clear conceptual understanding), this thread https://github.com/open-contracting/standard/issues/483 might prove useful; it includes data.

This slide deck prepared following meetings in December with GIFT and OKF on the fringes of the OGP may also provide useful notes on use-cases, and the conceptual relationships.

I'm curious about the idea that an FDP record represents both multiple dimensions and measures. Generally in normalised data, I would anticipate one measure with multiple dimensions. If we can unpack the degree of normalisation of a common budget record, I suspect that will really help us in identifying how to unambiguously make the budget-contract-spending linkages work.

timgdavies commented 7 years ago

Can you say more, Paul, about the confusion around the transaction identifier concept? Was this due to it being used in the context of budgets?

For spending from government systems, it seems to me that this should not be too tricky a concept - and many systems do have an internal or external identifier for specific spending transactions.

I understand that transaction does not apply to budgets.

pwalsh commented 7 years ago

"I'm curious about the idea that an FDP record represents both multiple dimensions and measures."

I'm using "record identifiers" here to get away from the aforementioned "transaction identifiers". While a single line in source data can have multiple measures, and FDP provides "mark up" for this, such lines likely produce multiple records. We'll need to get into the details of the correct semantics, and I expect @akariv will lead on that (looks like we already see that my use of "record identifiers" is potentially misleading).
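
To make the "one source line, several measures, several records" point concrete, here is a minimal Python sketch that unpivots a wide source line into one record per non-null measure. The column names and amounts are hypothetical, and dropping empty measures is a choice made in this sketch, not something FDP is stated to require.

```python
# Hypothetical measure columns of a wide budget line.
MEASURES = ("approved", "modified", "paid")


def unpivot(row, measures=MEASURES):
    """Split one source line into one record per non-null measure."""
    dimensions = {k: v for k, v in row.items() if k not in measures}
    records = []
    for measure in measures:
        value = row.get(measure)
        if value is None:
            continue  # the source line leaves this measure empty
        # every record keeps all dimensions and names the measure it carries
        records.append({**dimensions, "measure": measure, "amount": value})
    return records


line = {"year": 2017, "program": "P-17", "chapter": "3000",
        "approved": 100.0, "modified": 95.0, "paid": None}
for record in unpivot(line):
    print(record)
```

The resulting records differ from each other only in `measure`, so an identifier for one of them has to include the measure name alongside the dimension values; an identifier for the source line alone would be ambiguous.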

pwalsh commented 7 years ago

"Can you say more, Paul, about the confusion around the transaction identifier concept? Was this due to it being used in the context of budgets?"

Correct.

"For spending from government systems, it seems to me that this should not be too tricky a concept - and many systems do have an internal or external identifier for specific spending transactions."

Definitely. Still, I've seen some examples from published UK25k spend data that tripped us up (transaction IDs not unique, no other unique identifier provided outside of the internal system, which, clearly, must have one). I'll try to dig up those examples, but handling such examples is less an issue for the standard and more for implementations.
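
One quick way for an implementation to catch the kind of problem described above, before relying on a column as an identifier, is to count how often each candidate key occurs. The Python sketch below uses made-up spend rows and an assumed field name, `transaction_id`; a composite key that happens to be unique in a sample is not guaranteed to stay unique.

```python
from collections import Counter


def duplicate_keys(rows, fields):
    """Return the key values (tuples over `fields`) that occur on more than one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical spend rows: the transaction ID is reused for two payments.
spend = [
    {"transaction_id": "T-001", "supplier": "A", "amount": 30000},
    {"transaction_id": "T-002", "supplier": "B", "amount": 41000},
    {"transaction_id": "T-001", "supplier": "C", "amount": 26500},
]

print(duplicate_keys(spend, ("transaction_id",)))             # [('T-001',)]
print(duplicate_keys(spend, ("transaction_id", "supplier")))  # []
```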

jpmckinney commented 7 years ago

I wrote some things offline while on a flight, so please forgive any repetition of things Tim has already written.

"would love to see some real world data that would potentially be 'unlocked' by us doing this"

I believe, in the general case, in order for third parties to link to records contained within an OFDP dataset, those records will need identifiers. OCDS may be a first use case, but it seems like anything that needs to refer to budget line items would benefit from an identifier.

There may not be much data to refer to (though Tim offered some links and others may as well), because the common absence of identifiers in source data means the links between datasets are not easily made. A more accessible example would be an investigative journalist manually linking datasets in the absence of identifiers in order to document misspending. Or, as Juan presented on his screen, a government makes the links in an internal operational system in order to track spending against the budget line item (very common); so, ostensibly, all contracting data would be 'unlocked' for such a use case, if I understand your sense. A common use case for these identifiers within civil society would be to repeat or check the government's work in order to monitor it and hold it accountable.

Also, for context with regard to linking: before OCDS, most governments – if they published contracting datasets at all – published one dataset per stage (e.g. planning, tender, award, contract, implementation) without linking the datasets or using identifiers that would let others do so. Linking data across datasets is sadly still fairly new, but hopefully this issue will be a step towards making it more common.

I'm satisfied with the identifier being optional, by the way (relating to some of the linked prior issues or conversations).

LindseyAM commented 7 years ago

Sharing here a couple of use cases I have heard from folks in different countries.

  1. "We want to be able to check that budget is indeed available before funds are committed (aka award a contract)" - in places where contracts are awarded when funds are not actually available, the result erodes private sector confidence (as contractors will perform work and remain unpaid or underpaid for extended periods). Transparently showing that budget is available (through a link between the contracting process and the budget) can help to improve this trust and efficiency.

  2. "We want to be able to check that the budget lines are being used for their intended purpose" - similarly, it may be that budgets are being used to pay for contracts unrelated to their intended purpose. A link between budgets and contracts can help folks to check that the budget is being executed properly.
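
Use case 1 reduces to a simple check once each award carries a link to the budget line item that funds it. The Python sketch below assumes the approved amount is the ceiling and that every linked award counts as committed; real availability rules (modified budgets, partial payments, multi-year commitments) are more involved, and all names and figures here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetLine:
    id: str
    approved: float


@dataclass
class Award:
    ocid: str
    budget_id: str  # link from the contracting process to a budget line item
    amount: float


def remaining(line, awards):
    """Approved amount minus everything already awarded against this line."""
    committed = sum(a.amount for a in awards if a.budget_id == line.id)
    return line.approved - committed


def can_commit(line, awards, new_amount):
    """True if a new award of `new_amount` still fits within the line."""
    return new_amount <= remaining(line, awards)


line = BudgetLine(id="line-1", approved=1000.0)
awards = [Award("ocds-example-1", "line-1", 600.0),
          Award("ocds-example-2", "line-1", 300.0),
          Award("ocds-example-3", "line-2", 500.0)]  # funded by another line
print(remaining(line, awards))          # 100.0
print(can_commit(line, awards, 150.0))  # False
```

Without the `budget_id` link the third award could not be told apart from the first two, which is why the check depends on identifiers on both sides.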

Hope this is helpful.

akariv commented 7 years ago

So, would a URI like this work for fetching the information from the Mexican Federal Budget for (Year=2017, MODALIDAD="A", PP="17", RAMO="7", CAPITULO="3000", CONCEPTO="3700")?

https://openspending.org/api/3/cubes/6018ab87076187018fc29c94a68a3cd2:presupuesto-mexico-2008-20164t-2017/facts/?cut=date_2.CICLO:2017|activity_ID_MODALIDAD.ID_MODALIDAD:"A"|activity_ID_PP.ID_PP:"17"|administrative_classification_2.ID_RAMO:"7"|economic_classification_ID.ID_CAPITULO:"3000"|economic_classification_ID_2.ID_CONCEPTO:"3700"
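
The cut in this URL is just the (dimension, value) pairs joined together, so it can be generated from a mapping. A minimal Python sketch, inferring the syntax only from the URL above (string values in double quotes, the year bare, pairs separated by `|`); whether the API also accepts the percent-encoded form that strict HTTP clients send is an assumption worth checking.

```python
from urllib.parse import quote, unquote

BASE = ("https://openspending.org/api/3/cubes/"
        "6018ab87076187018fc29c94a68a3cd2:presupuesto-mexico-2008-20164t-2017/facts/")


def build_cut(cut):
    """Render {dimension_ref: value} in the `ref:value|ref:value` form used above."""
    parts = []
    for ref, value in cut.items():
        # strings are double-quoted, numbers (the fiscal year) are not
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{ref}:{rendered}")
    return "|".join(parts)


cut = {
    "date_2.CICLO": 2017,
    "activity_ID_MODALIDAD.ID_MODALIDAD": "A",
    "activity_ID_PP.ID_PP": "17",
    "administrative_classification_2.ID_RAMO": "7",
    "economic_classification_ID.ID_CAPITULO": "3000",
    "economic_classification_ID_2.ID_CONCEPTO": "3700",
}

url = BASE + "?cut=" + build_cut(cut)                       # as written above
encoded = BASE + "?cut=" + quote(build_cut(cut), safe=":")  # `"` -> %22, `|` -> %7C
assert unquote(encoded) == url
```

Such a URL identifies a single record only if the cut matches exactly one fact, which is what the response's `total_fact_count` field reports.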

This is what it returns:

{
  "total_fact_count": 1,
  "data": [
    {
      "expenditure_type_2.ID_TIPOGASTO": "1",
      "expenditure_type_2.DESC_TIPOGASTO": "Gasto corriente",
      "functional_classification_GPO.GPO_FUNCIONAL": "1",
      "functional_classification_GPO.DESC_GPO_FUNCIONAL": "Gobierno",
      "economic_classification_ID_4.ID_PARTIDA_ESPECIFICA": "",
      "economic_classification_ID_4.DESC_PARTIDA_ESPECIFICA": "",
      "economic_classification_ID_3.ID_PARTIDA_GENERICA": "",
      "economic_classification_ID_3.DESC_PARTIDA_GENERICA": "",
      "functional_classification_ID_2.ID_SUBFUNCION": "4",
      "functional_classification_ID_2.DESC_SUBFUNCION": "Derechos Humanos",
      "activity_ID_PP.ID_PP": "17",
      "activity_ID_PP.DESC_PP": "Derechos humanos",
      "date_2.CICLO": 2017,
      "budget_line_id_2.ID_CLAVE_CARTERA": "0",
      "activity_ID_MODALIDAD.ID_MODALIDAD": "A",
      "activity_ID_MODALIDAD.DESC_MODALIDAD": "Funciones de las Fuerzas Armadas",
      "economic_classification_ID.ID_CAPITULO": "3000",
      "economic_classification_ID.DESC_CAPITULO": "Servicios generales",
      "functional_classification_ID_3.ID_AI": "3",
      "functional_classification_ID_3.DESC_AI": "Defensa de la integridad, la independencia, la soberanía del territorio nacional y la seguridad interior",
      "fin_source_2.ID_FF": "1",
      "fin_source_2.DESC_FF": "Recursos fiscales",
      "economic_classification_ID_2.ID_CONCEPTO": "3700",
      "economic_classification_ID_2.DESC_CONCEPTO": "Servicios de traslado y viáticos",
      "functional_classification_ID.ID_FUNCION": "2",
      "functional_classification_ID.DESC_FUNCION": "Justicia",
      "administrative_classification_3.ID_UR": "139",
      "administrative_classification_3.DESC_UR": "Dirección General de Derechos Humanos",
      "geo_source_2.ID_ENTIDAD_FEDERATIVA": "9",
      "geo_source_2.ENTIDAD_FEDERATIVA": "Ciudad de México",
      "administrative_classification_2.ID_RAMO": "7",
      "administrative_classification_2.DESC_RAMO": "Defensa Nacional",
      "MONTO_EJERCIDO": null,
      "MONTO_EJERCICIO": null,
      "MONTO_ADEFAS": null,
      "MONTO_PAGADO": null,
      "MONTO_MODIFICADO": null,
      "MONTO_APROBADO": 9860000.0,
      "MONTO_DEVENGADO": null
    }
  ],
  "cell": [
    {
      "ref": "date_2.CICLO",
      "operator": ":",
      "value": [
        2017
      ]
    },
    {
      "ref": "activity_ID_MODALIDAD.ID_MODALIDAD",
      "operator": ":",
      "value": [
        "A"
      ]
    },
    {
      "ref": "activity_ID_PP.ID_PP",
      "operator": ":",
      "value": [
        "17"
      ]
    },
    {
      "ref": "administrative_classification_2.ID_RAMO",
      "operator": ":",
      "value": [
        "7"
      ]
    },
    {
      "ref": "economic_classification_ID.ID_CAPITULO",
      "operator": ":",
      "value": [
        "3000"
      ]
    },
    {
      "ref": "economic_classification_ID_2.ID_CONCEPTO",
      "operator": ":",
      "value": [
        "3700"
      ]
    }
  ],
  "fields": [
    "expenditure_type_2.ID_TIPOGASTO",
    "expenditure_type_2.DESC_TIPOGASTO",
    "functional_classification_GPO.GPO_FUNCIONAL",
    "functional_classification_GPO.DESC_GPO_FUNCIONAL",
    "economic_classification_ID_4.ID_PARTIDA_ESPECIFICA",
    "economic_classification_ID_4.DESC_PARTIDA_ESPECIFICA",
    "economic_classification_ID_3.ID_PARTIDA_GENERICA",
    "economic_classification_ID_3.DESC_PARTIDA_GENERICA",
    "functional_classification_ID_2.ID_SUBFUNCION",
    "functional_classification_ID_2.DESC_SUBFUNCION",
    "activity_ID_PP.ID_PP",
    "activity_ID_PP.DESC_PP",
    "date_2.CICLO",
    "budget_line_id_2.ID_CLAVE_CARTERA",
    "activity_ID_MODALIDAD.ID_MODALIDAD",
    "activity_ID_MODALIDAD.DESC_MODALIDAD",
    "economic_classification_ID.ID_CAPITULO",
    "economic_classification_ID.DESC_CAPITULO",
    "functional_classification_ID_3.ID_AI",
    "functional_classification_ID_3.DESC_AI",
    "fin_source_2.ID_FF",
    "fin_source_2.DESC_FF",
    "economic_classification_ID_2.ID_CONCEPTO",
    "economic_classification_ID_2.DESC_CONCEPTO",
    "functional_classification_ID.ID_FUNCION",
    "functional_classification_ID.DESC_FUNCION",
    "administrative_classification_3.ID_UR",
    "administrative_classification_3.DESC_UR",
    "geo_source_2.ID_ENTIDAD_FEDERATIVA",
    "geo_source_2.ENTIDAD_FEDERATIVA",
    "administrative_classification_2.ID_RAMO",
    "administrative_classification_2.DESC_RAMO",
    "MONTO_EJERCIDO",
    "MONTO_EJERCICIO",
    "MONTO_ADEFAS",
    "MONTO_PAGADO",
    "MONTO_MODIFICADO",
    "MONTO_APROBADO",
    "MONTO_DEVENGADO"
  ],
  "order": [

  ],
  "page": 1,
  "page_size": 20,
  "status": "ok"
}
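
Two things in this response matter for identifiers: a `total_fact_count` of 1 shows the cut pins down exactly one fact, and `cell` echoes the cut back, so the identifier can be recovered from the response itself. The Python sketch below checks both on a trimmed excerpt of the response (most fields omitted); it handles only single-valued cells, as in this example.

```python
import json

# Trimmed excerpt of the response above; most fields omitted.
SAMPLE = """{
  "total_fact_count": 1,
  "data": [{"date_2.CICLO": 2017,
            "administrative_classification_2.ID_RAMO": "7",
            "MONTO_PAGADO": null,
            "MONTO_APROBADO": 9860000.0,
            "MONTO_DEVENGADO": null}],
  "cell": [{"ref": "date_2.CICLO", "operator": ":", "value": [2017]},
           {"ref": "administrative_classification_2.ID_RAMO", "operator": ":", "value": ["7"]}]
}"""


def single_fact(response):
    """Return the only fact in a facts response, or fail if the cut is ambiguous."""
    if response["total_fact_count"] != 1:
        raise ValueError(f"cut matched {response['total_fact_count']} facts, not one")
    return response["data"][0]


def measures(fact):
    """Keep the non-null MONTO_* amounts of a fact."""
    return {k: v for k, v in fact.items() if k.startswith("MONTO_") and v is not None}


def cut_from_cells(response):
    """Rebuild the `ref:value|...` cut string from the `cell` array."""
    parts = []
    for cell in response["cell"]:
        if len(cell["value"]) != 1:
            raise ValueError("only single-valued cells are handled in this sketch")
        value = cell["value"][0]
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f'{cell["ref"]}{cell["operator"]}{rendered}')
    return "|".join(parts)


response = json.loads(SAMPLE)
print(measures(single_fact(response)))  # {'MONTO_APROBADO': 9860000.0}
print(cut_from_cells(response))         # date_2.CICLO:2017|administrative_classification_2.ID_RAMO:"7"
```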

pwalsh commented 6 years ago

@akariv

Can you:

  1. Add here a brief description, and a sample snippet, of the new syntax/handling being proposed for v1, specifically in regards to identifiers.
  2. Additionally, link to the current text of the v1 spec in whole.

akariv commented 6 years ago

This is the current FDP draft: https://hackmd.io/BwNgpgrCDGDsBMBaAhtALARkWsPEE5posR8RxgAzffWfDIA=?view

Generally speaking, the new fiscal data package is a tabular data package. As such, it holds one or more data tables, each with a schema and a defined primaryKey.

As suggested above, a mapping of {k => v for k in primaryKey} could be used as a unique row identifier, which is also somewhat resilient to some schema changes (e.g. adding or removing columns). The exact means of encoding (i.e. should it be JSON? query parameters? base64? etc.) could, I think, be left to implementors.
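
A minimal Python sketch of that idea, showing the three encodings mentioned (JSON, query parameters, base64) side by side. The `primaryKey` columns and the row are hypothetical, loosely borrowing column names from the Mexican example above; sorting the keys is a choice made here so the identifier does not depend on column order.

```python
import base64
import json
from urllib.parse import urlencode


def row_key(row, primary_key):
    """{k: row[k] for k in primaryKey}; columns outside the key are ignored."""
    return {k: row[k] for k in primary_key}


def as_json(key):
    # sorted, whitespace-free JSON so equal keys always give equal strings
    return json.dumps(key, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def as_query(key):
    return urlencode(sorted(key.items()))


def as_base64(key):
    # URL-safe base64 of the canonical JSON form
    return base64.urlsafe_b64encode(as_json(key).encode("utf-8")).decode("ascii")


primary_key = ["CICLO", "ID_RAMO", "ID_CAPITULO"]  # hypothetical primaryKey
row = {"CICLO": 2017, "ID_RAMO": "7", "ID_CAPITULO": "3000",
       "DESC_RAMO": "Defensa Nacional", "MONTO_APROBADO": 9860000.0}

key = row_key(row, primary_key)
print(as_json(key))   # {"CICLO":2017,"ID_CAPITULO":"3000","ID_RAMO":"7"}
print(as_query(key))  # CICLO=2017&ID_CAPITULO=3000&ID_RAMO=7
print(as_base64(key))

# Adding or removing a non-key column leaves the identifier unchanged.
assert as_json(row_key({**row, "NEW_COLUMN": "x"}, primary_key)) == as_json(key)
```

This resilience only covers columns outside the primary key; renaming, adding or removing a key column changes every identifier.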

pwalsh commented 6 years ago

@akariv you might want to look at this extensive discussion https://github.com/open-contracting/standard/issues/483