OpenDataServices / flatten-tool

Tools for generating CSV and other flat versions of the structured data
http://flatten-tool.readthedocs.io/en/latest/
MIT License

How to flatten when identifier is missing #295

Open Bjwebb opened 5 years ago

Bjwebb commented 5 years ago

When nested identifiers are missing in the source data, rows are pushed onto separate sheets without enough information about which parent they relate to (see the example about sectors within transactions below). Notably, it is then not possible to unflatten the data back into its original shape.

This is particularly a problem for IATI, because the standard doesn't specify such identifiers for all nested objects (whereas OCDS does).

https://github.com/OpenDataServices/flatten-tool/issues/178 is the same issue, but specifically for multilingual narratives.

In https://github.com/OpenDataServices/flatten-tool/issues/177#issuecomment-474921498 @stevieflow wrote:

I noticed something via this activity:

http://d-portal.org/q.xml?aid=US-GOV-1-SD-AID-FFP-G-16-00035

In each transaction there are two sectors, but from different vocabularies (which is in keeping with the standard rules, it seems).

If you flatten this, the transaction sectors are pushed to a new sheet. But, given that this particular activity has multiple transactions, there's then no way to understand which sectors relate back to which transactions...
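A minimal Python sketch of the problem in this example, assuming a simplified activity shape. The `transactions`, `sectors`, `vocabulary` and `code` keys, the functions and all values are made up for illustration. This is not flatten-tool's implementation or the IATI schema. Each row on the sectors sheet can only point back to its parent through the transaction's `id`. When that is missing, every sector row carries the same empty reference.

```python
"""Illustrative sketch only: not flatten-tool's code or the IATI schema."""


def flatten_transactions(activity):
    """Split transactions and their nested sectors into two flat sheets.

    Each sector row refers back to its parent only through the parent's "id".
    """
    transactions_sheet = []
    sectors_sheet = []
    for transaction in activity["transactions"]:
        parent_id = transaction.get("id")  # None when the publisher omitted it
        transactions_sheet.append({"id": parent_id, "value": transaction["value"]})
        for sector in transaction.get("sectors", []):
            sectors_sheet.append({
                "transaction_id": parent_id,
                "vocabulary": sector["vocabulary"],
                "code": sector["code"],
            })
    return transactions_sheet, sectors_sheet


def can_relink(transactions_sheet, sectors_sheet):
    """True only if every sector row identifies exactly one transaction row."""
    ids = [row["id"] for row in transactions_sheet]
    if None in ids or len(set(ids)) != len(ids):
        return False
    return all(row["transaction_id"] in ids for row in sectors_sheet)


# Hypothetical activity: two transactions, each with sectors from two vocabularies.
activity = {
    "transactions": [
        {"value": 100, "sectors": [{"vocabulary": "A", "code": "a1"},
                                   {"vocabulary": "B", "code": "b1"}]},
        {"value": 200, "sectors": [{"vocabulary": "A", "code": "a2"},
                                   {"vocabulary": "B", "code": "b2"}]},
    ]
}

transactions, sectors = flatten_transactions(activity)
for row in sectors:
    print(row)  # every row has transaction_id None
print(can_relink(transactions, sectors))  # False
```

If each transaction instead carries a unique `id`, `can_relink` returns `True` and the sector rows can be grouped back under their parents.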

jpmckinney commented 5 years ago

This is an issue for OCDS, as (1) not all arrays of objects have identifiers and (2) identifiers are not always required in that context (and might be missing anyhow).

jpmckinney commented 5 years ago

The Government Transparency Institute ran into issues when OCDS data was missing identifiers (e.g. a publisher omits a required id field). Their experience was that Flatten Tool would mint new identifiers, without any feedback to the user, which made analysis more difficult.

I haven't verified that Flatten Tool has this behavior, but if so, there should be feedback to the user.
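If Flatten Tool does mint identifiers (unverified, as noted above), one way to provide that feedback is to report every value it mints. The Python sketch below is hypothetical: `mint_missing_ids`, the `minted-<index>` scheme and the message format are illustrative choices, not part of Flatten Tool.

```python
import copy


def mint_missing_ids(items, path):
    """Hypothetical helper: fill in missing "id" values and report each one.

    Returns (new_items, messages). The input list is not modified.
    """
    new_items = copy.deepcopy(items)
    messages = []
    # Ids already used by the publisher, so a minted id never collides with one.
    used = {item["id"] for item in items if item.get("id") not in (None, "")}
    for index, item in enumerate(new_items):
        if item.get("id") in (None, ""):
            candidate = f"minted-{index}"
            while candidate in used:
                candidate += "-x"
            used.add(candidate)
            item["id"] = candidate
            messages.append(f"{path}/{index}: missing id, minted '{candidate}'")
    return new_items, messages


transactions = [{"id": "t1", "value": 100}, {"value": 200}, {"id": "", "value": 300}]
fixed, messages = mint_missing_ids(transactions, "transactions")
for message in messages:
    print("WARNING:", message)
# WARNING: transactions/1: missing id, minted 'minted-1'
# WARNING: transactions/2: missing id, minted 'minted-2'
```

Returning the messages keeps the sketch easy to test. A real tool might instead log them or include them in its output so that analysts can tell publisher-supplied identifiers from minted ones.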

robredpath commented 5 years ago

Thanks for the feedback, @jpmckinney. Between your previous comment and this one, I think this should be given attention relatively soon. I'll add something to I&R.

robredpath commented 5 years ago

flatten-tool behaviour when IDs are missing needs improving and/or documenting.