pwalsh commented 7 years ago

Description

Some fiscal data concepts describe organisations. Should we provide identifiers for those organisations in FDP and, if so, how exactly?

See original (old) discussion

Tasks

[ ] Research

timgdavies commented 7 years ago

The old discussion link 404s (I think issues have been turned off on that old project).

I'd very much encourage use of the http://org-id.guide identifier pattern, which we're building a coalition of open data standards around. This splits identifiers into two parts:

Scheme (e.g. which company / charity / government entity register is the identifier from)
Identifier

and provides a codelist of the schemes. See more at http://org-id.guide/about

We came to this after finding the need for a pragmatic solution that can disambiguate organisation identifiers, whilst avoiding some centralised architecture.

We're building an open governance process for org-id.guide at the moment, in order to make sure the list can be both community-maintained, and government-trusted.

pwalsh commented 7 years ago

@timgdavies org-id.guide is great as a registry of identifier registries, I've been excited since I saw it launched.

I'm not sure, from what I've read, how it will help here, as the focus seems to be on being a registry for identifier lists, and not a service that resolves, say, strings that represent entities, to identifiers.

You mention this as a practical solution to disambiguation of identifiers but I'm not sure how (what am I missing?).

For example, Companies House is listed here as a source for the UK, which of course, is to be expected, but I also know from using this API in the past that there are definite issues with disambiguation of companies at the source - is there some way that org-id.guide deals with this, by providing a reconciliation layer over these various source lists of identifiers? I assume not because that would imply centralisation, although from past experience this is the major problem in using identifiers from such sources.

I guess what I am trying to understand in the context of this issue is: are you simply suggesting we adopt the identifier pattern only, or, is there additional value add from the org-id.guide service when adopting this pattern?

timgdavies commented 7 years ago

Right now org-id.guide doesn't try to do anything with resolving identifiers, so the immediate suggestion is to:

(1) Adopt the pattern of either (a) splitting identifiers into two columns, a scheme and an identifier, or (b) making sure identifiers include a prefix that makes clear which list the identifier is drawn from (e.g. GB-COH-01234567 to indicate a Companies House number)

and

(2) Recommend use of the org-id.guide list of lists for the values of the scheme column, or the first part of a compound string.

This enables some validation that the list the identifier is drawn from can be identified. For example, if someone publishes the identifier 'XF-XYZ-01234567' but XF-XYZ is not found in the org-id.guide codelist, you can flag this so that they either pick a better list to take identifiers from or make sure the list they are using is documented.
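As a rough illustration of that check, here is a minimal Python sketch. It assumes the scheme prefix is always two hyphen-separated parts (as in GB-COH) and uses a small hard-coded sample of scheme codes taken from this thread; the function names are made up for the example and are not part of org-id.guide.

```python
# Sample of scheme codes mentioned in this thread; a real check would load
# the full org-id.guide codelist instead of hard-coding it.
KNOWN_SCHEMES = {"GB-COH", "AU-ABN", "US-EIN"}


def split_identifier(compound):
    """Split 'GB-COH-01234567' into ('GB-COH', '01234567').

    Assumes the scheme prefix is always two hyphen-separated parts and that
    everything after the second hyphen is the identifier itself.
    """
    parts = compound.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a scheme-prefixed identifier: {compound!r}")
    return f"{parts[0]}-{parts[1]}", parts[2]


def check_scheme(compound, known_schemes=KNOWN_SCHEMES):
    """Return (scheme, identifier, is_known) for a compound identifier."""
    scheme, identifier = split_identifier(compound)
    return scheme, identifier, scheme in known_schemes


for value in ["GB-COH-01234567", "XF-XYZ-01234567"]:
    scheme, identifier, is_known = check_scheme(value)
    status = "ok" if is_known else "scheme not in codelist, flag for review"
    print(f"{value}: scheme={scheme} id={identifier} -> {status}")
```

With the two-column variant from (1)(a), only the membership test in `check_scheme` would be needed, since the scheme already arrives in its own column.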

Right now, that's as far as org-id.guide goes: just the list of lists.

Longer-term we're interested in seeing if we can:

Collate robust regular expressions for validating identifiers (so that, knowing the list an identifier is from, you can check its form is correct)
Record meta-data that would signpost computers to reconciliation services (right now the meta-data records when there is data in OpenCorporates, and has URLs to the best online searches for a given list we have found, where they exist, but it does not do anything to support automated reconciliation beyond that)
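To make the first of those longer-term ideas concrete, here is a small Python sketch of per-scheme format checks. The patterns are illustrative guesses at each register's format, not anything org-id.guide publishes today, and `check_form` is a hypothetical helper name.

```python
import re

# Illustrative patterns only: these are guesses at each register's format,
# not rules published by org-id.guide or by the registers themselves.
ASSUMED_PATTERNS = {
    "GB-COH": re.compile(r"[A-Z0-9]{8}"),
    "AU-ABN": re.compile(r"\d{11}"),
    "US-EIN": re.compile(r"\d{9}"),
}


def check_form(scheme, identifier, patterns=ASSUMED_PATTERNS):
    """Return True/False if a pattern exists for the scheme, else None."""
    pattern = patterns.get(scheme)
    if pattern is None:
        return None  # nothing recorded for this scheme, so no verdict
    return pattern.fullmatch(identifier) is not None


print(check_form("GB-COH", "01234567"))
print(check_form("GB-COH", "1234578"))
print(check_form("XF-XYZ", "01234567"))
```

Returning `None` for an unknown scheme keeps "no pattern available" separate from "pattern exists and the identifier fails it", which are different problems for a publisher to fix.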

So - that's a long way of saying:

Right now, it will help publishers disambiguate in the data they provide, for example, whether '1234578' is a UK Company Number (GB-COH), an Australian Business Number (AU-ABN) or a US Employer Identification Number (US-EIN).

But beyond that there's nothing else clever going on.

pwalsh commented 7 years ago

@timgdavies

Ok, we'll keep it in mind. Would you consider providing a lookup for "schemes" from org-id.guide?

Both as a spec writer, and as an implementer, I'd much rather introduce "schemes" for identifiers such as GB-COH, AU-ABN if I can get a list of such "schemes" programmatically (can be a CSV on GitHub as an MVP), and what they stand for, from a canonical source (based on open source and open data, so it can be redeployed if the canonical source ever goes down, etc.). Otherwise, in our spec, we take on some burden in terms of explaining what, say, GB-COH means, and, in implementing apps for users, maintaining a list of such "schemes" which can get out of sync with org-id.guide.
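For the CSV-as-MVP idea, a lookup could be as small as the Python sketch below. The inline CSV stands in for a downloaded file, and the column names `code` and `name` are hypothetical; the layout of any real export would need checking first.

```python
import csv
import io

# Stand-in for a downloaded export. The column names "code" and "name" are
# hypothetical; check the real file's header before relying on them.
SAMPLE_CSV = """code,name
GB-COH,Companies House
AU-ABN,Australian Business Number
US-EIN,Employer Identification Number
"""


def load_schemes(csv_text):
    """Map each scheme code to its human-readable name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"]: row["name"] for row in reader}


schemes = load_schemes(SAMPLE_CSV)
print(schemes.get("GB-COH"))
print(schemes.get("XF-XYZ", "unknown scheme"))
```

An app that builds this mapping from the canonical file at start-up, or on a schedule, would not need to keep its own copy of the scheme list in sync by hand.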

pwalsh commented 7 years ago

@timgdavies any thoughts on the above?

pwalsh commented 6 years ago

bump @timgdavies ?

pwalsh commented 6 years ago

@akariv thoughts on this for FDP v1?

akariv commented 6 years ago

In FDPv1 there's full flexibility to add new ColumnTypes to describe these sorts of things. For example, the type recipient:generic:legal-entity:code-type could be used to describe the codelist that the legal entity's code came from.

This type might be used to describe an existing column in the data (if one exists) or, more commonly, to add a new column with a constant value containing a code list identifier.
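A minimal Python sketch of the second, more common case: adding a constant column that names the code list and tagging it with the ColumnType above. The row data, the column names and the `columnType` key in the field descriptor are assumptions made for illustration, not confirmed details of the spec.

```python
CODE_TYPE = "recipient:generic:legal-entity:code-type"


def add_constant_column(rows, column_name, value):
    """Return copies of the rows with one extra constant-valued column."""
    return [{**row, column_name: value} for row in rows]


# Hypothetical source rows that carry bare company numbers only.
rows = [
    {"recipient_code": "01234567", "amount": "1000"},
    {"recipient_code": "07654321", "amount": "250"},
]
rows = add_constant_column(rows, "recipient_code_type", "GB-COH")

# Field descriptor for the new column. The "columnType" key is assumed here
# to be how a field is tagged with a ColumnType.
field = {"name": "recipient_code_type", "type": "string", "columnType": CODE_TYPE}
print(rows[0])
print(field)
```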

timgdavies commented 6 years ago

Apologies I'd missed the alerts above.

org-id.guide currently has CSV and JSON downloads, and I'm happy to look at how we could also set up a workflow for archiving those exports to other locations for long-term storage, if that would help with confidence in the list.

Adding a JSON rendering of each page is also on the list for the project.

frictionlessdata / datapackage-fiscal

Organisation identifiers for fiscal data concepts #8
