Standardise unique id generation on RFC 4122

facilityregistry / fred-api

Facility Registry API Documentation Website


Standardise unique id generation on RFC 4122 #33

Closed: ctford closed this issue 11 years ago

ctford commented 11 years ago

My understanding of the purpose of the id field is to provide an identity that allows facilities data to live beyond the original system it's hosted in. To achieve this, the id must:

Have no collisions, so that lists of facilities can be merged
Not be dependent on the URL, so that facilities can be hosted in a new location
Be generated in a well-understood way, so new ids can continue to be created if facilities are migrated to a new system

In #26 we resolved that the intention of the id was to be "universally unique". The spec also says that "the API does not providing a specific format for IDs".

There's a contradiction between these two ideas, because UUIDs are only unique within the scheme that created them: two facility registries using different UUID generation schemes could unwittingly produce colliding ids. So if we don't agree on how the ids are generated, we admit the possibility of collisions.

But the pressing practical reason I have for wanting to standardise on RFC 4122 is to port the generation of ids to a new system. On a project I'm involved in, facilities are being hosted in DHIS2, but will be migrated to a different system. If the scheme for id generation isn't reproducible within the new system, then how can the new system create new facilities?

By picking RFC 4122 and using its standard string representation (e.g. '5899d128-4c55-11e2-b5e1-b88d12122fdc'), we get a well-understood, widely supported and robust scheme for id generation.
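Because the proposal rests on RFC 4122's standard string representation, a receiving registry would probably want to check incoming ids before accepting them. Here is a minimal sketch using Python's standard uuid module. The function name is hypothetical and is not part of FRED. On its own, uuid.UUID() is lenient: it also accepts braces, a "urn:uuid:" prefix or missing hyphens. For that reason the sketch compares the input against its canonical re-serialisation.

```python
import uuid


def is_canonical_rfc4122(value: str) -> bool:
    """Return True if value is an RFC 4122 UUID in the standard string form.

    The standard form is 32 hex digits grouped 8-4-4-4-12 and separated by
    hyphens. RFC 4122 treats hex digits as case-insensitive on input, so
    upper-case input is accepted; str() always re-serialises in lower case.
    """
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Rejects braces, "urn:uuid:" prefixes and hyphen-less input,
    # all of which uuid.UUID() would otherwise parse happily.
    if str(parsed) != value.lower():
        return False
    # Only accept the RFC 4122 variant (not NCS, Microsoft or reserved).
    return parsed.variant == uuid.RFC_4122
```

A registry could run this check when accepting a client-supplied id and reject anything else, so that every stored id stays in the one agreed form.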

Cheers,

Chris

mortenoh commented 11 years ago

-1

While the idea of having unique identifiers is nice, UUIDs are not compatible with XML ID/IDREF, since they can start with a number.

ctford commented 11 years ago

So if I understand correctly, @mortenoh, you're saying that using UUIDs would impede us in defining an XML FRED because we'd be blocked from using XML IDs?

mortenoh commented 11 years ago

Yes, that is a potential problem, since XML IDs cannot start with a number. Of course, you could use a homegrown ID matching scheme for XML (which we actually do in DHIS 2), but it's not ideal.

mberg commented 11 years ago

Let's discuss on the next call.

I like the idea of a UUID.

SimpleGeo, which was a POI service, used something like SG_UUID.

I don't think we should mandate this in the URLs but I don't see the harm of storing UUID as a required parameter.

To get around the XML problem, couldn't we just prefix it with something like FR_?
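mberg's prefix idea can be checked against the XML 1.0 Name production that mortenoh cites later in the thread. The sketch below makes two assumptions. First, it uses only an ASCII subset of that production, which is enough here because UUIDs are pure ASCII. Second, it treats "FR_" purely as the example prefix floated here, not as an agreed convention. Both helper names are hypothetical.

```python
import re

# ASCII-only subset of the XML 1.0 "Name" production: a name starts with a
# letter, "_" or ":", then continues with letters, digits, "_", ":", "-" or ".".
# The full production also allows many non-ASCII characters, omitted here.
XML_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")


def is_xml_name(value: str) -> bool:
    return XML_NAME_ASCII.match(value) is not None


def to_xml_id(facility_id: str, prefix: str = "FR_") -> str:
    """Hypothetical helper: prefix a UUID so it can serve as an XML ID."""
    return prefix + facility_id


def from_xml_id(xml_id: str, prefix: str = "FR_") -> str:
    """Hypothetical helper: recover the plain UUID from a prefixed XML ID."""
    if not xml_id.startswith(prefix):
        raise ValueError(f"not a prefixed facility id: {xml_id!r}")
    return xml_id[len(prefix):]
```

For a random (version 4) UUID, the first character is a digit 10 times out of 16. Without some prefix, most generated ids would therefore fail this check.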


ctford commented 11 years ago

Next call sounds great.

Here's an email from my colleague Jeff Wishnie (@jwishnie), sent as part of a discussion between ThoughtWorks and HISP folks. I'm including it for context, because he goes into detail about our thinking behind preferring a specified scheme:

I'm jumping into this without the full background (Chris has filled me in, but I'm sure I've missed nuances), so please correct me when I get assumptions and context wrong.

My understanding from Chris is that he is proposing that resources served by FRED have both a URL and a UUID.

I think this is a very good idea, because the semantics of the URL are overloaded: the URL is supposed to uniquely identify a resource AND provide a means to retrieve the resource via FRED.

Different implementations may have good reason to construct their URLs differently (perhaps to make retrieval easier to implement, or to match an existing ID scheme in an existing code base, as I think is the case with DHIS2). If so, we don't want to restrict implementations from doing this by prescribing a format for the URL.

And if we don't want to restrict the format of a URL, or the algorithm used to guarantee uniqueness, then the URLs are not UUIDs; they are unique only to the implementation that generated them. DHIS2-generated URLs could collide with Resource Mapper URLs, or with those of any other implementation.

And let's assume that there are cases where data from multiple FRED instances will be combined. These could be entirely different implementations: a DHIS2-based one, a Resource Mapper-based one, a completely new code base. I could list many cases, but here are a few:
Distributed FRED instances (say, at district level) are combined into one FRED instance (at national level)
A FRED instance is replaced with a new implementation (e.g. a Resource Mapper FRED instance is replaced by a DHIS2 FRED instance) and data needs to be migrated
Operators of FRED instances are asked to federate their information into a more global instance, e.g. Modi sets up a "FacilityHub" service and asks NGOs, governments, etc. to push info from their local registries into this shared one

If you take these two points:

URLs are unique only to the implementation that generated them

Data may be combined from multiple implementations

And you don't have a separate, truly universal ID, then you have two problems. First, combining resources becomes difficult. You can't be certain that URLs from different implementations won't collide. So when data is imported from one instance to another, you must either guarantee they are the same implementation (probably the same version, too) OR re-map all the URLs, generating new ones unique to the receiving implementation.
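To make the collision problem concrete, consider two registries that each number facilities with their own local sequence. They clash as soon as their lists are combined. RFC 4122 UUIDs minted independently will not clash, with overwhelming probability. The sketch below uses made-up data, and the registry names, the "local_id" field and the merge helper are all hypothetical.

```python
import uuid

# Two hypothetical district registries. Each numbers its facilities with its
# own local sequence ("local_id") and also stamps them with an RFC 4122 UUID.
district_a = [
    {"local_id": "1", "id": str(uuid.uuid4()), "name": "Clinic A1"},
    {"local_id": "2", "id": str(uuid.uuid4()), "name": "Clinic A2"},
]
district_b = [
    {"local_id": "1", "id": str(uuid.uuid4()), "name": "Clinic B1"},
]


def merge(registries, key):
    """Combine facility lists into one dict keyed by `key`.

    Returns the merged dict plus the keys seen more than once, i.e. the places
    where two different facilities would clash. The first one seen is kept.
    """
    merged, collisions = {}, []
    for registry in registries:
        for facility in registry:
            k = facility[key]
            if k in merged:
                collisions.append(k)
            else:
                merged[k] = facility
    return merged, collisions


by_local, local_collisions = merge([district_a, district_b], key="local_id")
by_uuid, uuid_collisions = merge([district_a, district_b], key="id")
```

Keyed by the local sequence, "Clinic B1" is silently shadowed by "Clinic A1". Keyed by UUID, all three facilities survive the merge.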

This is not terrible: you may want to rewrite the URLs anyway to match the retrieval scheme used by the receiving implementation.

BUT once the URL is rewritten, there is no longer a shared understanding of resource identity between the two implementations, which leads to the second problem.

SHARING resources between two implementations is now very tricky, basically unworkable. Consider these cases:

Resilient import from Imp1 to Imp2 is pretty tough. Imagine the retrieval of ALL data from Imp1 crashes partway through (flaky connection, bugs, etc.). The simple strategy of just restarting the import is hard to apply, because Imp1 and Imp2 now identify the resources differently. How do you know what has already been inserted into Imp2?

Sure, you can hold a table in Imp2 that maps original IDs to new IDs, and make it a many-to-one mapping because you might import from several places. But now things are getting complicated.

Running systems in parallel is pretty tough. Take a field deployment with mobile devices that submit updates to FRED. To safely roll out Imp2, you could run Imp1 and Imp2 in parallel. The devices would submit to Imp1 using Imp1's URLs, and Imp1 would pass each submission on to Imp2. But if it's an edit to an existing resource, Imp2 won't know which resource it is unless, as above, it's maintaining ID maps for EVERY instance it has met.

Even worse, when you switch off Imp1, and the phones are now making edits to Imp2 based on cached resources with Imp1's IDs, you are again hosed unless you are managing all those maps. Chris' approach is SO much simpler:

URL which uniquely identifies and retrieves a resource for a given instance

UUID which universally identifies a resource across all instances and implementations

Building this is simple. For example, DHIS2 simply has to stamp resources with an additional UUID property and pass it along when resources are retrieved. There is NO NEED to change the way URLs are generated, or any existing logic for identifying and retrieving resources.

As far as DHIS2 is concerned, the UUID can be an opaque string that it never looks at.

There is no problem importing or federating data across instances and implementations, because there is always a shared understanding of resource ID (the UUID) and complete freedom for implementations to use the best URL format for their needs.
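The sketch below covers both cases Jeff raises: a restartable import and an edit that arrives with a stale Imp1 URL. It assumes the receiving system (Imp2) keys its storage on the shared UUID. A plain dict stands in for Imp2's database. The IMP2_BASE URL scheme, the helper names and the data are hypothetical.

```python
import uuid

IMP2_BASE = "https://imp2.example.org/facilities/"  # hypothetical Imp2 URL scheme


def import_facilities(source, target):
    """Copy facilities into `target` (dict: UUID -> record), keyed by "id".

    Re-running after a crash overwrites records already present instead of
    duplicating them. Imp1's "url" is dropped; Imp2 keeps or mints its own
    (naive sequential numbering, purely for illustration).
    """
    for facility in source:
        existing = target.get(facility["id"])
        record = {k: v for k, v in facility.items() if k != "url"}
        record["url"] = existing["url"] if existing else IMP2_BASE + str(len(target) + 1)
        target[facility["id"]] = record
    return target


def apply_edit(target, edit):
    """Apply an edit from a device that still holds Imp1's URL, matching on UUID."""
    current = target[edit["id"]]
    current.update({k: v for k, v in edit.items() if k != "url"})
    return current


# Made-up data: the first import run "crashes" after two records.
imp1 = [
    {"id": str(uuid.uuid4()), "url": f"https://imp1.example.org/f/{n}", "name": f"Facility {n}"}
    for n in range(3)
]
store = {}
import_facilities(imp1[:2], store)  # partial run before the simulated crash
import_facilities(imp1, store)      # simply restart from the beginning
apply_edit(store, {"id": imp1[0]["id"], "url": imp1[0]["url"], "name": "Renamed"})
```

Keying on the UUID is what makes the restart safe. The URL numbering is only there to show that Imp2 keeps its own URL space untouched.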

Now take the cases above:
Resilient import: the import crashes, you restart it, and Imp2 checks the UUIDs of imported resources and ignores (or, to be safe, replaces) the ones it already has.
Systems in parallel: data passed from Imp1 to Imp2 is handled as above. A mobile device submits a resource with a UUID and an old URL, and the receiving system (Imp2) can still identify the updated resource as one it holds (by UUID).

SO, if you buy that, there are two more questions: should the standard prescribe an algorithm and JSON representation for the UUID? If so, what should it prescribe?

On the first point, the answer must be yes, because UUIDs generated by a given scheme are unique only compared to UUIDs generated by the SAME scheme. Suppose some UUIDs are generated with an RFC4122-compliant algorithm and others with some other 128-bit hash (take your pick from here: http://en.wikipedia.org/wiki/List_of_hash_functions). The RFC4122 UUIDs will be very unlikely to collide with each other, but they could well collide with a UUID generated by 128-bit CityHash.
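Here is one way to see the "same scheme" point in code. RFC 4122 reserves a few bits of every UUID to record its variant and version, so an RFC 4122 value always says how it was produced. A raw 128-bit hash formatted to look like a UUID carries no such marker, so nothing keeps it out of the space that RFC 4122 UUIDs occupy. The sketch uses Python's uuid and hashlib modules. MD5 is only an illustrative stand-in for "some other 128-bit hash", and the input strings are made up.

```python
import hashlib
import uuid

# RFC 4122 random UUID: 122 random bits plus fixed variant and version bits.
random_id = uuid.uuid4()

# RFC 4122 also defines name-based UUIDs (version 3 uses MD5, version 5 uses
# SHA-1). These set the same marker bits, so they stay inside the RFC's scheme.
name_based = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/facilities/1")

# A raw 128-bit hash dressed up as a UUID. Its variant/version bits are just
# whatever the hash produced, so it can land anywhere, including on a value
# that an RFC 4122 generator could also produce.
raw_hash_id = uuid.UUID(bytes=hashlib.md5(b"Clinic A1").digest())

for label, value in [("uuid4", random_id), ("uuid5", name_based), ("raw md5", raw_hash_id)]:
    print(f"{label:8} {value}  variant={value.variant}  version={value.version}")
```

Note that uuid.UUID.version is only meaningful when the variant is RFC_4122; Python reports None otherwise.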

So if we don't all agree to use the SAME UUID generation scheme, it's no different than just having a URL scheme that we don't agree on.

And the whole point of having a UUID is so that we DON'T have to agree on the URLs.

As for WHICH UUID to use, it seems silly NOT to use RFC4122, as it is a standard and all platforms (including BlackBerry!) have good library functions to generate them. And since RFC4122 UUIDs have a standard string representation, that would be the representation used in the JSON.

Here's an example: '5899d128-4c55-11e2-b5e1-b88d12122fdc'

Python's uuid module is a nice, simple implementation of RFC4122, so for anyone not familiar, open a Python interpreter and try:

>>> import uuid
>>> str(uuid.uuid4())
'17394bfc-ac07-4ad1-8b8b-17d02179c485'

But I'm being silly, right? These are so common everyone's familiar!

mortenoh commented 11 years ago

I think modifying UUID to suit our purposes might be a bad idea, and it kinda defeats the purpose.

mberg commented 11 years ago

Think I agree.


edjez commented 11 years ago

I agree!


edjez commented 11 years ago

Had a discussion in the tech hangout; we all agree this may block implementations if not resolved soon.

Can the XML-ID conversation be considered closed? (i.e. the IDs we generate for system IDs need not conform to a standard that was put together to identify nodes within an XML doc) (I hope so)
Are there any objections, based on testable concerns, to using RFC4122 UUIDs (not customized UUID-like schemes)?

rowenaluk commented 11 years ago

+1 to using RFC4122 UUIDs

jwishnie commented 11 years ago

+1 to RFC4122

Jeff


bobjolliffe commented 11 years ago

On 31 January 2013 16:58, edjez notifications@github.com wrote:

> Had a discussion in the tech hangout.. we all agree this may block implementations if not resolved soon
>
> Can the XML-ID conversation be considered closed? (ie the IDs we generate for system IDs need not conform to a standard that was put together to identify nodes within an xml doc) (I hope so)

You don't foresee facilities represented as elements in an XML stream? Though I can see all the recent attention is on this JSON stuff, I certainly do. I would also like to have a schema that allows me to validate ID/IDREF mappings, which I can't easily do with a UUID.

> Is there any objections deriving from testable concerns to using RFC4122 UUIDs (not customized UUIDish-like schemes)?

None whatsoever. But I would continue to object to attempts to overspecify these things. Implementations should be free to use whatever form of identifier is convenient for them in this property. I have already suggested from the start that implementations can also carry any number of other identifiers, including for example uuids.


mortenoh commented 11 years ago

Yeah, I was under the impression that this was exactly what the identifiers block was to be used for. I'm still against this; there is nothing stopping an implementation from using UUIDs today.

Will a change to UUIDs also mean that we will be using them in the URLs? I think it was mentioned somewhere that we do not want this. Or do we leave that open to implementors?

ctford commented 11 years ago

Standardising the "id" field on a global and open identification scheme is important to include in the spec because it's crucial to interoperability. If ids are generated in a way that cannot be reproduced in other systems, facilities data lives and dies along with the system that first hosts it.

For example, it would not be possible to migrate facilities to a new system, because that system could not mint ids for newly created facilities. Also, you could not merge two lists of facilities without the potential for conflict or inconsistently structured ids.

Curated system-specific identifiers, for example DHIS2 ids, seem like excellent candidates for the "identifiers" field, as they are meaningful in a specific context.

As for including the "id" in the URL, I don't think this should be a hard requirement. It's convenient for human beings to have an intuitive URL structure but fragile if systems use URL-hacking to navigate. I'd prefer that we leave it to implementations to responsibly structure their own URLs.

If XML-IDs are a blocker to using UUIDs, perhaps we could use the URN representation of them?
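The URN form suggested here is built into Python's uuid module as the .urn attribute. The sketch below also checks the result against ASCII subsets of two XML naming rules. One caveat, offered with some hedging: the URN starts with a letter, so it satisfies XML 1.0's Name production. However, as far as I know, the W3C xml:id recommendation requires an NCName, which forbids colons. If that holds, the URN form would work for a DTD-declared ID attribute but not for xml:id itself, so it is worth checking before relying on it. The regex names are hypothetical.

```python
import re
import uuid

facility_id = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

# RFC 4122 registers the URN form: "urn:uuid:" + the standard string form.
urn = facility_id.urn

# uuid.UUID() also accepts the URN form as input, so the two round-trip.
assert uuid.UUID(urn) == facility_id

# ASCII-only subsets of two XML naming rules:
#   Name   (XML 1.0): may start with a letter, "_" or ":", may contain ":".
#   NCName (XML Namespaces; as far as I know what xml:id requires): no ":".
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")
XML_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

print(urn)
print("valid Name:  ", XML_NAME.match(urn) is not None)
print("valid NCName:", XML_NCNAME.match(urn) is not None)
```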

edjez commented 11 years ago

Of course we see facilities represented in XML. But using XML-ID (a standard to help identify unique elements in an XML document) to guide the choice of facility IDs seems like a leaky abstraction. What code would be so different between choosing identifiers that support XML-ID and identifiers that don't? I imagine regular XPath and other work will not be significantly affected by this. I feel it would be helpful to see the "red" (what is broken) before collectively suggesting approaches to make it "green"...

jwishnie commented 11 years ago

I think there is some confusion here. There is no technical limitation to placing RFC4122 UUIDs in an XML document. They can be either content or attribute values. They can be referenced by XPath, parsed by all parsers.

As Eduardo points out, XML-ID is designed for a different purpose: specifically, to identify elements within an XML-formatted document. That is, it is a specific detail of the XML format and not intended to be used as content. It's a bit like saying all content in a Word document must somehow conform to the rules for file system names (e.g. no ":" or "/"); it doesn't make any sense.

And we certainly don't want the CONTENT provided by a facility registry to be restricted by any given FORMAT choice.

The following are both completely valid XML

efcab9bb-a164-4bdf-90c9-870ace0e7015
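Jeff mentions two placements for a UUID: as an attribute value and as element content. Both can be sketched with Python's xml.etree.ElementTree. Neither relies on XML-ID typing; the UUID is ordinary character data in both. The element and attribute names ("facility", "uuid", "id") are hypothetical and are not taken from any FRED XML schema.

```python
import xml.etree.ElementTree as ET

facility_id = "efcab9bb-a164-4bdf-90c9-870ace0e7015"

# 1. UUID as a plain attribute value (not declared as an XML ID).
as_attribute = ET.Element("facility", {"uuid": facility_id})

# 2. UUID as element content.
as_content = ET.Element("facility")
ET.SubElement(as_content, "id").text = facility_id

for element in (as_attribute, as_content):
    print(ET.tostring(element, encoding="unicode"))
```

Both serialisations parse back cleanly, and XPath-style lookups such as findtext("id") work on them as on any other value.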

mortenoh commented 11 years ago

Jeff, we never said there was a technical reason for not having UUIDs in elements/attributes; of course that is allowed. We were only talking about reusing them as XML-IDs, which, although not a requirement, we think will be very useful. That is why our own identifiers always start with a letter, which is a must in XML-ID [1].

I would have understood the suggestion better if we were ending up with a schema like this:
{ "id": "uuid-abc-123" }
http://example.org/facilities/uuid-abc-123

This just seems like common sense. Of course, UUIDs look horrible in URLs and, as far as I know, are not commonly used as part of a URL.

Another approach would be to have a special place for this in the identifiers block. I don't think we allow searching by alternative identifiers today, but the spec should probably be extended to allow this.

[1] http://www.w3.org/TR/REC-xml/#NT-Name

mberg commented 11 years ago

After discussing at length with Chris, here is a proposal he drafted that we'd like to put forward:

The "id" field MUST be a UUID generated according to RFC 4122, using the standard string representation e.g. "f81d4fae-7dec-11d0-a765-00a0c91e6bf6".
The "id" MAY be supplied by the client at time of creation. If not provided, the server MUST generate the "id".
Other identification schemes e.g. system-specific identifiers or officially curated identifiers, MAY be supported through the "identifiers" field.
The "url" MUST be a valid URL generated by the server at time of creation. It MAY contain part or all of the "id", or part or all of another identifier from the "identifiers" field.
If there is sufficient interest for the "id" field to be used directly as an xml:id, then we should consider representing the UUID as a URN (again, according to RFC 4122) e.g. "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6".

Note: this includes making the relationship between "id" and "url" non-strict, because managing an application's URL space has different requirements than managing global identity. This would allow the individual facility URL exposed by DHIS2 to be built using the DHIS2 identifier (which may also be included in "identifiers").

Question: if we go with this, we will need to talk through how it changes the specification of the REST endpoints.
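The sketch below covers the create semantics in the proposal above. The server honours a client-supplied "id" if it is a well-formed RFC 4122 UUID and mints one otherwise. It then builds the "url" from a system-specific identifier rather than from the UUID. BASE_URL, the function name and the system identifier values are hypothetical. How this maps onto the actual REST endpoints is exactly the open question.

```python
import uuid

BASE_URL = "http://example.org/api/v1/facilities/"  # hypothetical URL space


def create_facility(payload, system_id):
    """Return the stored representation of a newly created facility.

    payload:   the client's JSON body as a dict (may or may not contain "id")
    system_id: the implementation's own identifier (hypothetical value)
    """
    supplied = payload.get("id")
    if supplied is None:
        # No id from the client: the server MUST mint one.
        facility_id = str(uuid.uuid4())
    else:
        # Client-supplied id: accept it only in RFC 4122's standard string form.
        try:
            parsed = uuid.UUID(supplied)
        except ValueError:
            raise ValueError(f"id is not a UUID: {supplied!r}") from None
        if str(parsed) != supplied.lower():
            raise ValueError(f"id is not in standard string form: {supplied!r}")
        facility_id = str(parsed)

    facility = dict(payload)
    facility["id"] = facility_id
    # The url is minted by the server and, as the proposal allows, is built
    # from the system identifier rather than from the UUID.
    facility["url"] = BASE_URL + system_id
    return facility
```

Keeping "id" and "url" independent like this is what would let DHIS2 keep its own URL scheme while still exchanging a shared UUID.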

bobjolliffe commented 11 years ago

Hi Matt

I've had some discussion with DHIS2 folk around this. I think we all agree that it can be a good thing to mandate a uuid.

Though we also feel that where uuids tend to become really useful is less in the daily transaction of data but more in merging and resolution of conflicts. For everything else it is normal to use a saner system identifier.

We would very much prefer that the standard did not mandate implementations to use uuid for this field and that it remain implementation specific.

I don't think we can support this proposal before discussing the question you pose at the end regarding changes to the endpoints. Can you elaborate it further?

As a general observation I think there is a tension in all standards between over specifying (which might apparently lead to greater interoperability but also fewer implementations) and under specifying (which can leave some interoperability questions unanswered, but significantly increase the scope of systems which might interoperate). I still lean determinedly towards the latter.

Regards Bob


jwishnie commented 11 years ago

Hi Bob, some comments inline below.

> Though we also feel that where uuids tend to become really useful is less in the daily transaction of data but more in merging and resolution of conflicts. For everything else it is normal to use a saner system identifier.

I agree with the first part of the sentence, that UUIDs are really useful in merging and conflict resolution.

As for the second part, I think this depends entirely on the implementation. Many systems use UUIDs of some form as a normal system identifier (document IDs in CouchDB, commit IDs in Git, etc.).

But I can certainly see some implementations not wanting to use them as the internal identifier, which is fine.

Mandating them in the standard would only require that they be attributes of any entity (e.g. Facility) shared over the API.

Implementations would only have to:

Generate a UUID attribute for any entity they create
Maintain (not strip) the UUID for any entity that is passed (submitted) to them
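Jeff says the UUID can be an opaque extra field. In a relational store, that might look like the implementation keeping its own primary key and adding one UNIQUE text column. The sketch below uses Python's built-in sqlite3 module, and the table, column and function names are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# The implementation keeps its own primary key; the UUID is one extra,
# opaque TEXT column that is only ever stored, compared and returned.
conn.execute(
    """CREATE TABLE facility (
           internal_id INTEGER PRIMARY KEY,
           uuid        TEXT NOT NULL UNIQUE,
           name        TEXT NOT NULL
       )"""
)


def save_facility(conn, facility):
    """Store a facility dict while honouring the two obligations listed above.

    A facility without an "id" is new here, so a UUID is generated. One that
    arrives with an "id" keeps it, and is updated if it is already stored.
    """
    facility_uuid = facility.get("id") or str(uuid.uuid4())
    row = conn.execute(
        "SELECT internal_id FROM facility WHERE uuid = ?", (facility_uuid,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO facility (uuid, name) VALUES (?, ?)",
            (facility_uuid, facility["name"]),
        )
    else:
        conn.execute(
            "UPDATE facility SET name = ? WHERE internal_id = ?",
            (facility["name"], row[0]),
        )
    return facility_uuid
```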

This can be completely opaque to the implementation and just stored, for example, as a string of 'user data' in an attribute that the implementation never bothers to interpret.

> We would very much prefer that the standard did not mandate implementations to use uuid for this field and that it remain implementation specific.

I'm a little confused by this statement; I think I'm not understanding it completely. If the standard doesn't mandate a given format and generation scheme for UUIDs, and each implementation can choose its own, then they have no value in resolving conflicts and merges, because they won't be universal among implementations.

Maybe I'm misunderstanding the intent behind the attribute?

Let me see if this clarifies things: what an implementation uses for its internal system identifier should be completely up to the implementation. It never has to share this identifier externally, though it could choose to embed it in the url. External systems would never need to know that the url contains a system id; they don't care, as long as the url always retrieves the referenced entity.

UUIDs, on the other hand, would need to be known externally so that systems can handle merging, conflict resolution, and federation of data.

Mandating the format of externally shared UUIDs is actually a prerequisite for implementations being able to keep their own internal system IDs, for which there is no mandate at all.

> I don't think we can support this proposal before discussing the question you pose at the end regarding changes to the endpoints. Can you elaborate it further?

Am interested as well!

> As a general observation I think there is a tension in all standards between over specifying (which might apparently lead to greater interoperability but also fewer implementations) and under specifying (which can leave some interoperability questions unanswered, but significantly increase the scope of systems which might interoperate). I still lean determinedly towards the latter.

While I generally agree with this, in my original email I laid out several real use cases which we can already foresee for which a shared definition of UUID is required. So in this case, I'm a firm believer that UUIDs need to be in the standard for interoperability.

And in discussion with the team working on the Uganda implementation, it's clear that this is needed immediately for the first deploy.

One final point: I'm also not clear on the argument against RFC4122 UUIDs:
They are an industry-wide standard
There are libraries for generating them easily on all platforms
Their adoption frees implementations to keep internal system IDs internal, so that interoperability does not impinge on implementation choices
They ease federation, merging and conflict resolution enormously

The only negative I see is that they require implementations that might not otherwise use them to add ONE field to their internal representation of a facility. But that's really not difficult, is it?


bobjolliffe commented 11 years ago

Hi Jeff

On 6 February 2013 02:17, Jeff Wishnie notifications@github.com wrote:

> The only negative I see is that they require implementations that might not otherwise use them to add ONE field to their internal representation of a facility. But that's really not difficult, is it?

We are not against adding one field.

We want to hear what Matt's thinking is around the individual facility endpoints. Contrary to your extensive "clarification" above, there is currently an assumption that these DO make use of the field. This has been the assumption since the very first draft of this spec. Clearly you would like to change that too, but let us look at all the dependent implications together.

Bob

Regards Bob

On 3 February 2013 22:37, Matt Berg <notifications@github.com (mailto: notifications@github.com)> wrote:

After discussing at length with Chris, here is a proposal he drafted we'd like to propose:

The "id" field MUST be a UUID generated according to RFC 4122, using the standard string representation e.g. "f81d4fae-7dec-11d0-a765-00a0c91e6bf6".

The "id" MAY be supplied by the client at time of creation. If not provided, the server MUST generate the "id".

Other identification schemes e.g. system-specific identifiers or officially curated identifiers, MAY be supported through the "identifiers" field.

The "url" MUST be a valid URL generated by the server at time of creation. It MAY contain part or all of the "id", or part or all of another identifier from the "identifiers" field.

If there is sufficient interest for the "id" field to be used directly as an xml:id, then we should consider representing the UUID as a URN (again, according to RFC 4122) e.g. "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6".

Note: this includes making the relationship between "id" and "url" non-strict, because managing an application's URL space has different requirements from managing global identity. This would allow the individual facility URL exposed by DHIS2 to be built using the DHIS2 identifier (which may also be included in "identifiers").

Question: if we go with this, we will need to talk through how it changes the specification of the REST endpoints.


jwishnie commented 11 years ago

We are not against adding one field.

We want to hear what Matt's thinking is around the individual facility endpoints... contrary to your extensive "clarification" above, there is currently an assumption that these DO make use of the field. This has been the assumption since the very first draft of this spec. Clearly you would like to change that too, but let us look at all the dependent implications together.
Ok, thanks, I need to go through that (or be led through it). In theory at least, a UUID wouldn't have to be part of any endpoints; those could use a system-specific identifier, as long as the UUID is a content field in the entity body (the JSON or XML returned).

thanks!

Jeff


jason-p-pickering commented 11 years ago

Have been following this discussion with interest. In the WHO master facility list draft paper [http://www.who.int/healthinfo/systems/WHO_CreatingMFL_draft.pdf] we argued that while UUIDs are technically superior, they may not be appropriate in all situations. Some countries prefer to maintain their own codes, for their own reasons, which include them being human readable; that is actually quite important in paper-based systems. Hence the distinction between a national system (where collisions will not occur if there is a central body generating IDs with its own mechanism for doing so) and a universal system, where the chance of collision must be less than 1 in 3.4e38. Considering there are approximately 4.364357e+19 possible DHIS2-type UIDs, that would seem quite enough to comfortably identify every health facility that might exist in the entire world. Of course, the chance of a collision is higher, but is it a risk we are willing to accept? Well, in a national setting the chance of a DHIS2 UID collision is apparently sort of acceptable, as the developers have decided to go this route. For a global system, perhaps those extra IDs are actually required. It all seems like a bit of an academic discussion to me.
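The two figures quoted above can be reproduced with a little arithmetic. This Python sketch assumes a DHIS2 UID is 11 characters, the first a letter (52 choices) and the remaining ten letters or digits (62 choices each), and that a random (version 4) UUID carries 122 random bits, because 6 of its 128 bits are fixed version and variant bits. Collision odds use the standard birthday approximation p ≈ 1 - exp(-n(n-1)/2N); the facility count n is purely illustrative.

```python
import math

# Size of each identifier space.
DHIS2_SPACE = 52 * 62**10   # assumed format: 1 letter + 10 alphanumerics
UUID_128_SPACE = 2**128     # all 128-bit values (the 3.4e38 figure above)
UUID_V4_SPACE = 2**122      # random bits actually available in a version 4 UUID


def birthday_collision_probability(n: int, space: int) -> float:
    """Approximate chance that n independently random ids contain a collision."""
    # p = 1 - exp(-n(n-1)/(2N)); expm1 keeps precision when p is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    print(f"DHIS2-type UIDs: {DHIS2_SPACE:.6e}")     # 4.364357e+19
    print(f"128-bit values:  {UUID_128_SPACE:.6e}")  # 3.402824e+38
    n = 1_000_000  # illustrative number of facilities
    for name, space in [("DHIS2 UID", DHIS2_SPACE), ("UUID v4", UUID_V4_SPACE)]:
        print(name, birthday_collision_probability(n, space))
```

For a million randomly generated ids this gives a collision chance on the order of 1e-8 for DHIS2-style UIDs and around 1e-25 for version 4 UUIDs, which puts some numbers on the "is it a risk we are willing to accept?" question.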

At the end of the day, many countries are going to decide (regardless of what the wonks say) what their ID scheme is going to be, and this will often be the one which really matters. If it is unique within "the system" (and that system may be national) then would it not be acceptable? Must a country go from no IDs to a UUID when perhaps they are not ready for it?

Better to leave the spec as a unique ID within the system itself. For national systems, a sequential integer may be enough. For global systems, a UUID may be required, but imposing this requirement on countries is (I feel) going to restrict the widespread adoption of this API.

Regards, Jason

ctford commented 11 years ago

Hi Jason,

Thanks for the link. I think the tradeoffs section (Table 2) does a good job of succinctly comparing different types of identifiers (integers, UUIDs and codes).

Obviously, there's a tension between the needs of an identifier that can be easily processed by software (e.g. can be safely generated in a distributed way) and one used by humans (e.g. filled out on forms). You might even want multiple human-friendly identifiers, for example a standardised code used by the Ministry of Health, an abbreviated code used when submitting health reports via SMS and a code used by the private organisation that runs the facility.

FRED supports both an "id" field that can be used to keep track of facilities even when their URL changes, as well as a list of additional "identifiers" that can have a specified context and agency.

If I were implementing a "Master Health Facility List" using FRED, I would use a UUID for the "id" field for the purposes of interaction between automated systems, and put an entry in the "identifiers" field for whatever code might have been assigned by the government. That would also allow me to create a facility in my FRED registry before the government code has been assigned.

In my view, that gets around imposing UUIDs on a country, because the UUIDs are managed by and for the benefit of the systems that hold the facilities, and the government can administer its own set of identifiers.
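As a rough Python sketch of that arrangement: the UUID is minted when the record is created, and the government code can be attached later without the "id" ever changing. The field names follow the JSON snippets in this thread; the helper names and the "national" context value are hypothetical.

```python
import uuid
from typing import Optional


def new_facility(name: str, moh_code: Optional[str] = None) -> dict:
    """Build a FRED-style facility record (hypothetical helper)."""
    facility = {
        "id": str(uuid.uuid4()),  # system-managed identity, never printed on forms
        "name": name,
        "identifiers": [],
    }
    if moh_code is not None:
        add_identifier(facility, "MOH", "national", moh_code)
    return facility


def add_identifier(facility: dict, agency: str, context: str, value: str) -> None:
    """Attach a human-facing code, e.g. once the ministry has issued one."""
    facility["identifiers"].append({"agency": agency, "context": context, "id": value})


if __name__ == "__main__":
    f = new_facility("Abia State Teaching Hospital")  # no government code yet
    add_identifier(f, "MOH", "national", "AB002121")  # assigned later
    print(f)
```

The design choice is that humans only ever see entries in "identifiers", while systems exchange the "id", so the two can evolve independently.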

Cheers,

Chris


mberg commented 11 years ago

Sorry for the delay in responding to this issue.

As I mentioned earlier, if we use a UUID as the ID then we will probably need to modify the way our current API is specified if we don't want the full UUID to show up in the REST endpoints.

Currently, we have:

/facilities/{id}.json

What we could propose is:

/facilities/{identifier}.json

Where the identifier has to match one of the ids in the "identifiers" block, e.g.:

"identifiers": [ { "agency": "MOH", "context": "DHIS", "id": "123" }, { "agency": "UNICEF", "context": "mtrac", "id": "53adf" } ],

So in this case either 123 or 53adf would work. Alternatively, we could specify one of the identifier types to be the one used in the url.

If we allow identifiers from more than one scheme in the URL, we have to state that there can't be any id collisions between the different ID schemes (which might be tricky).
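Resolving /facilities/{identifier}.json against every identifier scheme could look like the following Python sketch, which makes the collision problem concrete: when two schemes have issued the same string, there is no single answer. The helper and exception names are hypothetical, and the second facility in the example data is invented for illustration.

```python
from typing import Iterable, List


class AmbiguousIdentifier(Exception):
    """Raised when more than one facility carries the requested identifier."""


def find_by_identifier(facilities: Iterable[dict], value: str) -> dict:
    """Resolve /facilities/{identifier}.json by searching every identifier scheme."""
    matches: List[dict] = [
        f for f in facilities
        if any(ident["id"] == value for ident in f.get("identifiers", []))
    ]
    if not matches:
        raise KeyError(value)
    if len(matches) > 1:
        # Two schemes issued the same string, so the URL cannot pick a facility.
        raise AmbiguousIdentifier(value)
    return matches[0]


if __name__ == "__main__":
    facilities = [
        {"name": "Facility A", "identifiers": [
            {"agency": "MOH", "context": "DHIS", "id": "123"},
            {"agency": "UNICEF", "context": "mtrac", "id": "53adf"},
        ]},
        # Hypothetical second facility whose mtrac code happens to be "123".
        {"name": "Facility B", "identifiers": [
            {"agency": "UNICEF", "context": "mtrac", "id": "123"},
        ]},
    ]
    print(find_by_identifier(facilities, "53adf")["name"])  # Facility A
    try:
        find_by_identifier(facilities, "123")
    except AmbiguousIdentifier as exc:
        print("ambiguous:", exc)
```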

Another option would be to do something like:

/facilities/{context}/{identifier}.json

We still would need to decide how the URL would be represented.

What's nice about this is that we could, in theory, also support the following, where id = UUID, if we thought this format would be easier for those accessing the API programmatically. It might make sense to limit the number of endpoints, though.

/facilities/{id}.json
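The context-qualified form avoids that ambiguity, because the lookup key becomes the pair (context, identifier). A Python sketch that parses the two URL shapes above and resolves them through an index might look like this; the regexes, names and index layout are assumptions for illustration, not part of any spec.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[Optional[str], str]  # (context, identifier); context None = FRED "id"

# Hypothetical patterns for the two URL shapes discussed above.
_BY_CONTEXT = re.compile(r"/facilities/([^/]+)/([^/]+)\.json")
_BY_ID = re.compile(r"/facilities/([^/]+)\.json")


def parse_facility_path(path: str) -> Key:
    """Map a request path to the lookup key it names."""
    m = _BY_CONTEXT.fullmatch(path)
    if m:
        return m.group(1), m.group(2)
    m = _BY_ID.fullmatch(path)
    if m:
        return None, m.group(1)
    raise ValueError(f"not a facility path: {path!r}")


def build_index(facilities: Iterable[dict]) -> Dict[Key, dict]:
    """Index each facility under its "id" and under every (context, id) pair."""
    index: Dict[Key, dict] = {}
    for f in facilities:
        keys = [(None, f["id"])] + [
            (ident["context"], ident["id"]) for ident in f.get("identifiers", [])
        ]
        for key in keys:
            # Uniqueness is only required within a context, not across them.
            if key in index and index[key] is not f:
                raise ValueError(f"identifier {key!r} used by two facilities")
            index[key] = f
    return index
```

Here the string "123" can safely be reused by the DHIS and mtrac contexts; the index only rejects a repeat within the same context, which is a much easier rule to guarantee than uniqueness across every scheme.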

Thoughts? Does this address everyone's concerns?

Thanks,

Matt

jason-p-pickering commented 11 years ago

Hi Matt, Thanks for the feedback.

Personally, I am fine with this, as long as UUIDs are not a required identifier. Should a particular implementation choose to use them as an ID scheme, something like this would be OK:

"identifiers": [ { "agency": "MOH", "context": "uuid", "id": "1c685b03-5b1f-4925-a63a-b6f8ffad8c51" }, { "agency": "MOH", "context": "DHIS", "id": "M7ybiY6Ms14S" } ]

Then they could be easily addressable via one of the schemes which you describe.

However, these UUIDs and UIDs were generated in R (i.e. not with the Java, C or Python libraries intended for this purpose). They conform neither to RFC 4122 nor to DHIS2's implementation of UID generation, although they could be guaranteed to be unique within a particular system through that implementation's internal business logic. Should another system (such as a global system) require truly RFC 4122-compliant UUIDs, then it would be up to that system to implement them for its own requirements.

In summary, neither the need for a UUID nor, where UUIDs are generated, the need for them to be RFC 4122-compliant should be imposed by the API, at least in my opinion.

Regards, Jason

jwishnie commented 11 years ago

While it certainly makes sense for implementations to be able to use whatever locally unique identifier they choose (and this is supported by having multiple identifiers), the API needs to define a shared concept of an identifier in order for the standard to support federation or merging of data across implementations.

That is, a single required identifier, with any number of optional ones specific to any local implementation or local context.

Note that I said "in order to support federation or merging of data"—if this is not a goal of the standard (as represented by the API and the JSON/XML data formats), then it can be left out.

Though I think that it would be a great shame and missed opportunity to not have this as a goal. What really is the point of a standard for a shared registry that does not effectively support data federation and merging across implementations?

As for IDs generated in R, I'm not sure whether there is an RFC 4122 implementation in R, but it certainly would not be difficult to write. We probably wouldn't even need to write one, as there are C and C++ libraries that could be integrated as R extensions (I don't know R, so apologies for showing my ignorance, but a little Googling pulled up a number of tools for integrating C code into R).

Jeff Wishnie


jason-p-pickering commented 11 years ago

Hi Jeff,

Of course, implementing RFC 4122 in R is trivial, but I deliberately did not do it in this case, to illustrate a point. The problem with what we are discussing now is that the FRED API has no way to verify whether my pseudo-UUID is RFC 4122-compliant. It appears to be a UUID, but in fact isn't. Under the current spec, implementation details are hidden from the API, so unless every implementation of the API is verified individually, there is really no way to determine whether a specific implementation is compliant. Thus, if FRED were to require RFC 4122 UUIDs, it would somehow need a mechanism to ensure compliance with the standard, which seems outside the scope of what FRED wants to achieve and technically challenging to realize.
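Jason's point can be demonstrated directly: a syntactic check, here using Python's standard `uuid` module, can confirm the canonical form, the RFC 4122 variant bits and a known version number, but it cannot tell how the bits were produced. The function name is hypothetical.

```python
import uuid


def looks_like_rfc4122(value: str) -> bool:
    """Syntactic check only: canonical form, RFC 4122 variant, known version."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return (
        str(parsed) == value.lower()           # standard 8-4-4-4-12 spelling
        and parsed.variant == uuid.RFC_4122    # variant bits 10xx
        and parsed.version in (1, 2, 3, 4, 5)  # versions defined by RFC 4122
    )
```

The R-generated value from earlier in the thread, '1c685b03-5b1f-4925-a63a-b6f8ffad8c51', passes this check (it carries version 4 and the RFC 4122 variant), while 'M7ybiY6Ms14S' does not. Nothing in the string reveals whether a conforming generator was actually used.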

This is why unique IDs within a context are critical and ultimately the only thing that is really enforceable in a federated system. The goal of a national system may be very different from that of a universal one, so the mechanisms used to generate identifiers may be very different too. I do think UUIDs are technically superior; it is just that they are not appropriate in every situation. But we can demand that IDs are unique within a particular context. How they are generated and guaranteed to be unique, however, is up to the context of implementation.

Regards, Jason

jwishnie commented 11 years ago

Of course implementation in R of RFC 4122 is trivial, but I did not do it in this case to illustrate a point. The problem with what we are discussing now is that the FRED API has no way to verify that my pseudo-UUID is RFC-4122 compliant or isn't.

Thank you for spelling it out, I did miss your point.

What you say is true, but unprovability hasn't been an issue in practice, for example with GUIDs for COM objects (sorry, showing my age... I suffered through that era of Windows programming). That is, while it's possible for someone to claim they are compliant and not use a compliant library, or to implement the algorithms incorrectly, it's unlikely in practice.

And by the same token, very little in the API is provably compliant—externally, how can you prove that an implementation correctly returns all facilities when "GET /facilities.json" is requested?

At the core, we have to trust that implementations that say they are compliant are, yes?

This is why unique IDs within a context are critical and ultimately the only thing which is really enforceable in a federated system.

I think this is where we disagree. I think a universal definition of a UUID—essentially the 'id' property that FRED defines, allows meaningful federation without the need for an impractical many-to-many x-mapping of context specific IDs.

Context-specific IDs are important (critical) for holding foreign keys to non-FRED-compliant systems, and for FRED clients to store what, from their perspective, is a primary key in a FRED-compliant central registry.

But a universal ID as a core property of FRED entities allows the practical federation of information across FRED compliant registries in a way x-mapping of context specific IDs won't.

The goal of a national system may be very different than a universal one. Thus the mechanisms to generate identifiers may be very different.

Agreed that any external-to-FRED used/visible IDs are very likely to be different and need to be. But I am referring to a common scheme used internally and shared among FRED-compliant registries (servers).

We may just be considering different usage patterns.

I am thinking, for example, about how Uganda is likely to achieve a reasonably rich and accurate shared facility registry—I don't think it will be through top-down organization the way Rwanda is likely to achieve it. I think it will evolve out of many bi-lateral sharing agreements—UNICEF will agree to share information with some Makerere U projects, together they will approach the MoH to share their view with them. Maybe WV can be brought in.

But I bet it will be an evolving aggregation of players and systems, not a centrally planned model as looks to be happening in Rwanda.

In that world—if various players use FRED compliant systems for their own projects, and those systems share an internal ID scheme, federation is fairly simple (though finding and merging duplicates is always a problem). If each system has independent context-dependent ID schemes, it's a lot harder, and probably not practical to keep the systems in a live federation. Maybe there can be a one time re-mapping and re-definition, but even that is not terribly practical.
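A minimal Python sketch of the shared-ID case described above: when every registry uses the same "id" scheme, combining them is a keyed union, and the only extra work is flagging records that share an id but disagree on content. The function name and the conflict policy (keep the first record seen, report the rest) are assumptions. It does nothing about duplicates created under different ids, which remains a separate problem.

```python
from typing import Dict, Iterable, List, Tuple


def merge_registries(*registries: Iterable[dict]) -> Tuple[Dict[str, dict], List[str]]:
    """Union facility lists keyed on the shared "id"; report conflicting records."""
    merged: Dict[str, dict] = {}
    conflicts: List[str] = []
    for registry in registries:
        for facility in registry:
            existing = merged.get(facility["id"])
            if existing is None:
                merged[facility["id"]] = facility
            elif existing != facility:
                # Same identity, different content: needs a resolution policy.
                conflicts.append(facility["id"])
    return merged, conflicts
```

Without a shared scheme, the same merge would first need a mapping table from each registry's context-specific codes to the others', which is the many-to-many x-mapping mentioned earlier in the thread.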

Jeff

jason-p-pickering commented 11 years ago

I think this is where we disagree. I think a universal definition of a UUID, essentially the 'id' property that FRED defines, allows meaningful federation without the need for an impractical many-to-many x-mapping of context-specific IDs.

In fact, I do not think we differ significantly. The system requirements for Uganda call for a UUID, in that it IS a federated system, and thus UUIDs are required to form the national system effectively. Totally agree with you here. However, what I am really trying to get at is that UUIDs are simply not required in a place like the Solomon Islands, with some 300 health facilities. They also need a facility registry and have their own coding mechanism. FRED would be useful there as well for centralizing other systems, but since the IDs are generated by a single authority, there is no need to use UUIDs. That said, they could be useful in the future should the system requirements change, and of course they could be implemented as part of a new national standard at that point.

In my opinion there is a subtle difference between requiring a UUID and strongly recommending one, or even stating when they are required depending on the overall system architecture. If the system (i.e. the health facility registry in Uganda) needs them, then they should be used, and the FRED API should support it. But in other systems (meaning health systems and not necessarily software) that are not federated, they may not be a requirement at all.

Regards, Jason

jwishnie commented 11 years ago

Hi all,

I won't be able to make tomorrow's call, I'm on a plane at that time :-(

I think this has been a good discussion and I hope the call attendees are able to bring it to a conclusion.

I'll make a final comment in response to Jason: I understand the subtle difference you are drawing. I think I come from a slightly different perspective from you, which is why I come down on the other side of the question; that is, I feel UUIDs are an important component of a standard that supports federation of data.

Take the Uganda case—I don't think there is an authority that can or will say that for Uganda people should use UUIDs. There won't be an edict from the MoH, or any way to enforce such a thing.

But I do think that a subset of well meaning individuals and organizations will, when doing their independent projects, see potential and value in implementing their projects in such a way as to make collaboration and information sharing easier.

These well meaning groups will naturally look to standards. They won't all be terribly technical, but they may well see something like FRED, or DHIS2 (w/FRED support) or Resource Mapper (w/FRED support) and choose to use a FRED-compliant system in the hopes that this will make it easier to work with other organizations that also use FRED-compliant systems.

But if UUID is not part of the standard, and different systems implement different ID schemes, that won't be true. Picking FRED for yourself, will not make it straightforward or easy to cooperate with future unknown partners who have also picked FRED, but whose implementations use a different ID scheme.

To summarize—I'm thinking of emergent cooperation scenarios, not centrally planned ones. And I don't see emergent ones working out very well if the core issue of Identity is not part of the standard.

There may be some compromise that settles this. A few thoughts: the standard could state that any implementations that wish to share data or participate in real-time federation must implement UUIDs. Systems that intend to be standalone, closed, or otherwise some sort of walled garden are free to use whatever scheme they wish that is unique within their walled garden.

Another approach would be to make it a stated recommendation and best practice to use UUIDs, but not a requirement. In this case, the standard could request that implementations state whether they are FRED-UUID or FRED-other-ID somewhere in their documentation. I don't think this is as good as making it a core part of the standard, but it is a compromise.

That compromise is maybe a bit like the CouchDB community—Couch by default generates RFC4122 UUIDs and Couch documentation recommends that clients who wish to generate their own IDs use RFC4122 UUIDs. CouchDB even implements a REST call where the server will generate and pass RFC4122 UUIDs to clients which the clients can then use (pass back) when creating new entities/documents.
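For illustration, CouchDB exposes this as GET /_uuids?count=N, which returns a JSON object of the form {"uuids": [...]}. A FRED server could offer something similar; the Python sketch below shows only the handler body, and the endpoint, function name and cap of 1000 are hypothetical.

```python
import uuid


def issue_uuids(count: int = 1) -> dict:
    """Handler body for a hypothetical GET /uuids?count=N endpoint."""
    if not 1 <= count <= 1000:  # arbitrary cap chosen for this sketch
        raise ValueError("count must be between 1 and 1000")
    # Random (version 4) UUIDs that a client can later send back as "id".
    return {"uuids": [str(uuid.uuid4()) for _ in range(count)]}
```

A client would fetch a batch once and use one value per new facility, so even a client with no UUID library of its own ends up creating standard ids.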

That is, they go to a lot of trouble to strongly encourage UUIDs, and they even have a bunch of functionality to make them super-easy to use (easier than not using them), but it is open. Clients can use whatever ID scheme they want. But you really need a very good reason to go against the grain.

Best,

Jeff Wishnie


jason-p-pickering commented 11 years ago

Thanks. It has been a good discussion which needs to be had here, but I am still not sure if we are really any further along to resolving the issue.

The real issue especially in terms of country ownership is allowing the organisations to make these decisions themselves. Raising big red flags that "If you do not do this, these will be the consequences..." should be enough to put them on the right track, but it may not be, and therefore, the API should cater to flexible situations in which this might be the case. The reason I am pushing this so much is that this was one of the conclusions we reached during the development of the WHO Master facility list paper, in consultations with different national Ministries of Health. There was essentially universal recognition of the fact that countries need a list of their facilities. But specifics on IDs should be left to countries to decide based on their own circumstances. Providing them sound technical guidance and platforms which support it, of course makes their decision a lot easier.

Just so that we are clear, though: we should not think in any way that the UUID alone is going to make federation actually work. It may make it technically easier, but it also raises other problems. Often facility lists are made from existing lists. These may be something like an Excel sheet, which often does have a code, but not a UUID. Once this is imported into one system, you may have a name like "Abia State Teaching Hospital" with a code like "AB002121", which is the internal "legacy" code. If the system generates a UUID on one platform, it is not going to be the same UUID on another system, even though the name and the legacy code would be the same. So, in a federated system, you might have two separate downstream systems registering a facility with another system, with the same name and the same code, but two separate UUIDs. Now what? How can the conflict be resolved? This is again why in the WHO paper we introduce the concept of the "signature" of the facility, one component of which is the ID. Obviously in this example (based on real life here in Nigeria), the UUID simply does not help us at all. It may lessen the chance of a collision but of course cannot remove it.
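The scenario above can be sketched in Python to show why a UUID does not help here: two records with different ids are only recognisable as the same facility by comparing other attributes. This uses a deliberately simplified, hypothetical signature (normalised name plus legacy code); the WHO paper's signature also covers location and other factors, which are left out, and the "legacy_code" field name is an assumption.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def signature(facility: dict) -> Tuple[str, str]:
    """Simplified, hypothetical signature: normalised name plus legacy code."""
    name = unicodedata.normalize("NFKC", facility["name"]).casefold()
    name = " ".join(name.split())  # collapse runs of whitespace
    return name, facility.get("legacy_code", "").strip().upper()


def probable_duplicates(facilities: Iterable[dict]) -> List[List[str]]:
    """Group the ids of records that share a signature despite different ids."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for f in facilities:
        groups[signature(f)].append(f["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Two imports of the same spreadsheet row into different systems would produce two UUIDs but one signature, so they surface here as a probable duplicate pair for a human or a policy to resolve.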

Even if the chance of a collision is small and we can somehow verify that everyone is using RFC 4122, we still do not solve the problem of establishing a facility list without duplicates solely by relying on a UUID. In my experience, this is a much bigger problem than potential collisions between identifiers. The real identifier is composed of the name of the facility, its location, and potentially other factors, such as the national facility code, which may be well known. So I am just not sure that UUIDs are really going to solve the problem without some sort of hierarchy in place, where each installation would need to contact a master to see if the facility already exists and, if it does, use the existing UUID, which sort of defeats the purpose of having federated IDs in the first place.

At the end of the day, I think the recommendation of a UUID is sound from a technical standpoint and does solve some problems. I am just not at all sure it offers any comparative advantage over DHIS2's UIDs or even a country's own coding scheme, and therefore it should in no way be a mandatory component of the API (although it should be highly recommended).

Best regards, Jason

mberg commented 11 years ago

Great discussion.

If anyone has any feedback on my last proposal, that would really be great. I think it addresses both concerns. We do need to come to an agreement on this soon, as we have teams trying to implement this now.

Thanks,

Matt

On Thu, Feb 14, 2013 at 6:23 PM, jason-p-pickering <notifications@github.com

wrote:

Thanks. It has been a good discussion which needs to be had here, but I am still not sure if we are really any further along to resolving the issue.

The real issue especially in terms of country ownership is allowing the organisations to make these decisions themselves. Raising big red flags that "If you do not do this, these will be the consequences..." should be enough to put them on the right track, but it may not be, and therefore, the API should cater to flexible situations in which this might be the case. The reason I am pushing this so much is that this was one of the conclusions we reached during the development of the WHO Master facility list paper, in consultations with different national Ministries of Health. There was essentially universal recognition of the fact that countries need a list of their facilities. But specifics on IDs should be left to countries to decide based on their own circumstances. Providing them sound technical guidance and platforms which support it, of course makes their decision a lot easier.

Just so that we are clear though, we should not think in anyway that the UUID alone is going to enable federation to actually work. It may make it technically easier, but it also raises other problems. Often times facility lists are made from existing lists. These may be something like an Excel sheet, which often does have a code, but not a UUID. Once this is imported into one system, you make have a name like "Abia State Teaching Hospital" with a code like "AB002121", which is the internal "legacy" code. If the system generates a UUID on one platform, well, it is not going to be the same UUID on another system, even though the name and the legacy code would be. So, in a federated system, you might have two separate downstream systems, which would register a facility with an another system, with the same name and same code, but two separate UUIDs. Now what? How can the conflict be resolved? This is again why in the WHO paper we introduce the concept of the "signature" of t he facility, one component of which is the ID. Obviously in this example (based on real life here in Nigeria) , the UUID simply does not help us at all. It may lessen the chance of a collision but cannot remove it of course.

Even in the chance of a collision is small and we can somehow verify that everyone is using RFC 4122, we still do not solve the problem at all of establishing a facility list without duplicates, solely by relying on a UUID. In my experience, this is a much bigger problem than potential collisions between some identifier. The real identifier is composed of the name of the facility, its location, and potentially other factors, such as the national facility code, which may be well known. So, I am just not sure if the UUIDs are really going to solve the problem, without some sort of hierarchy of in place, where each installation would need to contact a master to see if the facility already exists, and if it does, use the existing UUID, which sort of defeats the purpose of having federated IDs in the first place.

At the end of the day, I think the recommendation of a UUID is sound from a technical standpoint, and does solve some problems. I am just not at all sure that it offers any comparative advantage over DHIS2's UIDs or even a country's own coding scheme, and therefore it should in no way be a mandatory component of the API (although it should be highly recommended).
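
One way an implementation could treat UUIDs as highly recommended rather than mandatory is to accept any non-empty id but report which scheme it appears to follow. This Python sketch is hypothetical: `classify_id` is an illustrative name, and the DHIS2 pattern assumes the commonly used UID shape of 11 alphanumeric characters starting with a letter.

```python
import re
import uuid

# Assumed DHIS2 UID shape: 11 alphanumeric characters, the first one a letter.
DHIS2_UID = re.compile(r"[A-Za-z][A-Za-z0-9]{10}")


def classify_id(value: str) -> str:
    """Accept any non-empty id, but report which scheme it appears to follow."""
    if not value:
        raise ValueError("id must be non-empty")
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        parsed = None
    # Count it as RFC 4122 only if it is in the standard 8-4-4-4-12 hex form
    # (case-insensitive) and carries the RFC 4122 variant bits.
    if (
        parsed is not None
        and parsed.variant == uuid.RFC_4122
        and str(parsed) == value.lower()
    ):
        return "rfc4122-uuid"
    if DHIS2_UID.fullmatch(value):
        return "dhis2-uid"
    # Anything else, e.g. a country's own coding scheme, is still accepted.
    return "other"


print(classify_id("5899d128-4c55-11e2-b5e1-b88d12122fdc"))  # rfc4122-uuid
print(classify_id("AB002121"))                              # other
```

A server built this way could warn, rather than reject, when it sees an id that is not an RFC 4122 UUID.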

Best regards, Jason

— Reply to this email directly or view it on GitHub: https://github.com/facilityregistry/fred-api/issues/33#issuecomment-13555044.

bobjolliffe commented 11 years ago

Matt, I've already commented at some length, so I won't do it again now. In light of all that has been said (and I seriously don't mean just by me!), I'd like to put forward an amended or counter-proposal, but I don't think I will do it in time for the call in 5 minutes.

My understanding is that the agenda for the upcoming call is the 1.0 documentation.

Regards, Bob

mberg commented 11 years ago

Bob, not on the last proposal I sent, I believe.

ctford commented 11 years ago

I don't think I commented on the most recent mail you sent either, Matt.

As for the URL issue, I think we're better off not mandating a URL format. I think that gets tricky, and makes it more difficult for systems to implement FRED in a way that's natural to them. In DHIS2, for example, facilities already have URLs based around DHIS2 ids, and to me it makes sense that they remain that way, even in a DHIS2 that uses UUIDs for facilities. URL templating is nice for playing around with an API, but I think it makes things complicated when we're talking about interlinking systems, which is the use case I'm particularly focussed on.

If we do want to leave implementations free to choose id schemes (I've already given my opinion on that :-), then I think that we need to avoid mandating a direct connection between URL and id. Apart from anything else, there are many possible id formats that would be awkward or have to be encoded to fit in URLs.
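
A minimal Python sketch of keeping id and URL decoupled, under stated assumptions: `FacilityRef` and `path_segment` are hypothetical names, and the example.org URLs are placeholders rather than real DHIS2 or FRED endpoints. The URL is stored as an opaque attribute, so a migration changes it without touching the id, and any id that does end up in a URL is percent-encoded with the standard library's `urllib.parse.quote`.

```python
from dataclasses import dataclass, replace
from urllib.parse import quote


@dataclass(frozen=True)
class FacilityRef:
    id: str   # stable identity, in whatever scheme the implementation chose
    url: str  # where the facility currently lives; treated as opaque


def path_segment(facility_id: str) -> str:
    """Percent-encode an arbitrary id so it is safe as a single URL path segment."""
    return quote(facility_id, safe="")


# Hypothetical placeholder URLs: first served from a DHIS2-style instance...
ref = FacilityRef(
    id="5899d128-4c55-11e2-b5e1-b88d12122fdc",
    url="https://dhis2.example.org/facilities/a1B2c3D4e5F",
)
# ...then migrated to another system: the URL changes, the id does not.
migrated = replace(
    ref, url="https://registry.example.org/facilities/" + path_segment(ref.id)
)

print(migrated.id == ref.id)       # True
print(path_segment("AB/002 121"))  # AB%2F002%20121
```

The last line shows why arbitrary id formats are awkward in URLs: a made-up legacy-style code containing a slash and a space has to be encoded before it can serve as a single path segment, whereas a canonical UUID passes through unchanged.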

Cheers,

Chris

bobjolliffe commented 11 years ago

OK, sorry, it's a long thread. I'll review and get back to you.

edjez commented 11 years ago

See resolution to Issue #45