facilityregistry / fred-api

Facility Registry API Documentation Website

Clarification : Post Facility #7

Closed ghost closed 11 years ago

ghost commented 11 years ago

Another clarification for the formal documentation.

What are the thoughts around the POST (or creation) of a duplicate facility record? I am trying to clarify in my mind the expected behavior of a facility registry implementation.

In a perfect world, the facility registry should be capable of acting like IHE PIX managers do. That means whenever a duplicate record is detected (for example, the FR already has a facility with the same name, same geo-location, and same MOH id) it should merge the records. The rationale for this is simple: it places the onus of detecting duplicate entries on the registry, which has knowledge of ALL facilities registered in a jurisdiction (or at least is a more central authority), and it centralizes the logic for matching / detecting duplicates. This is helpful as legacy systems "come online" and need to migrate their data in batch.

From what I've read this is not the case. Based on my interpretation of the verbiage on the FR API docs, the FR is to return a 409 if a duplicate exists (although this is not really clear). Is this true or does the FR simply create a new record? IMO placing the onus on the client to determine if a facility has already been registered prior to registering the facility can be problematic, and may lead to inconsistent implementations of duplicate resolution.

I am ok with either solution; I just need clarification so I can write the specification document in an unambiguous way.

mberg commented 11 years ago

The duplication check will be difficult, especially if the registry is used to assign new government ids upon creation.

If the MOH assigns an ID to the facility in advance and that's passed in the initial facility registration we could detect on that.

We could go with a geospatial approach for duplication checking. We can assign a certain radius threshold where we think a duplication can occur. I think a lot of this will need to be done manually.
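To make the radius idea concrete, here is a minimal sketch, not part of the FRED API. It assumes each facility is a dict with hypothetical `lat` and `lon` keys in decimal degrees, and the 200 m default threshold is purely illustrative. It only flags candidates for manual review; it does not decide that two records are the same facility.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def possible_duplicates(candidate, existing, radius_m=200.0):
    """Return existing facilities within radius_m of the candidate.

    The 'lat'/'lon' keys and the default radius are assumptions made
    for this sketch; a real registry would pick its own threshold.
    """
    return [
        f for f in existing
        if haversine_m(candidate["lat"], candidate["lon"],
                       f["lat"], f["lon"]) <= radius_m
    ]
```

A small radius keeps false positives down but misses facilities whose coordinates were recorded inaccurately, which is why the result is best treated as a review list rather than an automatic match.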

In reality, I think we'll probably need to let the system create new facilities and then provide the means of later deleting / merging duplicates.

In summary, if a facility ID is not passed it will be hard to programmatically check for duplicates in a way that satisfies all user requirements. This will probably need to be managed manually.

How do the resource mapper and DHIS2 currently support this?

ghost commented 11 years ago

Hi Matt,

I'm ok with just stating that the registry should return a 409 if it detects a duplicate (using whatever algorithm implementers deem necessary). I just need to be unambiguous about this behavior in the formal spec so that clients know what it means when they receive this error code (or what to expect if they register a duplicate) :)

rowenaluk commented 11 years ago

+1 on returning 409.

in practice, the vast majority of our target users will not have good geospatial data, will not have consistent naming schemes, and may be using the initial setup of FR to assign MOH IDs (as Matt describes above). at this point, we can't realistically expect the FR to resolve the duplicates, so let's return the 409 and solve the issue manually for now.

bobjolliffe commented 11 years ago

DHIS2 currently requires uniqueness on the name. Whereas there is a whole bundle of information science which points to this being a bad idea, in practice, as Rowena points out, the geo-spatial data is often not there or inaccurate, and the name is frequently the only disambiguating field we have.

We have played with the idea of making this more nuanced, so for example it would be ok to have 2 "St Mary's Clinic" as long as they are not in the same district, sector or what have you. This is probably the best compromise. Currently we simply append the name of the layer above, e.g. "St Mary's Clinic [District Six]", when we need to disambiguate.
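A rough sketch of that scoped-uniqueness idea follows. It is not DHIS2 code; the `name` and `parent` keys and both function names are hypothetical, and the whitespace and case normalisation is an added assumption rather than something described above.

```python
def name_conflicts(name, parent, existing):
    """True if a facility with the same name already exists under the same parent.

    Names are compared after trimming whitespace and case-folding
    (an assumption of this sketch, not a documented DHIS2 rule).
    """
    key = name.strip().casefold()
    return any(
        f["name"].strip().casefold() == key and f["parent"] == parent
        for f in existing
    )


def disambiguated_name(name, parent):
    """Append the name of the layer above, e.g. "St Mary's Clinic [District Six]"."""
    return f"{name} [{parent}]"
```

With this rule, two facilities called "St Mary's Clinic" are accepted as long as their parent units differ, and the bracketed form is only needed where a flat list of names has to stay unique.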

The general point Rowena makes is valid: registries will use different strategies for disambiguation. I am fine with just returning a 409. In practice the creation of new facilities is relatively infrequent and would probably be done directly through the primary application UI by the application owner rather than from an external client (I know the distinction is moot on resource mapper).


mberg commented 11 years ago

Agreement on 409 when there is a duplicate. We will leave it to the implementation to decide how to detect duplication.
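As an illustration of that agreement, the sketch below shows a registry that delegates duplicate detection to a pluggable function and answers 409 when it reports a match. Everything here is hypothetical: the class, the `moh_id` key, the response bodies, and the use of 201 for a successful creation are assumptions for the example, not part of the FRED specification.

```python
from typing import Callable, Optional

Facility = dict
# A detector returns the matching existing facility, or None if there is no duplicate.
Detector = Callable[[Facility, list], Optional[Facility]]


def match_on_moh_id(candidate: Facility, existing: list) -> Optional[Facility]:
    """Example detector: match only when an MOH id is supplied and already known."""
    moh_id = candidate.get("moh_id")
    if moh_id is None:
        return None  # nothing to match on; leave it to manual review later
    for facility in existing:
        if facility.get("moh_id") == moh_id:
            return facility
    return None


class InMemoryRegistry:
    """Toy registry; the duplicate-detection strategy is left to the implementer."""

    def __init__(self, detect: Detector = match_on_moh_id):
        self.detect = detect
        self.facilities: list = []

    def post_facility(self, candidate: Facility):
        """Return (status, body): 409 on a detected duplicate, otherwise 201."""
        duplicate = self.detect(candidate, self.facilities)
        if duplicate is not None:
            return 409, {"error": "duplicate facility", "existing": duplicate}
        self.facilities.append(candidate)
        return 201, candidate
```

A client that receives 409 then knows the registry considers the facility already registered, whatever algorithm produced that verdict. Records posted without an MOH id are always created in this sketch, which matches the expectation that such duplicates are merged or deleted manually afterwards.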

edjez commented 11 years ago

Agree. Creating new gov IDs and other meaningful data generation rarely happens on an HTTP POST; it typically gets done by a human as part of a blessing/approval/assignment process later on.