HUD-Data-Lab / Data.Exchange.and.Interoperability

Repository for Homeless Management Information System (HMIS) development and management of products to support data exchange and interoperability
GNU General Public License v3.0

API does not support bulk synchronizing of data #20

Open TomNUSDS opened 1 month ago

TomNUSDS commented 1 month ago

In other words, the current API approach doesn't support bi-directional bulk data synchronization. It is more of a client-server model, with the server being the "source of truth".

Issues:

  1. Currently, external callers can only add/update one record at a time. This will likely fail at scale.

  2. Bulk insert/update requests must return per-record results.
    That is, an array of successes/failures in the same order as the records that were sent. The HTTP status code is insufficient since it describes the overall request, not the individual records inside it.

  3. Only the fields that are known to be modified need be sent for an update.
    E.g., an ElementName might be required for an insert, but for an update it can be omitted if it wasn't changed. This means that the required fields for an insert differ from the required fields for an update. Likely, the only required fields for an update are the primary id and a revision number (see below), plus the subset of changed fields.

  4. Updates might require a per-record revisioning mechanism for merge conflict checks.
    This might be done using a LastMod timestamp or a new dedicated field holding an incrementing version number. The fundamental idea is that updates include a revision number (or the prior LastMod). If the data in the DB has a newer revision number than what is coming from the update, then the operation fails because someone else already modified it.

  5. Probably should clearly define what "UPSERT" means.
    Does it mean a bulk operation that includes a mix of inserts/updates/deletes? Or the database concept of upserts? DB upserts often mean NOT knowing the record ID when sending the data. (The DB unique-key concept, where a combination of fields makes a row unique, might make sense? But it would need to be defined for each table and is probably a hard problem.)
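
To make issues 2 through 4 concrete, here is a minimal sketch of a bulk update that returns one result per submitted record, accepts only the changed fields, and rejects stale revisions. It is an illustration only, not a proposed implementation: the in-memory `store`, the `bulk_update` function, the `RecordResult` shape, and the `id`/`revision` field names are all assumptions, not part of the current API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RecordResult:
    """Outcome for one record; `index` is its position in the request array."""
    index: int
    ok: bool
    record_id: Optional[str] = None
    revision: Optional[int] = None   # stored revision after the attempt, if known
    error: Optional[str] = None


def bulk_update(store: dict[str, dict[str, Any]],
                updates: list[dict[str, Any]]) -> list[RecordResult]:
    """Apply partial updates; return results in the same order as `updates`."""
    results: list[RecordResult] = []
    for i, upd in enumerate(updates):
        rid = upd.get("id")
        rev = upd.get("revision")
        # Only the id and the revision are required for an update (hypothetical names).
        if rid is None or rev is None:
            results.append(RecordResult(i, False, rid, error="id and revision are required"))
            continue
        current = store.get(rid)
        if current is None:
            results.append(RecordResult(i, False, rid, error="not found"))
            continue
        # Merge-conflict check: fail if the stored record is newer than the caller's copy.
        if current["revision"] > rev:
            results.append(RecordResult(i, False, rid, current["revision"], "revision conflict"))
            continue
        # Every other key in the payload is treated as a changed field.
        changed = {k: v for k, v in upd.items() if k not in ("id", "revision")}
        current.update(changed)
        current["revision"] += 1
        results.append(RecordResult(i, True, rid, current["revision"]))
    return results
```

A caller would match `results[i]` to the i-th record it sent, so a single failed record does not force the whole request to be reported as failed. A LastMod timestamp could stand in for the integer revision with the same comparison.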

TomNUSDS commented 1 month ago

I think it's OK to say the first iteration of the API is intended for client-to-server communications where the HMIS server is "source of truth" for the data. This means that it's responsible for generating new record ids for create operations, and for generating timestamps for fields like DateCreated, DateModified and DateDeleted.
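
As a rough sketch of what "server is the source of truth" could mean for a create operation: the server ignores any client-supplied id or timestamps and assigns its own. The `create_record` function, the in-memory `store`, the `id` key, and the use of UUIDs are assumptions made for illustration; only the DateCreated, DateModified and DateDeleted field names come from the comment above.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Fields the server owns; client-supplied values for these are discarded.
SERVER_FIELDS = ("id", "DateCreated", "DateModified", "DateDeleted")


def create_record(store: dict[str, dict[str, Any]],
                  payload: dict[str, Any],
                  now: Optional[datetime] = None) -> dict[str, Any]:
    """Create a record, letting the server assign the id and timestamps."""
    now = now or datetime.now(timezone.utc)
    # Keep only the client-owned fields from the payload.
    record = {k: v for k, v in payload.items() if k not in SERVER_FIELDS}
    record["id"] = str(uuid.uuid4())   # hypothetical id scheme
    record["DateCreated"] = now
    record["DateModified"] = now
    record["DateDeleted"] = None       # not deleted yet
    store[record["id"]] = record
    return record
```

The caller would read the assigned id and timestamps from the returned record rather than choosing them itself.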

Maybe we can add a "Revisit in the future" or "Future work" GitHub label?