HUD-Data-Lab / Data.Exchange.and.Interoperability

Repository for Homeless Management Information System (HMIS) development and management of products to support data exchange and interoperability
GNU General Public License v3.0

Defining ID fields as just a String32 is insufficient #19

Open TomNUSDS opened 3 months ago

TomNUSDS commented 3 months ago

Problem: This specification (like the CSV spec) simply defines IDs as "up to 32 character strings".

This is problematic because:

  1. IDs can collide across systems if auto-incrementing database IDs are used. "Collide" here means two rows could end up with the same ID if data is pushed from two different systems.

  2. Some systems may decide to put PII into the IDs (like "LastName SS#") since there's no guidance NOT to do this. This will likely leak PII into logs, since API requests put IDs into the URLs.
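To make the first point concrete, here is a minimal sketch of the collision. The two source systems, the `ClientID` key, and the dict-based receiving store are all hypothetical stand-ins for illustration, not anything defined by the specification.

```python
from itertools import count


def make_source(name):
    """Hypothetical source system that hands out auto-incrementing IDs."""
    ids = count(1)

    def new_client(label):
        return {"ClientID": str(next(ids)), "source": name, "label": label}

    return new_client


system_a = make_source("A")
system_b = make_source("B")

# Each system creates its first client; both rows get ClientID "1".
rows = [system_a("first client in A"), system_b("first client in B")]

# A receiving store keyed only on the ID overwrites the earlier row.
merged = {}
collisions = []
for row in rows:
    if row["ClientID"] in merged:
        collisions.append(row["ClientID"])
    merged[row["ClientID"]] = row

print(collisions)  # IDs that were seen more than once
```

Both IDs are valid "up to 32 character strings", so nothing in the current definition flags the overwrite.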

Proposal:

  1. Define the ID to be a UUID. Without the - characters, it is exactly 32 hexadecimal characters long (128 bits), so it still fits the existing 32-character limit.

  2. Possibly add an opaque "ExternalID" field to every individual model that has an ID. This works like a cookie: a syncing system can use it to re-associate data with its own records.
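A rough sketch of what item 1 could look like in practice, using Python's standard `uuid` module. The helper names `new_id` and `is_valid_id` are made up for this example, and choosing random (version 4) UUIDs is an assumption; the proposal does not say which UUID version to use.

```python
import uuid


def new_id() -> str:
    """Return a random UUID as 32 lowercase hex characters (no hyphens)."""
    return uuid.uuid4().hex


def is_valid_id(value: str) -> bool:
    """Accept only a 32-character, hyphen-free hexadecimal UUID string."""
    if len(value) != 32:
        return False
    try:
        # Round-trip through uuid.UUID so prefixes, spaces, or other
        # non-hex input are rejected rather than silently normalized.
        return uuid.UUID(hex=value).hex == value.lower()
    except ValueError:
        return False


example = new_id()
print(example, len(example))
print(is_valid_id(example))
print(is_valid_id("Smith 123-45-6789"))  # a PII-style ID fails validation
```

A validator like this would also reject the "LastName SS#" style of ID at the API boundary, before it reaches a URL or a log.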

Issues with this proposal:

NOTES

TomNUSDS commented 3 months ago

Closing. #26 is a more detailed feature request.

TomNUSDS commented 3 months ago

Oh, but it might apply to OTHER ID fields, so I'm going to reopen this and rework the title/description.