Open TomNUSDS opened 4 months ago
Another idea: tag the ID with a version marker such as "v1s1." (version 1, SHA-1). This would be useful for forward compatibility if the algorithm changes, and for backward compatibility by making it possible to identify which IDs do not use this approach.
One interesting issue with this approach: if the user doesn't supply complete and correct information (e.g. they don't recall their SSN), the generated ID will be wrong.
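A rough Python sketch of how such a version marker could be attached and detected. The names `tag_id` and `parse_id` are made up for illustration, and treating the marker as a prefix is an assumption:

```python
VERSION_PREFIX = "v1s1."  # "version 1, SHA-1" marker from the comment above


def tag_id(hash_value: str) -> str:
    """Attach the version marker to a generated hash (assumed to be a prefix)."""
    return VERSION_PREFIX + hash_value


def parse_id(value: str):
    """Split an ID into (version, body).

    IDs without the marker are treated as legacy IDs: version is None.
    """
    if value.startswith(VERSION_PREFIX):
        return VERSION_PREFIX.rstrip("."), value[len(VERSION_PREFIX):]
    return None, value
```

With a 27-character hash body, the tagged value is exactly 32 characters long, so it would still fit a string32 field.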
Will changing the LastName or fixing the SocialSecurityNumber change the PersonID?
Maybe the PersonID should be a UUID (random) and a new field named HashID should be used for bi-directional syncing (or finding matching records across systems). If any of the primary fields are updated, then a new HashID is generated.
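To make the split concrete, here is a small Python sketch. The `Person` class and its field names are hypothetical, and the hash is an un-normalized SHA-1 placeholder rather than a final algorithm. The PersonID is assigned once and never changes, while the HashID is derived from the current field values:

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class Person:
    ssn: str
    last_name: str
    first_name: str
    dob: str
    # Random UUID assigned once; independent of any personal data.
    person_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def hash_id(self) -> str:
        # Recomputed from the current fields, so it changes when they change.
        key = self.ssn + self.last_name + self.first_name[:1] + self.dob
        return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Correcting a last name or SSN then produces a new HashID while leaving the PersonID untouched.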
A further thought: the above approach can be used to generate a NEW field called HashID, with a version marker such as v1. added to the string.
Benefits:
(NOTE: if the HashID can be longer than Str32, then investigate more robust keyed schemes such as HMAC-SHA256: https://en.wikipedia.org/wiki/HMAC)
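For reference, a minimal HMAC-SHA256 sketch using Python's standard `hmac` module. The function name `keyed_hash_id` and the idea of a secret key shared between the syncing systems are assumptions, not something settled in this thread:

```python
import hashlib
import hmac


def keyed_hash_id(normalized_fields: str, secret_key: bytes) -> str:
    """Return the HMAC-SHA256 of the already-normalized fields as hex.

    secret_key is a hypothetical shared secret; without it, the hash
    cannot be recomputed by guessing SSN/name/date combinations.
    """
    mac = hmac.new(secret_key, normalized_fields.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()
```

The hex digest is 64 characters, which is why this only works if the field can be longer than Str32.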
One potential issue is the SSN being empty (because a person refused to give it or didn't know it). If this is common, then this approach would probably fail frequently.
Could a synthetic SSN possibly help? Basically, generate a random SSN but keep the two center digits as -00- (which is disallowed by SSN rules).
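A quick Python sketch of the synthetic SSN idea. The function names are made up, and choosing the other digits uniformly at random is an assumption:

```python
import secrets


def synthetic_ssn() -> str:
    """Generate a placeholder SSN whose middle group is 00."""
    area = secrets.randbelow(1000)     # first three digits, 000-999
    serial = secrets.randbelow(10000)  # last four digits, 0000-9999
    return f"{area:03d}-00-{serial:04d}"


def is_synthetic(ssn: str) -> bool:
    """True if the SSN carries the -00- marker in the middle group."""
    parts = ssn.split("-")
    return len(parts) == 3 and parts[1] == "00"
```

Since the value is random, a hash built from it would not be reproducible on another system, so it would only help with matching inside the system that generated it.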
This is also useful if a HudUUID value is added to records to take the place of PersonIDs (which databases may use as the primary key, but which may also be auto-incremented and thus overlap across different CoCs).
ClientIDs in the CSV specification are just string32.
Something like
SHA1(SocialSecurityNumber + Full Last Name + Initial of First Name + Date of Birth)
where all these fields are normalized.
Normalization:
é to e
Example with invented name containing accents and dashes:
Using CyberChef
A SHA1 digest is 160 bits, which can be encoded into 27 characters using Base62 (alphanumeric).
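Here is a rough Python sketch of the whole pipeline, to show the sizes work out. Everything about the normalization is an assumption (strip accents via NFKD, drop anything that is not an ASCII letter or digit, uppercase), as are the date format, the plain concatenation, and the function names:

```python
import hashlib
import unicodedata

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def normalize(text: str) -> str:
    """Assumed rules: strip accents, drop non-alphanumerics, uppercase."""
    decomposed = unicodedata.normalize("NFKD", text)  # "é" becomes "e" + accent mark
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum()).upper()


def base62_encode(data: bytes, width: int) -> str:
    """Encode bytes as a Base62 string, left-padded to a fixed width."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits)).rjust(width, BASE62[0])


def hash_id(ssn: str, last_name: str, first_name: str, dob: str) -> str:
    """SHA1(SSN + full last name + first initial + date of birth), Base62."""
    key = normalize(ssn) + normalize(last_name) + normalize(first_name)[:1] + normalize(dob)
    digest = hashlib.sha1(key.encode("ascii")).digest()  # 20 bytes = 160 bits
    return base62_encode(digest, 27)  # 62**27 > 2**160, so 27 characters suffice
```

With these assumed rules, an invented name such as "Núñez-O'Brien" normalizes to "NUNEZOBRIEN", so accented and unaccented spellings give the same 27-character ID.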
Pros:
Issues: