codeforamerica / ohana-api

The open source API directory of community social services.
http://ohana-api-demo.herokuapp.com/api
BSD 3-Clause "New" or "Revised" License

Consider use of UUIDs to allow importing multiple independent sets of HSDS CSV files #341

Open · pmackay opened this issue 9 years ago

pmackay commented 9 years ago

How are the ID columns used when importing CSVs? Are they just for cross-referencing within the CSV content? Presumably Rails allocates new IDs to the entries on import?

Has a UUID field been considered for replicating content, so that an item (org or location) could be uniquely identified across systems? I notice Open Referral (OR) has a "Resource ID" field; might that be a future feature in Ohana?

monfresh commented 9 years ago

The import script uses the ID in the CSV file to set the ID in the Rails DB, so the ID in the CSV file acts as a unique identifier.
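A minimal sketch of the behavior described above, not the actual ohana-api import script: a plain Hash stands in for the organizations table, and `import_organizations` is a hypothetical helper. The CSV `id` becomes the primary key, so a second row or file reusing an id is rejected here, roughly as a primary-key constraint would reject it. How the real script handles that case isn't stated in this thread.

```ruby
require "csv"

# Hypothetical stand-in data; the real HSDS CSVs have many more columns.
ORGANIZATIONS_CSV = <<~DATA
  id,name
  1,Food Bank
  2,Legal Aid
DATA

# The CSV id is used directly as the primary key of the (simulated) table.
def import_organizations(csv_text, table)
  CSV.parse(csv_text, headers: true).each do |row|
    id = Integer(row["id"])
    # Mimics a primary-key uniqueness constraint.
    raise ArgumentError, "duplicate id #{id}" if table.key?(id)
    table[id] = { id: id, name: row["name"] }
  end
  table
end

organizations = import_organizations(ORGANIZATIONS_CSV, {})
puts organizations.keys.inspect # => [1, 2]
```

Because the id doubles as the identifier, it is unique within one database, but nothing coordinates ids between independently produced CSV sets.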

pmackay commented 9 years ago

Would the Resource ID field in the OR spec be implemented by adding a separate UUID field to each applicable model?

And how does the current import process handle entries with the same ID as one already in the database? Presumably it cannot be a UUID because it has to be an integer/key column.

monfresh commented 9 years ago

Why is a Resource ID field needed? The id field already serves the purpose of a unique identifier.

Is there a specific issue with the import process you are running into? If so, please state the bug and the steps to reproduce.

pmackay commented 9 years ago

It comes back to importing records from other systems. It's not possible to rely on simple database primary keys to guarantee uniqueness across systems. What if I have an Ohana db with 100 orgs, and then try to import another org from a different system that has an ID that collides with one in Ohana?

monfresh commented 9 years ago

Thanks! That's more helpful. Starting with the actual issue is always a good idea :smile: I hadn't thought about this scenario, and in this case, yes, a uuid field would make sense.
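One way the uuid field could work, sketched under assumptions rather than taken from ohana-api: the database allocates its own integer primary keys, each record carries a UUID (kept if the source supplies one, otherwise generated), and source-local CSV ids are used only to resolve references such as `organization_id` within a single batch. `Directory`, `import`, and the source layout are hypothetical; Hashes stand in for tables.

```ruby
require "securerandom"

# Hypothetical sketch, not ohana-api code. Hashes stand in for database tables.
class Directory
  attr_reader :organizations, :locations

  def initialize
    @organizations = {}
    @locations = {}
  end

  # Imports one independent source. Its CSV ids resolve references inside
  # this batch only; they never become primary keys.
  def import(source)
    org_pk_for = {} # source-local organization id => database primary key
    source[:organizations].each do |row|
      pk = insert(@organizations, uuid: row[:uuid] || SecureRandom.uuid, name: row[:name])
      org_pk_for[row[:id]] = pk
    end
    source[:locations].each do |row|
      insert(@locations,
             uuid: row[:uuid] || SecureRandom.uuid,
             organization_id: org_pk_for.fetch(row[:organization_id]),
             name: row[:name])
    end
  end

  private

  # Stand-in for the database's own id sequence (assumes rows are never deleted).
  def insert(table, attrs)
    pk = table.size + 1
    table[pk] = attrs.merge(id: pk)
    pk
  end
end

source_a = {
  organizations: [{ id: 1, name: "Food Bank" }],
  locations: [{ id: 1, organization_id: 1, name: "Main Pantry" }]
}
source_b = { # reuses CSV id 1, which would collide with source_a
  organizations: [{ id: 1, name: "Legal Aid" }],
  locations: [{ id: 1, organization_id: 1, name: "Downtown Office" }]
}

directory = Directory.new
directory.import(source_a)
directory.import(source_b)
```

Both sources import cleanly: the second "Legal Aid" row gets primary key 2 and its own UUID, and its location is linked to it rather than to "Food Bank".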

md5 commented 9 years ago

Having a uuid or resource_id field on each table would make it possible to implement the metadata table in HSDS as well, although the spec seems to say that the id field should contain the UUID.
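A rough sketch of why a per-table UUID helps with a metadata table: a single `resource_id` column has to point at rows in any table, and integer ids are not unique across tables (organization 1 and location 1 both exist). The field names below loosely follow the HSDS metadata table and should be checked against the spec version in use; `record_change` is a hypothetical helper.

```ruby
require "securerandom"
require "time"

# Hypothetical sketch. Field names loosely follow the HSDS metadata table.
MetadataEntry = Struct.new(:id, :resource_id, :last_action_date, :last_action_type,
                           :field_name, :previous_value, :replacement_value,
                           :updated_by, keyword_init: true)

# Logs one field change on any resource (organization, location, ...) and applies it.
def record_change(log, resource, field, new_value, updated_by)
  log << MetadataEntry.new(
    id: SecureRandom.uuid,
    resource_id: resource[:uuid], # unique across every table, unlike integer ids
    last_action_date: Time.now.utc.iso8601,
    last_action_type: "update",
    field_name: field.to_s,
    previous_value: resource[field].to_s,
    replacement_value: new_value.to_s,
    updated_by: updated_by
  )
  resource[field] = new_value
end

# Both rows have integer id 1 in their own tables, but distinct UUIDs.
organization = { id: 1, uuid: SecureRandom.uuid, name: "Food Bank" }
location = { id: 1, uuid: SecureRandom.uuid, name: "Main Pantry" }

log = []
record_change(log, organization, :name, "County Food Bank", "admin")
record_change(log, location, :name, "North Pantry", "admin")
```

With only integer ids, both log entries would carry `resource_id` 1 and could not be told apart without an extra column naming the table.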

md5 commented 9 years ago

I suppose it's possible to do so now, since the id fields are all the same type.

cderenburger commented 9 years ago

Could this allow importing (and resetting or dropping) each separate source db as an independent task? I'm currently trying to import our whole state db and am having trouble getting the import to complete. Breaking it into smaller tasks might also help with larger data sets.

monfresh commented 9 years ago

This particular issue only deals with adding a new field to the DB.

If you're having problems importing a large data set, please open a new issue. Thanks!