codeforamerica / ohana-api

The open source API directory of community social services.
http://ohana-api-demo.herokuapp.com/api
BSD 3-Clause "New" or "Revised" License

Relax validation constraints, allow adding incomplete data for correction later #174

Closed · francisli closed this issue 10 years ago

francisli commented 10 years ago

I thought I'd bring up a suggestion for discussion, as I've started participating in the SF Brigade's project with OpenReferral/Ohana. While the spec and the API platform are being worked on, we've been spending most of our time gathering data and preparing it for import. The process is basically: gather, transform, attempt an import, see failures, roll back the additions, edit the data, re-attempt the import, see more failures, roll back, edit again; wash, rinse, repeat. Managing this process in the source import files themselves is a very technical and rather arduous task, especially when the data set is large.

A more user-focused process would be to allow importing and storing incomplete data, with admin-interface views that highlight invalid records for editing and correction at a later time. For example, right now, to get data to import at all, we often put empty placeholder strings in various required fields. Those records are then "valid", but contain placeholder values that are not always easy to distinguish from real data. Contrast that with being able to import the records with clear flags marking them as being in an invalid/draft state, so they can easily be recalled when confirming, editing, or correcting the data (say, by calling the organization, which could be days later).

This is more than just a UI/design issue: for this to be possible, the data model will need to support it, e.g. by removing database constraints (such as NOT NULL) and making the validations conditional. Note that this does not remove the validations entirely; they can still be run conditionally, and the validation state could be stored as a boolean column in each table. Clearly, a record that is incomplete or invalid per the spec should be neither served by the API nor exported.
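In rough Ruby terms, something like this (a sketch only; `draft` is a hypothetical boolean column, and the attribute names are illustrative, not from the current schema):

```ruby
class Organization < ActiveRecord::Base
  # Keep the spec validations, but enforce them only once a record
  # leaves draft state.
  validates :name, :description, presence: true, unless: :draft?

  # Incomplete/draft records are never served by the API or exported.
  scope :publishable, -> { where(draft: false) }
end
```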

monfresh commented 10 years ago

Thanks for the feedback, Francis! First, I'd like to make sure I understand. It sounds like there are two issues here:

The ability to flag invalid entries was added 18 days ago. Currently, the import script lets all valid entries through and saves the invalid entries to a separate file. This gives you a working version of the API that serves valid data, while still letting you correct the invalid data later, albeit via the data files rather than the admin interface.
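In simplified form, the import behaves roughly like this (a sketch, not the actual rake task; `rows` stands in for the attribute hashes parsed from the source data files):

```ruby
require 'json'

invalid_rows = []

rows.each do |attributes|
  record = Organization.new(attributes)
  unless record.save
    # Collect the failing row along with its validation errors.
    invalid_rows << attributes.merge('errors' => record.errors.full_messages)
  end
end

# Valid entries are now in the database; the failures go to a separate
# file so they can be corrected and re-imported.
File.write('invalid_records.json', JSON.pretty_generate(invalid_rows))
```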

As for the validation rules, those are something we are still iterating on via the OpenReferral spec. If there are specific validations you think are too strict, please mention them here and also open an issue in the OpenReferral repo. One validation I think you ran into is the address/mail_address requirement for a location, because you have many "virtual" locations. One way around that is to add a virtual boolean column to the locations table (false by default), change the validation rule to apply only when the location's virtual field is false, and set virtual to true in your dataset for all locations that don't have an address.
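Something along these lines (a sketch only, simplified to a single address check; the exact rule in the codebase may differ):

```ruby
# Migration: add the flag, defaulting existing locations to physical.
class AddVirtualToLocations < ActiveRecord::Migration
  def change
    add_column :locations, :virtual, :boolean, default: false, null: false
  end
end

# Model: require an address only for non-virtual locations.
class Location < ActiveRecord::Base
  validates :address, presence: true, unless: :virtual?
end
```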

Feel free to submit a pull request if you think that's the right approach for virtual locations.

francisli commented 10 years ago

Yes, I've seen the output from the import rake task that collects the errors. However, as you note, that still requires editing the raw data files and re-importing.

The suggestion is to make it possible to store incomplete/invalid data in the database, so that admin- and user-facing tools and workflows can be designed for editing and correcting it, rather than having to edit the raw data files and re-import.

As for what constitutes valid data, yes, I'll bring that up in the OpenReferral project as I encounter issues and have ideas...

monfresh commented 10 years ago

Understood. That's a reasonable suggestion, but it's not something I have the time to build at the moment. I'm currently the only one working on the API and admin interface, and my plate is full with refactoring tasks. If the SF Brigade has the capacity to build this, I'll gladly review a pull request.

monfresh commented 10 years ago

Once the OpenReferral data standard is finalized, the API will be updated to match its rules, and will only accept data that conforms to the standard. Therefore, anyone wishing to import data into Ohana API will need to have valid data to begin with.

For those wishing to validate their data using a GUI, I think that would need to happen outside the API. I'm envisioning a standalone tool that lets you import CSV files, for example, tells you whether they conform to the OpenReferral standard (similar to the GTFS validator), and then also lets you edit the files. There's already an issue for this in the OpenReferral repo.
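In spirit, the validation half of such a tool could be quite small. Here's a toy sketch (the file name and required fields are placeholders, not the finalized OpenReferral rules):

```ruby
require 'csv'

# Placeholder required columns; the real standard defines many more rules.
REQUIRED = %w[name description].freeze

errors = []
CSV.foreach('organizations.csv', headers: true).with_index(2) do |row, line|
  missing = REQUIRED.select { |field| row[field].to_s.strip.empty? }
  errors << "line #{line}: missing #{missing.join(', ')}" unless missing.empty?
end

puts errors.empty? ? 'CSV conforms.' : errors
```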

Since the API is meant to respect the OpenReferral standard, I don't think it makes sense to let bad data through, especially as that would mean removing DB constraints. I'm therefore closing this issue.