cfpb / grasshopper-parser

Address Parsing REST API
Creative Commons Zero v1.0 Universal

Full street name address endpoint #17

Closed jmarin closed 9 years ago

jmarin commented 9 years ago

Right now, the parser returns a JSON object with a variable number of address components. To integrate better with TIGER geocoding (the main reason for creating the parser), it would be highly desirable to make the parser return a fixed set of address components, as follows:

hkeeler commented 9 years ago

@jmarin, just want to confirm this is what you're looking for here. Based on the usaddress address parts, we could do:

  1. addressNumber:
    1. Just AddressNumber?
    2. AddressNumberPrefix + AddressNumber + AddressNumberSuffix?
  2. state:

    1. StateName

      Note: usaddress provides no logic for converting to 2-letter state abbreviations. You get back whatever is put in, so if you put in "Calif" or "California", that's what you get back, not "CA". Do you want me to take on this type of conversion as part of this change?

  3. zipcode:

    1. ZipCode

      Note: usaddress provides no logic for enforcing 5-digit zips. For instance, usaddress considers 1234567890 to be a valid zip. Do you want me to take this on as part of this change? If yes, what's the behavior?

      1. Return a 5 digit zip
      2. Return the first 5 digits of a 9-digit zip, allowing both 123456789 and 12345-6789 formats?
      3. Fail with validation error if any other zip style present?
  4. streetName:
    1. Concatenate all StreetName* address parts in the order they were parsed?
    2. A more specific set of address parts concatenated together?
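
To make the state and zip questions above concrete, here is a minimal sketch of one possible answer. It is not part of usaddress or of the parser; the names `normalize_state`, `normalize_zip`, and `STATE_ABBREVIATIONS` are hypothetical, and the lookup table is a deliberately tiny subset. The zip behavior shown combines the three options: accept 5-digit and 9-digit forms, return the first 5 digits, and reject anything else.

```python
import re

# Hypothetical, illustrative subset; a real table would cover every state
# and territory plus common variants.
STATE_ABBREVIATIONS = {
    "california": "CA",
    "calif": "CA",
    "new york": "NY",
    "texas": "TX",
}

# Accepts 12345, 123456789, and 12345-6789; captures the first five digits.
ZIP_PATTERN = re.compile(r"(\d{5})(?:-?\d{4})?")


def normalize_state(state_name):
    """Return a 2-letter abbreviation, or the input unchanged if unknown."""
    cleaned = state_name.strip().rstrip(".").lower()
    if len(cleaned) == 2 and cleaned.isalpha():
        # Already looks like an abbreviation; this does not check it is real.
        return cleaned.upper()
    return STATE_ABBREVIATIONS.get(cleaned, state_name)


def normalize_zip(zip_code):
    """Return the 5-digit zip, or raise ValueError for any other format."""
    match = ZIP_PATTERN.fullmatch(zip_code.strip())
    if match is None:
        raise ValueError("Unsupported zip format: {!r}".format(zip_code))
    return match.group(1)
```

With this sketch, `normalize_zip("12345-6789")` gives `"12345"`, while `normalize_zip("1234567890")` raises, which would surface as a validation error rather than a silently accepted zip.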

hkeeler commented 9 years ago

Also, do you want OccupancyType and OccupancyIdentifier and/or SubaddressType and SubaddressIdentifier included?

hkeeler commented 9 years ago

I have an initial working version:

Request:

GET /standardize?address=1234+north+main+st+apt+1b+sacramento+ca+95811

Response:

{
  "input": "1234 north main st apt 1b sacramento ca 95811", 
  "parts": {
    "addressNumber": "1234", 
    "city": "sacramento", 
    "state": "ca", 
    "streetName": "north main st", 
    "zip": "95818"
  }
}

Notice that "north main st" remains intact and "apt 1b" is dropped. This seems like what we'd want.
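
As a rough illustration of how tagged components could collapse into the fixed set above, here is a sketch that is not the actual implementation. The `standardize` function is hypothetical, and the `tagged` dict is hand-written to stand in for what usaddress's tag function might return for this input; it is not captured output. The sketch assumes every label starting with `StreetName` is joined, in parse order, into `streetName`, and that occupancy labels are simply ignored.

```python
def standardize(tagged):
    """Collapse usaddress-style tags into a fixed set of address parts.

    `tagged` maps usaddress labels to strings, in the order they were parsed.
    """
    # Join every StreetName* part (pre-directional, name, post-type, ...)
    # in parse order; occupancy and subaddress labels are left out.
    street_parts = [
        value for label, value in tagged.items()
        if label.startswith("StreetName")
    ]
    return {
        "addressNumber": tagged.get("AddressNumber"),
        "streetName": " ".join(street_parts),
        "city": tagged.get("PlaceName"),
        "state": tagged.get("StateName"),
        "zip": tagged.get("ZipCode"),
    }


# Hand-written stand-in for tagger output (an assumption, not real output).
tagged = {
    "AddressNumber": "1234",
    "StreetNamePreDirectional": "north",
    "StreetName": "main",
    "StreetNamePostType": "st",
    "OccupancyType": "apt",
    "OccupancyIdentifier": "1b",
    "PlaceName": "sacramento",
    "StateName": "ca",
    "ZipCode": "95811",
}
parts = standardize(tagged)
```

Here `parts["streetName"]` comes out as `"north main st"`, and nothing from the occupancy labels reaches the result.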

The resource is currently /standardize. I'm not in love with it. Trying to come up with a term that is not specific to TIGER geocoding, and possibly useful for other purposes.

Under the hood it always uses usaddress's tag function and performs validation, so incomplete addresses will fail with a 400 error.

Thoughts?

awolfe76 commented 9 years ago

@hkeeler I think you said that the 400 error would also tell you why, right? What fields are missing or invalid?

Will be useful for UI work to display errors.

jmarin commented 9 years ago

@hkeeler I think your initial implementation gets us what we need for now. I'm sure some stuff will come up, but we can revisit later. Can you do a pull request with this? Thanks

hkeeler commented 9 years ago

@jmarin, yes, once I get a few tests written, and fix a bug I just found, I'll put up a PR.

@awolfe76, I'll post what the validation error looks like. It is not currently very machine readable. It is just a string describing which fields are missing...but better than nothing for now. Additionally, since the UI won't be calling the parser directly, the geocoder API will have to either forward these errors in its response, or come up with its own response. Sometime soon, we'll need to discuss how we want child service errors to bubble up through the parent geocoder service.

hkeeler commented 9 years ago

@awolfe76, here's what the validation error currently looks like:

Request:

GET http://localhost:5000/standardize?address=north+main+st+apt+1b+sacramento+ca

Response:

{
  "error": "Parsed address does not include required address part(s): ['ZipCode', 'AddressNumber']", 
  "statusCode": 400
}
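
For readers wondering how a check like this might produce that payload, here is a minimal sketch; it is not the parser's actual code. `REQUIRED_PARTS` and `validate` are hypothetical names, and the set of required labels is an assumption (only `ZipCode` and `AddressNumber` are confirmed by the error above). The order of labels in the message may also differ from the real service.

```python
# Assumed set of required labels; the real list may differ.
REQUIRED_PARTS = ["AddressNumber", "StreetName", "PlaceName", "StateName", "ZipCode"]


def validate(tagged):
    """Return an error payload if required parts are missing, else None."""
    missing = [part for part in REQUIRED_PARTS if part not in tagged]
    if missing:
        return {
            "error": "Parsed address does not include required "
                     "address part(s): {}".format(missing),
            "statusCode": 400,
        }
    return None


# Hand-written tags for "north main st apt 1b sacramento ca" (an assumption).
incomplete = {
    "StreetNamePreDirectional": "north",
    "StreetName": "main",
    "StreetNamePostType": "st",
    "OccupancyType": "apt",
    "OccupancyIdentifier": "1b",
    "PlaceName": "sacramento",
    "StateName": "ca",
}
error = validate(incomplete)
```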

awolfe76 commented 9 years ago

Thanks. Like you say, better than nothing for now; it's just good to know it will be there.

Here's a quick use case: Someone is manually entering an address into an input. Using onblur it would be nice to make a call to make sure that it's a valid address (one that can be geocoded). That call doesn't require a full grasshopper response, just a true/false with errors if they exist. Not sure if that justifies a call directly to the parser alone or another way to handle it. Just wanted to point out that case that's come up already.
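
One way the true/false-with-errors idea could look, sketched under assumptions: a small helper reduces a parser response to `valid` plus a list of errors, pulling the missing part names out of the error string shown earlier in this thread. `check_address` is a hypothetical name, and the string parsing relies on the current message format, which may change.

```python
import ast
import re

# Matches the bracketed list at the end of the current error string.
MISSING_PARTS = re.compile(r"required address part\(s\): (\[.*\])")


def check_address(status_code, body):
    """Reduce a parser response to {"valid": bool, "errors": [...]}."""
    if status_code == 200:
        return {"valid": True, "errors": []}
    message = body.get("error", "")
    match = MISSING_PARTS.search(message)
    if match:
        try:
            # The list is rendered as a Python literal, e.g. "['ZipCode']".
            missing = ast.literal_eval(match.group(1))
            return {
                "valid": False,
                "errors": ["missing " + part for part in missing],
            }
        except (ValueError, SyntaxError):
            pass  # Fall through and report the raw message instead.
    return {"valid": False, "errors": [message]}
```

Whether such a helper would live in the UI, the geocoder API, or the parser itself is exactly the open question here; the sketch only shows that the current string is recoverable, if fragile.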

hkeeler commented 9 years ago

Yep, makes sense. I think we'll have many use cases for just address string parsing. Just not sure if all requests will go through the geocoder API (proxying to the parser API) or if we'll call the parser directly. I don't yet have a strong opinion either way.

awolfe76 commented 9 years ago

:+1: