OpenAddressesUK / sorting_office

A Sinatra app that takes an address string and breaks it into its constituent parts

Internal server errors on incomplete addresses #4

Status: Open. giacecco opened this issue 9 years ago.

giacecco commented 9 years ago

Basic requests like the one below, where the postcode is missing, return an internal server error (oddly formatted as HTML, too):

.../~$ curl --data "address=22 greenway berkhamsted" https://sorting-office.openaddressesuk.org/address
<h1>Internal Server Error</h1>

An incomplete address should be recognised as such, possibly with a null postcode result. What do you reckon?

pezholio commented 9 years ago

This is now done. A request without a postcode now returns a 400 status code and an error message.
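For illustration, here is a minimal sketch of that behaviour as a plain Rack-style app (an object whose `call(env)` returns `[status, headers, body]`), so it runs without the Sinatra gem. The constant names `POSTCODE_RE` and `ADDRESS_APP`, the simplified postcode regex, and the error message text are all hypothetical; the actual route in sorting_office may look quite different.

```ruby
require "json"
require "stringio"
require "uri"

# Simplified UK postcode pattern; an assumption, not sorting_office's real matcher.
POSTCODE_RE = /\b[A-Z]{1,2}[0-9][A-Z0-9]?\s*[0-9][A-Z]{2}\b/i

# Hypothetical Rack-style app: #call(env) returns [status, headers, body].
ADDRESS_APP = lambda do |env|
  form = URI.decode_www_form(env["rack.input"].read).to_h
  address = form.fetch("address", "").strip
  headers = { "content-type" => "application/json" }

  match = address.match(POSTCODE_RE)
  if match.nil?
    # A missing postcode is a client error: answer 400 with a JSON message
    # instead of letting an exception surface as an HTML 500 page.
    return [400, headers, [{ error: "No postcode found in address" }.to_json]]
  end

  # Further parsing (street, locality, town) would cascade from the postcode here.
  [200, headers, [{ postcode: match[0].upcase }.to_json]]
end

# Usage: the same input as the curl request in the issue.
request_env = { "rack.input" => StringIO.new("address=22+greenway+berkhamsted") }
status, _headers, body = ADDRESS_APP.call(request_env)
puts status # → 400
puts body.join
```

In an actual Sinatra route, the same early exit would typically be written with `halt 400, ...`, which stops the route and sends the given status and body.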

giacecco commented 9 years ago

@pezholio sorry, but I don't think that is correct. An address without a postcode is still worth parsing. We'll have plenty of them coming from one of the partners we may be working with soon. I believe we should handle missing postcodes in the same way we handle missing towns, etc.

pezholio commented 9 years ago

Parsing anything without a postcode is going to be difficult, because all the parsing cascades down from it. For example, when we match a town, we first check that it is in the right postcode area; otherwise we can't be sure where the town match is coming from. Similarly, for localities we check that the match is within reasonable bounds before accepting it, and the same goes for streets.
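To make the cascade concrete, here is a rough sketch of the town check described above, under stated assumptions: `TOWN_AREAS` is a hypothetical, hand-written gazetteer mapping a town name to the postcode areas it may appear in, and `postcode_area` and `match_town` are illustrative helpers, not sorting_office's real code. It only matches single-word town names. It shows why a missing postcode leaves the town match with nothing to validate against.

```ruby
# Hypothetical gazetteer: illustrative data only, not the project's dataset.
# A town name can be ambiguous (e.g. NEWPORT), so the postcode area disambiguates.
TOWN_AREAS = {
  "BERKHAMSTED" => ["HP"],
  "NEWPORT" => ["NP", "PO"]
}.freeze

# The postcode area is the leading one or two letters of the postcode.
def postcode_area(postcode)
  return nil if postcode.nil?

  postcode.upcase[/\A[A-Z]{1,2}/]
end

# Accept a town only if the gazetteer places it in the address's postcode area.
# Without a postcode there is nothing to check against, so no town is accepted.
def match_town(address, postcode)
  area = postcode_area(postcode)
  return nil if area.nil?

  words = address.upcase.scan(/[A-Z]+/)
  words.find { |word| TOWN_AREAS.fetch(word, []).include?(area) }
end

puts match_town("22 greenway berkhamsted", "HP4 3EF").inspect # → "BERKHAMSTED"
puts match_town("22 greenway berkhamsted", nil).inspect       # → nil
```

Handling postcode-less addresses would mean replacing the early `return nil` with some other way of constraining candidates (for example, cross-checking town, locality and street against each other), which is where the extra complexity comes from.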

giacecco commented 9 years ago

Is it worth having a chat with @murraydata about this? There must be a sensible way of parsing postcode-less addresses.

pezholio commented 9 years ago

Possibly, but I'd imagine it will add quite a heavy layer of complexity. The logic is similar to the way he worked through addresses in the Companies House ETL (albeit without the ElasticSearch layer).

giacecco commented 9 years ago

Please proceed, at least to assess the complexity. We will decide together whether it is worth proceeding to implementation as well.

pezholio commented 9 years ago

I think this should be considered an addition, as the original brief was:

"Re-write as a re-usable software component the algorithm that is currently part of the Corporates House ETL that interprets free-text addresses."

giacecco commented 9 years ago

Fine, but can you fit the chat with John within the original points?

pezholio commented 9 years ago

Yeah, no probs :+1: