CartoDB / data-services

CARTO internal geocoder PostgreSQL extension

GBR postcodes #219

Open ernesmb opened 8 years ago

ernesmb commented 8 years ago

Postcodes for GBR are usually of the form:

CF5 1JY

where the first part (the outward code, here CF5) identifies a postal district and the second part (the inward code, here 1JY) identifies a much smaller area within it.

People usually work with full postcodes like this, but they don't match the records in our geocoder, which only covers the first part (the outward code).
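Here is a minimal Python sketch of that structure, which splits a full postcode into its outward and inward parts. It relies on the inward code always being a digit followed by two letters. The regular expression is a simplified pattern: it ignores special cases such as GIR 0AA and does not check which letters are allowed in each position. split_uk_postcode is a hypothetical helper and is not part of data-services.

```python
import re
from typing import Optional, Tuple

# Simplified pattern (assumption, not the full official rules):
# outward code = 1-2 letters, a digit, then an optional letter or digit
# (A9, A99, A9A, AA9, AA99, AA9A); inward code = a digit and two letters.
_UK_POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9][A-Z]{2})$")


def split_uk_postcode(raw: str) -> Optional[Tuple[str, str]]:
    """Return (outward, inward), or None if the input doesn't parse.

    Whitespace and case are normalised first, so "cf5 1jy", "CF51JY"
    and "CF5  1JY" all give ("CF5", "1JY").
    """
    compact = re.sub(r"\s+", "", raw).upper()
    match = _UK_POSTCODE.match(compact)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for code in ["CF5 1JY", "m1 1ae", "SW1A1AA", "not a postcode"]:
        print(code, "->", split_uk_postcode(code))
```

Because the inward code always has a fixed length of 3, the split is unambiguous even without a space, for example for short districts like M1.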

A transformation like this is needed to get only the first part:

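SELECT
  *,
  -- Assumes full postcodes: the inward code is always the last 3 characters,
  -- so drop them after removing spaces. Unlike substring(... for 4), this also
  -- handles short outward codes ('M1 1AE' -> 'M1', not 'M1 1').
  left(replace(upper(postcode), ' ', ''), -3) AS postcode_fixed
FROM your_table  -- hypothetical name; "table" itself is a reserved word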

However, that gives a lower-quality result, because the second level of precision is lost. The GeoNames database, by contrast, does recognise two-part postcodes.
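Roughly the behaviour being asked for can be sketched in Python as follows. The code looks up the full postcode first, falls back to the outward code only when the full one is unknown, and reports which precision was used. The two lookup tables and the geocode_postcode name are hypothetical. This is not how cdb_geocode_postalcode* is implemented today.

```python
import re
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]


def normalise(raw: str) -> str:
    """Upper-case and strip all whitespace: 'cf5 1jy' -> 'CF51JY'."""
    return re.sub(r"\s+", "", raw).upper()


def geocode_postcode(
    raw: str,
    full_index: Dict[str, LatLon],
    outward_index: Dict[str, LatLon],
) -> Optional[Tuple[LatLon, str]]:
    """Return (point, precision), preferring the full unit postcode.

    Hypothetical inputs: full_index is keyed by normalised full postcodes
    ('CF51JY'), outward_index by outward codes ('CF5').
    """
    key = normalise(raw)
    if key in full_index:
        return full_index[key], "full"
    # The inward code is always the last three characters, so dropping
    # them leaves the outward code, even for short districts like 'M1'.
    if len(key) > 3 and key[:-3] in outward_index:
        return outward_index[key[:-3]], "outward"
    # The input may already be just an outward code, e.g. 'CF5'.
    if key in outward_index:
        return outward_index[key], "outward"
    return None


if __name__ == "__main__":
    # Placeholder coordinates for illustration only, not real locations.
    full = {"CF51JY": (1.0, 1.0)}
    outward = {"CF5": (2.0, 2.0), "M1": (3.0, 3.0)}
    print(geocode_postcode("cf5 1jy", full, outward))  # full-postcode match
    print(geocode_postcode("CF5 9ZZ", full, outward))  # falls back to CF5
```

Returning the precision alongside the point lets callers tell a unit-level match from a district-level fallback. The current SQL workaround cannot do this.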

Would it be possible to integrate that info in our geocoding database?

This other issue may be related: https://github.com/CartoDB/data-services/issues/162

cc/ @rafatower @iriberri

rafatower commented 8 years ago

IMO, data-quality concerns like this one, and free geocodes in general, should go into the Observatory from now on, since that's where all the machinery for data ingestion and so on lives.

What's your view on this, @talos ?

talos commented 8 years ago

I'm up for integrating some of the data processing into the Observatory pipes. My questions would be:

  1. How would the geocoder downstream connect up with Observatory data? Would the existing dump/deploy process be sufficient?
  2. How often would the postal data need to be updated?
rafatower commented 8 years ago

How would the geocoder downstream connect up with Observatory data? Would the existing dump/deploy process be sufficient?

I guess we'd just replace the implementation of the current cdb_geocode_postalcode* client functions so that they connect to the Observatory, just like any other Observatory function exposed in client DBs.

Oh, BTW @ernesmb, I guess you're talking about Builder, because the old Editor's table geocoding facilities will be deprecated sooner or later in favor of analyses relying on the APIs (same functionality, different implementations, more flexibility through stable APIs, and all the power of analyses).

How often would the postal data need to be updated?

Not very often. Likely no more than twice a year.

talos commented 8 years ago

Great. I fully support this then. I see two (or three) tasks:

  1. Write ETL tasks in http://github.com/cartodb/bigmetadata to pull and parse the data.
  2. Update functions in http://github.com/cartodb/dataservices-api to provide the data to users from the Observatory.
  3. (Do we need this?) Write functions in http://github.com/cartodb/observatory-extension to obtain the data. I'm not sure whether we'd want to keep the existing model of a thin wrapper in (2) over functions in (3), or have all the processing happen in (2).
michellechandra commented 7 years ago

Is there any update on this issue, @talos? An Enterprise client has requested support for full UK postcodes.

S/B: 10516455

gingemonster commented 7 years ago

If it helps at all, the data is officially and openly available from https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html and is published quarterly. It comes with OSGB1936 easting and northing coordinates.
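Below is a minimal sketch of the ETL step for that file. It assumes header-less CSV rows with the postcode in the first column and the easting and northing in the third and fourth columns. Check this against the column documentation that ships with the product. The sketch indexes rows by normalised postcode and also derives an outward-code centroid by averaging the unit points. This is one hypothetical way to keep district-level lookups working from the same source. Both function names are illustrative.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

EN = Tuple[float, float]  # (easting, northing) in metres, British National Grid


def load_code_point_rows(lines: Iterable[str]) -> Dict[str, EN]:
    """Index rows by normalised postcode.

    Assumed layout: no header, postcode in column 0, easting and
    northing in columns 2 and 3.
    """
    index: Dict[str, EN] = {}
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or truncated lines
        postcode = re.sub(r"\s+", "", row[0]).upper()
        index[postcode] = (float(row[2]), float(row[3]))
    return index


def outward_centroids(index: Dict[str, EN]) -> Dict[str, EN]:
    """Average the unit-postcode points that share an outward code."""
    sums: Dict[str, list] = defaultdict(lambda: [0.0, 0.0, 0])
    for postcode, (easting, northing) in index.items():
        acc = sums[postcode[:-3]]  # drop the 3-character inward code
        acc[0] += easting
        acc[1] += northing
        acc[2] += 1
    return {
        outward: (east_sum / count, north_sum / count)
        for outward, (east_sum, north_sum, count) in sums.items()
    }


if __name__ == "__main__":
    # Synthetic rows in the assumed layout; not real Code-Point Open data.
    sample = io.StringIO(
        '"XX1 1AA",10,100,200\n'
        '"XX1 1AB",10,300,400\n'
        '"XX2 1AA",10,500,600\n'
    )
    print(outward_centroids(load_code_point_rows(sample)))
```

Converting the eastings and northings to latitude/longitude (EPSG:27700 to EPSG:4326) would be a separate step. For example, pyproj's Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True) can do this. Averaging in projected metres, before any conversion, keeps the centroid arithmetic simple.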

talos commented 7 years ago

@michellechandra Not at the moment. This would take a fairly concerted effort to implement. Is it absolutely necessary for them to do this through the UI, rather than, say, adding the postcode data as a table and achieving the same effect through a join by column (postal code to postal code)?
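For the join-by-column route, both sides need to spell the postcode the same way. User tables often don't ("cf5 1jy" vs "CF51JY"). Below is a minimal Python sketch of a join on a normalised key that keeps unmatched rows so they can be reviewed. The row layout, the postcode field name and the (lat, lon) reference table are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def normalise(raw: str) -> str:
    """Upper-case and remove all whitespace so spellings compare equal."""
    return re.sub(r"\s+", "", raw).upper()


def join_by_postcode(
    user_rows: List[dict],
    reference: Dict[str, Tuple[float, float]],
) -> Tuple[List[dict], List[dict]]:
    """Attach (lat, lon) from `reference`, keyed by normalised postcode.

    Returns (matched, unmatched) so misses can be reviewed instead of
    being silently dropped, as a plain inner join would do.
    """
    matched: List[dict] = []
    unmatched: List[dict] = []
    for row in user_rows:
        point = reference.get(normalise(row["postcode"]))
        if point is None:
            unmatched.append(row)
        else:
            matched.append({**row, "lat": point[0], "lon": point[1]})
    return matched, unmatched
```

In a database, the same idea means joining on a normalised expression (upper-cased, spaces removed) on both sides rather than on the raw column.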

gingemonster commented 7 years ago

It's more that the built-in search function in map visualizations can't really be used with clients in the UK, as full postcode lookup would be expected. We could do as you suggest, but that defeats the "out of the box" usefulness of CARTO, and we may as well use someone else's geocoder. All the major commercial and open-source geocoders support full UK postcodes.

ztephm commented 5 years ago

This is coming up again in SB 20230660. cc @cmpera @danicarrion