CartoDB / data-services

CARTO internal geocoder PostgreSQL extension

Improper polygon artifact for Toronto postal code? #215

Closed ztephm closed 8 years ago

ztephm commented 8 years ago

@iriberri

S/B 9059399

Toronto postal code is returning two polygons, one looks improper.

Context

If you georeference the postal code for polygons, it looks like there's another tiny area for 'M2N' in addition to the larger main polygon. I've tried looking up maps for M2N, but none show two separate areas for it, although the maps might not be that detailed. Do we have another way to verify whether this tiny area is an artifact or really should exist?

Process

  1. Create new empty dataset w/two rows and a postal code string column
  2. Fill in the postal code column with 'M2N' in one row and 'M6P' in the other.
  3. Edit > Georeference > Postal Code > use postal code column and enter Canada as country
  4. Georeference your data with Administrative Regions

Current result

https://team.cartodb.com/u/stephaniemongon/tables/canada_postal_code_test/map

[Screenshot, 2016-05-31 12:29:55 PM]

[Screenshot, 2016-05-31 12:30:18 PM]

iriberri commented 8 years ago

According to Google, it doesn't seem to be a "2 polygons" zipcode. The small red region belongs to a very specific building block, so it looks like a bug in the source data.

I've taken a look at our Canadian zipcodes source and the data seems to be updated daily (it says last updated today). I'll check whether the current data still contains that error. You can check it too: http://www5.statcan.gc.ca/access_acces/alternative_alternatif.action?l=eng&dispext=zip&teng=gfsa000a11a_e.zip&k=%20%20%20%2026170&loc=http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/gfsa000a11a_e.zip

ztephm commented 8 years ago

Thanks @iriberri! I just checked and I see the same error in the source zip. I can check again tomorrow to see whether tomorrow's update is the same as today's.

iriberri commented 8 years ago

I sent them a support email mentioning the issue. I'll keep you updated if I receive any response.

iriberri commented 8 years ago

Hey @ztephm! I finally got an answer:

Census FSA boundary file is usually released every census or once every 5 years. Therefore, it is not updated very often.

Out of 1,621 FSAs from 2011 Census FSA boundary file, 581 have more than one polygon. The main reason is that it is how FSAs were designed by Canada Post. They were designed for facilitating mail delivery, not for anything else. The second reason is that the census FSA was based on postal code responses on census questionnaires returned by responders. The second reason explains that the census FSA boundary file from Statistics Canada web site may not be exactly identical with what Canada Post designed.

They reference this PDF where more info can be found: 92-179-g2011001-eng (1).pdf

Given this information, I think there's not much we can do. We could generate a fix for this specific area, but after being told that there are more than 500 FSAs in the same situation, I don't think it is worth it. What do you think?
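If we ever want to gauge how many FSAs look like M2N (a large main polygon plus a tiny extra part), a rough check could look like the sketch below. It is only an illustration under several assumptions: the boundary file has been converted to GeoJSON-style features, the FSA code is stored in a property named `CFSAUID`, the coordinates are planar so that area ratios are meaningful, and the 1% threshold is arbitrary. The sample features are made up and are not taken from the Statistics Canada file.

```python
from collections import defaultdict


def ring_area(ring):
    """Absolute planar area of a closed ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def part_areas_by_fsa(features, code_field="CFSAUID"):
    """Map each FSA code to the areas of its polygon parts.

    code_field is an assumed property name; adjust it to the real file.
    Only exterior rings are measured, holes are ignored.
    """
    areas = defaultdict(list)
    for feature in features:
        code = feature["properties"][code_field]
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        else:  # treated as MultiPolygon
            polygons = geometry["coordinates"]
        for polygon in polygons:
            areas[code].append(ring_area(polygon[0]))
    return dict(areas)


def flag_small_parts(areas, ratio=0.01):
    """Return parts smaller than ratio * largest part, per multi-part FSA."""
    flagged = {}
    for code, parts in areas.items():
        if len(parts) < 2:
            continue
        largest = max(parts)
        tiny = [area for area in parts if area < ratio * largest]
        if tiny:
            flagged[code] = tiny
    return flagged


def square(x, y, side):
    """Polygon coordinates (one closed exterior ring) for a square."""
    return [[(x, y), (x + side, y), (x + side, y + side), (x, y + side), (x, y)]]


# Made-up sample data: one two-part FSA and one single-part FSA.
SAMPLE = [
    {"properties": {"CFSAUID": "M2N"},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [square(0, 0, 10), square(20, 20, 0.5)]}},
    {"properties": {"CFSAUID": "M6P"},
     "geometry": {"type": "Polygon", "coordinates": square(50, 50, 8)}},
]

areas = part_areas_by_fsa(SAMPLE)
multipart = sorted(code for code, parts in areas.items() if len(parts) > 1)
flagged = flag_small_parts(areas)
print("multi-part FSAs:", multipart)
print("FSAs with a tiny extra part:", flagged)
```

A check like this only lists candidates. Per the answer above, a multi-part FSA may be exactly what Canada Post designed, so a flagged part is not necessarily an error.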

The good news is that we currently have the 2011 Census data uploaded, and "soon" we could start using the 2016 data, whenever it is released.

ztephm commented 8 years ago

Thanks @iriberri! I agree it's not worth it because, based on their explanation, the polygons might not even be technically inaccurate. I will let the client know.