cfpb / grasshopper

CFPB's streaming batch geocoder
Creative Commons Zero v1.0 Universal

Fix CensusGeocoder address number range queries #212

Open hkeeler opened 8 years ago

hkeeler commented 8 years ago

As mentioned on https://github.com/cfpb/grasshopper/pull/211#issue-143093233, I commented out houseQuery from CensusGeocoder.searchAddress() (see: https://github.com/cfpb/grasshopper/pull/211/commits/6dd5d5834794c5fdbaad67436a8e498084276050#diff-6b3a25548e48dfe8c1891ebf6afef9efR87) to get it working after the ES 2.2 upgrade.

While working through the upgrade, I discovered two other bugs related to houseQuery:

  1. We're using the string datatype for the address number range fields (LFROMHN, RFROMHN, LTOHN, and RTOHN). Unfortunately, this does not behave as expected in an ES range query, because string ranges are compared lexicographically, not numerically. As a result, "1000" < "200" < "30".

    The simple fix would be to change the field types in grasshopper-loader's censusType. However, this is complicated by the fact that not all address numbers are purely numeric: some address ranges also contain letters and other characters, such as `-`.

    There are probably a few approaches for fixing this, but they'll most likely involve changes to both the loader and CensusGeocoder.

  2. We're assuming that the FROM values will always be less than the TO values. Unfortunately, that is not the case, as seen in the example below.
            {
               "type": "Feature",
               "properties": {
                  "TLID": 84836894,
                  "TFIDL": 214102038,
                  "TFIDR": 214101073,
                  "ARIDL": "4003966015873",
                  "ARIDR": null,
                  "LINEARID": "1103732644476",
                  "FULLNAME": "Rockefeller",
                  "LFROMHN": "398",
                  "LTOHN": "320",
                  "RFROMHN": null,
                  "RTOHN": null,
                  "ZIPL": "71832",
                  "ZIPR": null,
                  "EDGE_MTFCC": "S1400",
                  "ROAD_MTFCC": "S1400",
                  "PARITYL": "E",
                  "PARITYR": null,
                  "PLUS4L": null,
                  "PLUS4R": null,
                  "LFROMTYP": null,
                  "LTOTYP": null,
                  "RFROMTYP": null,
                  "RTOTYP": null,
                  "OFFSETL": "N",
                  "OFFSETR": "N",
                  "STATE": "AR"
               },
               "geometry": {
                  "type": "LineString",
                  "coordinates": [
                     [
                        -94.343073,
                        34.033726
                     ],
                     [
                        -94.34307,
                        34.034189
                     ]
                  ]
               }
            }
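To illustrate the first bug (string vs. numeric ordering of LFROMHN, RFROMHN, LTOHN and RTOHN), here is a minimal Python sketch. The helper `house_number_key` is hypothetical and not part of grasshopper or grasshopper-loader; it assumes one possible approach, where the loader stores the leading digits of each house number in an additional numeric field and ignores suffixes like "A" or "-34". Whether dropping those suffixes is acceptable for range matching is an open question, not something this sketch settles.

```python
import re
from typing import Optional

# Matches the leading run of digits in a house number such as "123", "123A" or "12-34".
_LEADING_DIGITS = re.compile(r"^\s*(\d+)")


def house_number_key(raw: Optional[str]) -> Optional[int]:
    """Return the leading numeric part of a TIGER house number, or None.

    Hypothetical helper. Assumption: comparing only the leading digits is
    good enough for range checks; suffixes such as "A" or "-34" are ignored.
    Values with no leading digits (e.g. "N12") yield None.
    """
    if raw is None:
        return None
    match = _LEADING_DIGITS.match(raw)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    values = ["30", "1000", "200"]
    # Lexicographic ordering, which is what a range query on a string field effectively uses.
    print(sorted(values))                        # ['1000', '200', '30']
    # Numeric ordering using the extracted key.
    print(sorted(values, key=house_number_key))  # ['30', '200', '1000']
    print(house_number_key("123A"), house_number_key("12-34"), house_number_key("N12"))  # 123 12 None
```

A simpler variant of this idea is zero-padding the numeric part to a fixed width so that lexicographic and numeric order agree, though that still leaves the question of how to treat non-numeric characters.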
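For the second bug, the Rockefeller segment above has LFROMHN "398" and LTOHN "320", so a check of the form FROM <= n <= TO never matches a house number such as 350. The Python sketch below shows one way to accept ranges in either direction. Both functions are hypothetical, not grasshopper's API; `side_range_query` builds an Elasticsearch bool query as a plain dict and assumes the FROM/TO fields are already indexed as a numeric type, which the current string mapping does not provide.

```python
from typing import Any, Dict


def in_address_range(number: int, from_hn: int, to_hn: int) -> bool:
    """True if number lies between from_hn and to_hn, whichever is larger."""
    low, high = min(from_hn, to_hn), max(from_hn, to_hn)
    return low <= number <= high


def side_range_query(number: int, from_field: str, to_field: str) -> Dict[str, Any]:
    """Build an Elasticsearch bool query matching number on one side of the street.

    Matches ascending ranges (FROM <= number <= TO) as well as descending
    ones (TO <= number <= FROM). Assumes from_field and to_field are numeric.
    """
    ascending = {"bool": {"must": [
        {"range": {from_field: {"lte": number}}},
        {"range": {to_field: {"gte": number}}},
    ]}}
    descending = {"bool": {"must": [
        {"range": {to_field: {"lte": number}}},
        {"range": {from_field: {"gte": number}}},
    ]}}
    # At least one of the two directions must match.
    return {"bool": {"should": [ascending, descending], "minimum_should_match": 1}}


if __name__ == "__main__":
    # Values taken from the Rockefeller example (LFROMHN 398, LTOHN 320).
    print(398 <= 350 <= 320)                # False: the naive FROM <= n <= TO check
    print(in_address_range(350, 398, 320))  # True
    print(side_range_query(350, "LFROMHN", "LTOHN"))
```

The same pair of clauses would be needed for the right side (RFROMHN/RTOHN). An alternative is to have the loader normalize each pair so that FROM <= TO at index time, at the cost of losing the original direction of the range.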