GreenBuildingRegistry / usaddress-scourgify

Clean US addresses following USPS pub 28 and RESO guidelines
MIT License

Valid occupancy types replaced by 'UNIT' #11

Closed: bmckalla closed this issue 4 years ago

bmckalla commented 4 years ago

Hi there, I appreciate your work on this US address normalization library!

I notice the following issue:

>>> from scourgify import normalize_address_record
>>> normalize_address_record('12345 Somewhere Street Apt 1, Town, MA 12345')
{'address_line_1': '12345 SOMEWHERE ST', 'address_line_2': 'UNIT 1', 'city': 'TOWN', 'state': 'MA', 'postal_code': '12345'}

I believe it has to do with the following bit of code in scourgify.normalize.normalize_occupancy_type:

    default = default if default is not None else 'UNIT'
    occupancy_type_label = 'OccupancyType'
    occupancy_type = parsed_addr.pop(occupancy_type_label, None)
    occupancy_type_abbr = OCCUPANCY_TYPE_ABBREVIATIONS.get(occupancy_type)
    occupancy_id = parsed_addr.get('OccupancyIdentifier')
    if ((occupancy_id and not occupancy_id.startswith('#'))
            and not occupancy_type_abbr):
        occupancy_type_abbr = default

    ...

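When I step through this in a debugger, occupancy_type is 'APT'. However, occupancy_type_abbr ends up as None, because OCCUPANCY_TYPE_ABBREVIATIONS maps <full_name> -> <abbreviation>, so looking up an abbreviation such as 'APT' as a key finds nothing. Since the occupancy identifier ('1') does not start with '#', the default 'UNIT' is then substituted.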
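To make the failure easy to reproduce outside the library, here is a minimal standalone sketch of the quoted lookup logic. It assumes a tiny hypothetical excerpt of the mapping (three entries in the <full_name> -> <abbreviation> shape described above), not the library's actual table, and `original_lookup` is an illustrative name, not a scourgify function.

```python
# Hypothetical three-entry excerpt in the <full_name> -> <abbreviation>
# shape described above; NOT the library's actual table.
OCCUPANCY_TYPE_ABBREVIATIONS = {
    'APARTMENT': 'APT',
    'SUITE': 'STE',
    'UNIT': 'UNIT',
}


def original_lookup(occupancy_type, occupancy_id, default='UNIT'):
    """Illustrative mirror of the quoted logic: only full names are keys."""
    occupancy_type_abbr = OCCUPANCY_TYPE_ABBREVIATIONS.get(occupancy_type)
    # An identifier without a leading '#' and no recognized type
    # falls back to the default designator.
    if (occupancy_id and not occupancy_id.startswith('#')) and not occupancy_type_abbr:
        occupancy_type_abbr = default
    return occupancy_type_abbr


# 'APT' is a value, not a key, so .get() misses and the default wins.
print(original_lookup('APT', '1'))        # UNIT
print(original_lookup('APARTMENT', '1'))  # APT
```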

I suggest the following fix:

    if occupancy_type in OCCUPANCY_TYPE_ABBREVIATIONS.values():
        occupancy_type_abbr = occupancy_type
    else:
        occupancy_type_abbr = OCCUPANCY_TYPE_ABBREVIATIONS.get(occupancy_type)

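That covers both cases: an occupancy type that is already abbreviated (such as 'APT') and one given as a full name that appears as a key in the mapping.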
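As a quick check of the suggested fix, here is a standalone sketch that applies it to the same lookup flow. Again, the mapping is a hypothetical three-entry excerpt rather than the library's real table, and `patched_lookup` is an illustrative name only.

```python
# Hypothetical three-entry excerpt in the <full_name> -> <abbreviation>
# shape; NOT the library's actual table.
OCCUPANCY_TYPE_ABBREVIATIONS = {
    'APARTMENT': 'APT',
    'SUITE': 'STE',
    'UNIT': 'UNIT',
}


def patched_lookup(occupancy_type, occupancy_id, default='UNIT'):
    """Illustrative lookup with the suggested fix applied."""
    if occupancy_type in OCCUPANCY_TYPE_ABBREVIATIONS.values():
        # Already an abbreviation (e.g. 'APT'): keep it as-is.
        occupancy_type_abbr = occupancy_type
    else:
        # Full name (e.g. 'APARTMENT'): map it to its abbreviation.
        occupancy_type_abbr = OCCUPANCY_TYPE_ABBREVIATIONS.get(occupancy_type)
    # Unchanged fallback: unrecognized type plus a non-'#' identifier.
    if (occupancy_id and not occupancy_id.startswith('#')) and not occupancy_type_abbr:
        occupancy_type_abbr = default
    return occupancy_type_abbr


print(patched_lookup('APT', '1'))        # APT
print(patched_lookup('APARTMENT', '1'))  # APT
print(patched_lookup('FLOOR', '2'))      # UNIT (not in the excerpt)
```

Note that `in dict.values()` is a linear scan over the mapping; for a table of this size that is negligible, but precomputing a `frozenset` of the abbreviations once would make the membership test constant-time.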

bmckalla commented 4 years ago

@fablet I went ahead and opened a PR with the fix; hopefully it's sufficient!