datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

ERROR: Unable to tag this string because more than one area of the string has the same label #180

Open rsingh2083 opened 7 years ago

rsingh2083 commented 7 years ago

While tagging this:

usaddress.tag('Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States')

I'm getting this error:

---------------------------------------------------------------------------
RepeatedLabelError                        Traceback (most recent call last)
<ipython-input-41-410055ac0cac> in <module>()
----> 1 usaddress.tag('Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States')

C:\Users\Rahul\Anaconda2\lib\site-packages\usaddress\__init__.pyc in tag(address_string, tag_mapping)
    176         else:
    177             raise RepeatedLabelError(address_string, parse(address_string),
--> 178                                      label)
    179 
    180         last_label = label

RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States
PARSED TOKENS:    [(u'Mr.', 'Recipient'), (u'Robbie', 'Recipient'), (u'Thomson,', 'Recipient'), (u'Cal.', 'Recipient'), (u'Hosp', 'Recipient'), (u'2,', 'AddressNumber'), (u'Street', 'StreetNamePreType'), (u'11,', 'StreetName'), (u'Block', 'Recipient'), (u'H,', 'Recipient'), (u'Jersey,', 'Recipient'), (u'New', 'Recipient'), (u'Jersey', 'Recipient'), (u'121889,', 'AddressNumber'), (u'United', 'StreetName'), (u'States', 'StreetName')]
UNCERTAIN LABEL:  Recipient

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
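
For readers unfamiliar with this error: it is raised when a label appears, is interrupted by a different label, and then appears again. The block below is a minimal sketch of that check, not the library's actual code. `first_repeated_label` is a hypothetical helper name, and its input is assumed to be a list of (token, label) pairs like the one shown under PARSED TOKENS.

```python
def first_repeated_label(parsed_tokens):
    """Return the first label that reappears after a different label, else None."""
    seen = set()
    last_label = None
    for _token, label in parsed_tokens:
        # A label is "repeated" only if it was seen earlier AND the run was broken.
        if label != last_label and label in seen:
            return label
        seen.add(label)
        last_label = label
    return None


# Leading part of the PARSED TOKENS list from the report above.
parsed = [
    ("Mr.", "Recipient"), ("Robbie", "Recipient"), ("Thomson,", "Recipient"),
    ("Cal.", "Recipient"), ("Hosp", "Recipient"), ("2,", "AddressNumber"),
    ("Street", "StreetNamePreType"), ("11,", "StreetName"),
    ("Block", "Recipient"), ("H,", "Recipient"),
]
print(first_repeated_label(parsed))  # Recipient
```

Here `Recipient` is reported because the run of Recipient tokens is broken by the AddressNumber and street tokens before `Block` starts a second Recipient run.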

jeancochrane commented 7 years ago

Hey @rsingh2083,

Thanks for filing this! That's a real doozy of an address. I haven't been able to figure out what it's referring to.

If you can confirm that this is a valid address pattern, we'd be happy to bring it in as training data. We'll need 4-5 more examples of the pattern to be able to train the model reliably.

gl-ronak commented 7 years ago

I get a similar error for this address: 9234 N Loop 1604 W San Antonio TX 78249

jeancochrane commented 7 years ago

Hey @gl-ronak,

Can you tell me how you were expecting that address to be parsed? In particular, what does the second set of numerics (1604) refer to?

If you can find 3-4 more examples of this pattern, we'd be glad to bring it in as training data.

NoahCardoza commented 4 years ago

I just experienced a similar issue.

usaddress.RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada
PARSED TOKENS:    [('1407', 'AddressNumber'), ('7', 'StreetName'), ('Ave', 'StreetNamePostType'), ('NW,', 'StreetNamePostDirectional'), ('Calgary,', 'PlaceName'), ('AB', 'StateName'), ('T2N', 'OccupancyIdentifier'), ('0Z3,', 'OccupancyIdentifier'), ('Canada', 'PlaceName')]
UNCERTAIN LABEL:  PlaceName

I'm using this library to automate the parsing of data from Google Maps to input into an SF db of organizations we work with. I think I see where the error occurred (Calgary,); however, it is a Canadian address, so could that be normal?

jeancochrane commented 4 years ago

@NoahCardoza I think in this case there are actually two things going on:

  1. The Canadian postal code format is pretty different from zip codes so the postal code is getting tagged as OccupancyIdentifier, which is probably throwing off the tagging of Canada and causing it to get tagged as a repeated PlaceName
  2. We don't support non-US countries, so there's no real valid tag for the Canada string anyway

I was able to get a slightly more sensible parse by removing Canada from the end of the string:

>>> usaddress.tag('1407 7 Ave NW, Calgary, AB T2N 0Z3')
(OrderedDict([('AddressNumber', '1407'), ('StreetName', '7'), ('StreetNamePostType', 'Ave'), ('StreetNamePostDirectional', 'NW'), ('PlaceName', 'Calgary'), ('StateName', 'AB T2N'), ('ZipCode', '0Z3')]), 'Street Address')
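
The workaround above (dropping the country before tagging) could be automated with a small pre-processing step. This is only a sketch under assumptions: `strip_trailing_country` and `TRAILING_COUNTRIES` are hypothetical names that are not part of usaddress, and the country list is illustrative.

```python
import re

# Hypothetical list; extend it with whatever country names appear in your data.
TRAILING_COUNTRIES = ("Canada", "United States", "USA")


def strip_trailing_country(address):
    """Remove one trailing country name (and the separator before it), if present."""
    names = "|".join(re.escape(country) for country in TRAILING_COUNTRIES)
    pattern = r"[,\s]+(?:%s)\s*$" % names
    return re.sub(pattern, "", address, flags=re.IGNORECASE)


print(strip_trailing_country("1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada"))
# 1407 7 Ave NW, Calgary, AB T2N 0Z3
```

The cleaned string can then be passed to usaddress.tag. As the output above shows, the Canadian postal code is still split between StateName and ZipCode, so this only avoids the exception.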

NoahCardoza commented 4 years ago

Ah, that should probably be enough; we don't have many organizations in CA. However, what are your thoughts on https://github.com/datamade/usaddress/pull/254? I'm assuming you might not merge it, seeing as the name of this project is usaddress?

jeancochrane commented 4 years ago

I don't expect we'll support Canadian addresses in the near future, but if you'd like to support them you might try training your own model using the supplemental training data in #254.

LeandroLosaria commented 2 years ago

Hello,

I just encountered an error:

ORIGINAL STRING:  Bronx, New York City
PARSED TOKENS:    [('Bronx,', 'PlaceName'), ('New', 'StateName'), ('York', 'PlaceName'), ('City', 'PlaceName')]
UNCERTAIN LABEL:  PlaceName

Seems to be a valid place https://www.britannica.com/place/Bronx-borough-New-York-City
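
For anyone who needs to keep processing such strings, one possible fallback is to group the (token, label) pairs without requiring each label to be contiguous. This is a sketch only: `group_tokens_by_label` is a hypothetical helper, not a usaddress function, and it does not correct mislabeled tokens (here `New` is still tagged StateName).

```python
from collections import OrderedDict


def group_tokens_by_label(parsed_tokens):
    """Collect tokens under their label, tolerating labels that recur later."""
    grouped = OrderedDict()
    for token, label in parsed_tokens:
        # Drop surrounding commas so 'Bronx,' is stored as 'Bronx'.
        grouped.setdefault(label, []).append(token.strip(","))
    return grouped


# PARSED TOKENS from the report above.
parsed = [
    ("Bronx,", "PlaceName"), ("New", "StateName"),
    ("York", "PlaceName"), ("City", "PlaceName"),
]
grouped = group_tokens_by_label(parsed)
print(grouped["PlaceName"])  # ['Bronx', 'York', 'City']
print(grouped["StateName"])  # ['New']
```

The grouping loses the original token order across labels, so it suits rough bucketing better than reconstructing the address.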