datamade / usaddress

A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Uncertain label in true addresses - multiple demos #362

Open noamychamovitz opened 8 months ago

noamychamovitz commented 8 months ago

OccupancyType -

  1. 2100 Standiford Ave Ste 17-18E Suite E17-18, Modesto, CA 95350
  2. 3501 McHenry Ave Ste A8 Ste A8, Modesto, CA 95356-1575
  3. 2225 Plaza Pkwy Ste C4 Suite C-4, Modesto, CA 95350
  4. 1419 Standiford Avenue Suite 1 Ste 1, Modesto, CA 95350
  5. 3430 Tully Rd Ste 65 Ste 65, Modesto, CA 95350
  6. 3250 Dale Rd Ste R Suite R, Modesto, CA 95356
  7. 3025 McHenry Ave Ste H Ste H, NorthGate Village, Modesto, CA 95350-1465

    I get a repeated-label error because the unit is given twice in each address, as Ste and/or Suite.

PlaceName - When the address repeats the city and state (a bug in the data, but it happens), I get a PlaceName error -

  1. 1016 H Street, Modesto, CA 95354, Modesto, CA 95354
  2. 1320 Standiford Ave, Modesto, CA, Modesto, CA 95350-0726
  3. 4500 Dale Rd Suite D Modesto , CA 95356, Modesto, CA

Where 2 tags are identical, I would recommend ignoring the 2nd appearance rather than throwing an error.

In addition, there are cases where a PlaceName is tagged incorrectly. This happens especially with malls, etc.
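
To illustrate the suggestion, here is a minimal sketch of "keep the first appearance, ignore the rest". It assumes the input is a list of (token, label) pairs, which is the shape `usaddress.parse` returns. The helper name `merge_keep_first` is hypothetical and not part of usaddress, and the sample pairs are hand-written for illustration, not actual parser output.

```python
from collections import OrderedDict


def merge_keep_first(pairs):
    """Collapse (token, label) pairs into one value per label.

    Consecutive tokens that share a label are joined with a space.
    If a label shows up again after a different label has been seen,
    the later tokens are collected in `ignored` instead of raising.
    """
    tagged = OrderedDict()
    ignored = []
    closed = set()      # labels whose first run has already ended
    previous = None
    for token, label in pairs:
        if previous is not None and label != previous:
            closed.add(previous)
        if label in closed:
            ignored.append((token, label))   # 2nd appearance: skip it
        elif label in tagged:
            tagged[label] += " " + token     # still inside the first run
        else:
            tagged[label] = token            # first token for this label
        previous = label
    return tagged, ignored


# Hand-written pairs (hypothetical labelling, for illustration only).
pairs = [
    ("3430", "AddressNumber"),
    ("Tully", "StreetName"),
    ("Rd", "StreetNamePostType"),
    ("Ste", "OccupancyType"),
    ("65", "OccupancyIdentifier"),
    ("Ste", "OccupancyType"),
    ("65", "OccupancyIdentifier"),
    ("Modesto", "PlaceName"),
    ("CA", "StateName"),
    ("95350", "ZipCode"),
]
tagged, ignored = merge_keep_first(pairs)
```

In practice this could be used as a fallback: call `usaddress.tag`, and if it raises `usaddress.RepeatedLabelError`, pass the result of `usaddress.parse` to a helper like this one. Returning the ignored tokens alongside the result keeps the dropped text available for inspection.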

  1. 3401 Dale Rd Vintage Faire Mall 244, Modesto, CA 95356-0505

StreetName - Duplication caused by extra descriptive text (the corner of 2 streets)

  1. 1340 Coffee Rd Orangeburg Ave & Coffee Rd, Modesto, CA 95355-3103

AddressNumber -

Having a lot number in a plaza makes the tagger think there are 2 street addresses -

  1. 3900 Pelendale Ave Suite 50/ Save Mart Plaza, Modesto, CA 95356