USPTO / PatentPublicData

Utility tools to help download and parse patent data made available to the public

City Address Normalization (city suffix) #80

Open bgfeldm opened 5 years ago

bgfeldm commented 5 years ago

Within the public data, an entity's address (inventor, applicant, assignee, agent) contains only City, State, and Country. As of 2019, only Country is written consistently, and the State/Province/Prefecture is often omitted in foreign addresses. These limitations make entity resolution and searching for a specific entity more difficult for foreign entities.

-- For foreign entities, the State/Province/Prefecture often follows the city name inside the City field, and the State field is left empty.

if CountryCode is "JP": remove trailing "-shi" from city
if CountryCode is "KR": remove trailing "-si" from city

Japanese prefecture names can end in -to, -ken, or -fu.
Japan also has other suffixes: town (-machi), county (-gun), and city district (-ku).

For now, keep it safe and simple and only remove city suffixes per country code.
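
The rule above could be sketched roughly as follows. This is only an illustration in Python, not the project's code: the names `CITY_SUFFIXES` and `normalize_city` are hypothetical, and the table holds only the two suffixes mentioned in this issue.

```python
# Hypothetical lookup table: country code -> trailing city suffixes to strip.
# Only the two suffixes named in this issue are listed.
CITY_SUFFIXES = {
    "JP": ("-shi",),
    "KR": ("-si",),
}


def normalize_city(city, country_code):
    """Strip a known trailing city suffix for the given country code.

    Returns the city unchanged when the country has no rule or the
    suffix is not at the very end of the string.
    """
    if not city:
        return city
    name = city.strip()
    code = (country_code or "").upper()
    for suffix in CITY_SUFFIXES.get(code, ()):
        # Case-insensitive match; never reduce the name to an empty string.
        if name.lower().endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name


print(normalize_city("Yokohama-shi", "JP"))
print(normalize_city("Suwon-si", "KR"))
print(normalize_city("Minato-ku", "JP"))
```

Because only a trailing suffix is removed, a City value that also carries a prefecture (for example "Kawasaki-shi, Kanagawa-ken") is left as-is by this sketch, and district suffixes such as -ku are not touched.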