camaraproject / KnowYourCustomer

Repository to describe, develop, document and test the KnowYourCustomer API family
Apache License 2.0

Use houseNumber instead of streetNumber #71

Open HuubAppelboom opened 2 months ago

HuubAppelboom commented 2 months ago

Problem description: In the current release, streetNumber is used as one of the attributes, while houseNumberExtension is used at the same time.

The use of streetNumber may lead to confusion in countries (like the USA) where the streets themselves are often numbered.

Expected behavior: We suggest changing the attribute streetNumber to houseNumber in an upcoming release to avoid confusion.

HuubAppelboom commented 2 months ago

In the Mobile Connect specification for KYC Match, they also use house number (or house name)

See https://www.gsma.com/identity/wp-content/uploads/2023/01/IDY.23-v1.0.pdf Page 14, table 4

GillesInnov35 commented 2 months ago

@all, for information, this is what TMF proposes for GeographicAddress to describe an address. It is quite a complex design, but it covers every use case. We could pick up some of its attributes, for example.

[image: TMF GeographicAddress model]

HuubAppelboom commented 2 months ago

@GillesInnov35 In case we decide to follow TMForum, we would need as a minimum

The following data is for us optionally available (but not really needed for an address in NL)

Regards Huub

ToshiWakayama-KDDI commented 1 month ago

Hi All,

Sorry for the delay. I have checked with one of my colleagues working on OIDF how OIDF KYC handles this. They have an attribute street_address, which includes all the information (as a string) that is more fine-grained than city. They do not use houseNumber, streetNumber, or houseNumberExtension.

Regards, Toshi

HuubAppelboom commented 1 month ago

@ToshiWakayama-KDDI For the Netherlands that would not work, or would cause lower match rates. We sometimes have numbers included in street names, and sometimes the house number extension includes numbers as well, which makes them difficult to distinguish in a matching process. That's why most parties here ask for these items in separate fields in online forms.

GillesInnov35 commented 1 month ago

Hi all, I think that we should also have separate fields. BR Gilles

AxelNennker commented 1 month ago

eKYC_IDA defers the definition of address (fields) to OpenID Connect. https://openid.net/specs/openid-connect-core-1_0.html#AddressClaim

street_address Full street address component, which MAY include house number, street name, Post Office Box, and multi-line extended street address information. This field MAY contain multiple lines, separated by newlines. Newlines can be represented either as a carriage return/line feed pair ("\r\n") or as a single line feed character ("\n").

The current implementor's draft https://openid.net/specs/openid-connect-4-identity-assurance-1_0-ID4.html has these examples.

fgrep  street_address openid-connect-4-identity-assurance-1_0-ID4.html | sed "s/^[ \t]*//" | sort | uniq
"street_address": "114 Old State Hwy 127"
"street_address": "122 Burns Crescent"
"street_address": "69 Kidderminster Road"
"street_address": "An der Weide 22"
"street_address": "Energiestrasse 33"
"street_address": "Gatunamn 221b"
"street_address": "PO BOX 2",

Deutsche Telekom is electronically reading the German eID and German residence permit and those government-issued documents have the combined "streetname" and "housenumber".

BSI Technical Guideline TR-03110 Part 4 Section 2.2.3.1 Data Structures

Place ::= SEQUENCE {
  street [10] UTF8String OPTIONAL,
  city [11] UTF8String,
  state [12] UTF8String OPTIONAL, -- can also be used to denote region
  country [13] ICAOCountry,
  zipcode [14] PrintableString OPTIONAL
}

Having separate fields might improve the matching rate but might just put the complexity of splitting or matching to a different system. The system splitting the parts could be the kyc-matching system.

Given that both OIDC and German government-issued documents have the combined information, I argue for having the same in Camara KnowYourCustomer.


For a trip down memory lane to December 2005 (specifying addresses on a global level has been difficult for a long time) and for an example where the two fields are separate, have a look at this somewhat US-centric specification: https://datatracker.ietf.org/doc/html/rfc4119

HuubAppelboom commented 1 month ago

@AxelNennker Hi Axel, in the Netherlands the combination of postal code and street number (plus an optional street number suffix) provides a much more exact way of defining an address. With these 2-3 fields you have an exact address and do not depend on, for example, how a street name or the name of a city is written. Because it is exact and unique, it gives a much better match rate.

Just to give you some idea, we have street names that can be written as "1e Helmerstraat" or "Eerste Helmerstraat", and we also have street names where numbers can be part of the street name, like "Plein 1945". If you mix the street number in here, it may be difficult to determine which is which.

For the Mobile Connect Match API we currently use the 3 fields mentioned to define an address (and nothing else), and that is working quite satisfactorily. I don't think it is a good idea to switch to something that provides less accurate results, and that is why we need to have street number etc. as separate fields.
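To illustrate the ambiguity, here is a minimal Python sketch, assuming a naive splitter that treats the last token starting with a digit as the house number. The function name `naive_split` and the house numbers in the usage lines are made up for illustration; they are not part of any CAMARA or Mobile Connect specification.

```python
import re


def naive_split(street_address: str) -> tuple[str, str]:
    """Split a combined address line into (street, number).

    Hypothetical heuristic: the last whitespace-separated token that
    starts with a digit is taken to be the house number.
    """
    text = street_address.strip()
    match = re.match(r"^(.*\S)\s+(\d\S*)$", text)
    if match is None:
        return text, ""
    return match.group(1), match.group(2)


# Works when a house number follows the street name.
print(naive_split("Plein 1945 12"))

# Goes wrong when the line holds only the street name "Plein 1945":
# the year in the name is taken for a house number.
print(naive_split("Plein 1945"))

# The two spellings of the same street give different street strings,
# so an exact comparison of the street part would fail.
print(naive_split("1e Helmerstraat 5")[0] == naive_split("Eerste Helmerstraat 5")[0])
```

With separate fields, the caller states which part is which and no such guessing is needed.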

HuubAppelboom commented 1 month ago

@AxelNennker Just to add this: if you only have one combined address line, you run into problems with a match product. There are often spelling mistakes in street names (check your customer database, you'll be surprised), which normally are not an issue. With separate fields, you can see that such a mistake is in the street name (because that is where the mismatch shows up). If you then apply e.g. Jaro-Winkler on the street name and an exact match on the house number, you can still easily detect that the address the CSP has submitted is correct, and avoid unnecessary false negatives. Customers are quite sensitive to false negatives, because they then have to check the address in an alternative way, which creates unnecessary friction and costs. The best way to handle this is to keep attributes separate as much as possible and, for the ones where spelling mistakes can be an issue, use fuzzy name matching logic like Jaro-Winkler.

Of course this issue only arises with the Match version of the API. As soon as you start sharing attributes in plain text, there is no such issue (and your customer can spot the spelling mistakes themselves ;-) )
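The approach described above (fuzzy matching on the street name, exact matching on the house number) could be sketched roughly as follows in Python. This is only an illustration under assumptions: the 0.9 threshold, the lower-casing step, the function names and the result keys are all hypothetical, and the Jaro-Winkler code is a plain textbook implementation, not the one used by any operator.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def match_address(submitted: dict, registered: dict, threshold: float = 0.9) -> dict:
    """Fuzzy match on streetName, exact match on streetNumber (assumed rules)."""
    score = jaro_winkler(
        submitted["streetName"].strip().lower(),
        registered["streetName"].strip().lower(),
    )
    return {
        "streetNameMatch": score >= threshold,
        "streetNumberMatch": submitted["streetNumber"].strip()
        == registered["streetNumber"].strip(),
    }


# A spelling mistake in the street name still matches; the number is exact.
print(match_address(
    {"streetName": "Helmerstaat", "streetNumber": "12"},
    {"streetName": "Helmerstraat", "streetNumber": "12"},
))
```

With a single combined address line, the same spelling mistake and a wrong house number would both simply lower one overall score, and the two cases could not be told apart as easily.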

HuubAppelboom commented 1 month ago

To circle back to the original issue, I think it is a good idea to follow the TMF terminology for geographic address as @GillesInnov35 proposed. This means that we only need to change houseNumberExtension to streetNrSuffix.

GillesInnov35 commented 1 month ago

Hi @HuubAppelboom, and thanks a lot for your proposal, which will help us close the issue. As streetNumber is currently used, may I propose streetNumberSuffix to be consistent?

BR Gilles

AxelNennker commented 1 month ago

I think "suffix" in streetNumberSuffix is wrong because the house number suffix is only a suffix in some countries. There are often suffixes in German house numbers. But that is not so globally, I think.

I am all for using "standards" but this TMForum one seems overly complex.

I would go for "streetname" and "houseNumber". No extra suffixes, prefixes, extensions which could all be interpreted differently by different developers.

German examples: (test cases from my Java code - but not KYC)

Going wild: https://de.wikipedia.org/wiki/Jachenau had no streets! City districts have acted as street names since 2010; the district "Dorf" has been used as the street name since then. Also note the fractions in the house number. Fractions are also used in other cities, e.g. Augsburg.

city: "Jachenau"
streetName: "Dorf"
houseNumber: "7 1⁄3" // actually there is no blank in this house number, but github keeps auto-formatting to 71⁄3
zipCode: 83676

HuubAppelboom commented 1 month ago

@AxelNennker Please check with your colleagues responsible for Mobile Connect, they have some experience in this area as well.

In the Netherlands we initially started with only postal code + house number, assuming this was sufficient, and because there are many ways to handle suffixes. In the Netherlands some suffixes are official (and registered in the official register of residents), and some are self-invented where there is a need but nothing official exists (for example houseboats, or caravan parks which are used for semi-permanent residency). What matters in the end is what ends up in the MNO registration and how (in our case we always keep the suffix as a separate data field), and what customers happen to do with it (most of our customers do the same as we do).

Since there are many different ways that these suffixes can be written, we initially left them out, but later they were added as a separate item at the request of our customers. The house number can usually be matched without much risk of a false negative, but for the suffixes the chance is higher. That is why we decided to keep these items separate.

HuubAppelboom commented 1 month ago

@AxelNennker In the Netherlands we also have one or two cities where they made the mistake in the past of not issuing street names; instead, the name of the neighbourhood + 2 digits is used as a street name. Once you combine that with the house numbers, people sometimes think they have 4-digit house numbers (which they do not). But that is only one city.

The easiest house number suffixes to tackle are the government-assigned ones. These usually arise when extra houses are built on a plot of land and the street cannot be renumbered. We do not use Bruchteilhausnummern as in Germany; in our case the government simply adds letters, so house number 4 can become 4, 4A and 4B (in case 2 extra houses are built between number 4 and number 6). These suffixes can easily be included in the house number (if you give people the right instructions to include them).

The real problem comes with the suffixes that people invent themselves, and which can be written in different ways. Take for example a nursing home. In the Netherlands this typically has a regular street number, for example "2". But to have the mail delivered more easily to the residents, they typically add a room number to the address as a suffix, and this can be written in different ways, like "k105", "105" or "kamer 105". Parks with holiday homes that have more permanent residency often have the same issue when there is no official house number allocated.

HuubAppelboom commented 1 month ago

To cater for the different suffixes, for the CAMARA version we could also do the following, in case most MNOs do not use a separate suffix field:

In the streetNumber field, we always instruct customers to include any suffixes that they may have. In the answer you indicate whether it is a full match (true), no match at all (false), or a match on the leading number only, without the suffixes (number_only, for example).
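A minimal Python sketch of such a three-valued answer, assuming the comparison ignores whitespace and letter case and that the leading run of digits is the house number. The function name `match_street_number` and these normalisation rules are hypothetical; only the three answer values come from the proposal above.

```python
import re


def match_street_number(submitted: str, registered: str) -> str:
    """Return "true", "number_only" or "false" for a combined number+suffix field."""

    def normalise(value: str) -> str:
        # Assumed rule: drop all whitespace and compare case-insensitively.
        return re.sub(r"\s+", "", value).upper()

    a, b = normalise(submitted), normalise(registered)
    if a == b:
        return "true"
    # Fall back to comparing only the leading digits (the house number).
    lead_a = re.match(r"\d+", a)
    lead_b = re.match(r"\d+", b)
    if lead_a and lead_b and lead_a.group() == lead_b.group():
        return "number_only"
    return "false"


print(match_street_number("4A", "4 a"))            # same number and suffix
print(match_street_number("2 k105", "2 kamer 105"))  # same number, suffix written differently
print(match_street_number("4A", "6"))              # different number
```

The middle case is where the self-invented suffixes end up: the number is confirmed, and the API consumer can decide whether that is good enough.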

HuubAppelboom commented 1 month ago

@AxelNennker for the Netherlands we have so far defined the streetNumber in Mobile Connect Match as digits only, and anything else as a suffix. The example you mention, Heidestraße 17A, would then break down into streetName Heidestraße, streetNumber 17, and streetNumberSuffix A.

The reason we do this in the Mobile Connect version is that we use hashing, and in the case of hashing you really have to define very strict normalisation rules, otherwise you don't get good results. That's why we have been defining streetNr as digits only in the Mobile Connect version.
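Why hashing forces strict normalisation can be shown with a small Python sketch. Everything specific here is an assumption made for illustration: SHA-256 as the hash, hashing each attribute on its own, the upper-casing of the suffix, and the function names. The actual Mobile Connect rules are not reproduced here.

```python
import hashlib
import re


def split_street_number(raw: str) -> tuple[str, str]:
    """Split e.g. "17A" into digits-only number "17" and suffix "A"."""
    match = re.match(r"^\s*(\d+)\s*(.*?)\s*$", raw)
    if match is None:
        raise ValueError(f"no leading digits in {raw!r}")
    number = match.group(1)
    # Assumed normalisation: no whitespace, upper case.
    suffix = re.sub(r"\s+", "", match.group(2)).upper()
    return number, suffix


def hash_attribute(value: str) -> str:
    """Hash one normalised attribute value (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two writing styles of the same house number.
first = split_street_number("17A")
second = split_street_number(" 17 a ")

# After normalisation both number and suffix hash to the same values.
print([hash_attribute(part) for part in first] == [hash_attribute(part) for part in second])

# Without normalisation the hashes differ, so the match would fail.
print(hash_attribute("17A") == hash_attribute(" 17 a "))
```

A hash comparison is all-or-nothing, so any difference in spacing or case that is not normalised away on both sides turns into a false negative.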

As long as we don't use hashing in CAMARA, and use plain text, we could choose to combine streetNumber and streetNumberSuffix. The evaluation done by the MNO can then cater for the various writing styles of the suffixes that may vary from country to country.

AxelNennker commented 4 weeks ago

I agree that it is better for KYC to have everything separate. I guess there need to be clear definitions and examples of which part of the plain-text address is what.

Back to this issue, would that then be "streetName Heidestraße, houseNumber 17, and houseNumberSuffix as A"?