GillesInnov35 commented 1 year ago

CAMARA KYC Match - Specifications

Below is a proposed comparison matrix between the different offers' specifications and the initial CAMARA requirements proposal. Key points:

The list of KYC attribute requirements must be reviewed; we should propose a short list targeting a first version, based on existing proposals.
The match result response must be defined: either a match score (matching percentage) or a match result value such as what GSMA returns ("Y": match is successful; "N-NA": match failed, data is not available; "N-AV": match failed, data is available; "N-AD": match failed, data is available, but access is denied).
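As an illustration only, the GSMA-style result values listed above could be modelled as a small enumeration. The names `MatchResult` and `is_match` are hypothetical and not part of any proposal in this thread; only the four codes and their meanings come from the text above.

```python
from enum import Enum


class MatchResult(Enum):
    """GSMA-style match result codes as quoted in this issue (member names are assumptions)."""

    MATCH = "Y"              # match is successful
    NOT_AVAILABLE = "N-NA"   # match failed, data is not available
    NO_MATCH = "N-AV"        # match failed, data is available
    ACCESS_DENIED = "N-AD"   # match failed, data is available, but access is denied


def is_match(value: str) -> bool:
    """Return True only for a successful match; unknown codes raise ValueError."""
    return MatchResult(value) is MatchResult.MATCH
```

A percentage score would need a different response type, which is one reason the choice between the two options has to be made explicitly.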

Request Specifications

CAMARA KYC Match requirements	GSMA KYC Match	KDDI KYC Match	Orange KYC Match	Proposal
msisdn	phone_number	subscriber_phone_number_match	msisdn	phoneNumber
name	name	user_name_match	name	name
given name	given_name		given_name	givenName
family name	family_name		family_name	familyName
	address	subscriber_formatted_match		address
street name	house_or_housename		street_name	streetName
region		subscriber_region_match
postal code	postal_code	subscriber_postal_code_match	postalCode
town	locality		locality	locality
country	country		country	country
birthdate	birthdate	subscriber_birthdate_match	birthdate	birthdate
email address			email	email

Response Specifications

CAMARA KYC Match requirements	GSMA KYC Match	KDDI KYC Match	Orange KYC Match
msisdn	phone_number	subscriber_phone_number_match	msisdn
name	name	user_name_match	name_score
given name	given_name		given_name_score
family name	family_name		family_name_score
	address	subscriber_formatted_match
street name	house_or_housename		street_name_score
region		subscriber_region_match
postal code	postal_code	subscriber_postal_code_match	postalCode_score
town	locality		locality_score
country	country		country_score
birthdate	birthdate	subscriber_birthdate_match	birthdate_score
email address			email_score

StefanoFalsetto-CKHIOD commented 1 year ago

Hi Gilles, I have some feedback.

Request Specifications

Why do you want to change the name from MSISDN to phoneNumber? The term MSISDN refers directly to the standard way of representing a phone number.
I would like to avoid the "address" attribute. This aggregated field depends not only on country rules but also on each MNO's internal BSS implementation. Since we are rebooting this service from scratch, we can build on previous experience and ask MNOs to export the individual components of the address, such as:

street_name --> the name of the street where the end customer resides. Just the street name, nothing else (i.e., no house number, no zip/postal code, etc.)
town
province
region
house_number --> the number of the building where the end customer resides
postal_code

We can still include the "address" attribute but discourage its use in some way.

Response Specifications

Since the answer will be Y, N-NA, N-AV, or N-AD, the term "score" could be misleading. I am still trying to think of a valid alternative to propose, but I can't come up with one right now.

ToshiWakayama-KDDI commented 1 year ago

Thanks very much, @GillesInnov35, for creating this issue.

May I ask questions for clarification?

What do you mean by CAMARA KYC Match requirements?
What do you mean by GSMA KYC Match?

I have one comment: we have agreed that calculating a matching score is for future releases, so it should not be included in our initial release.

Thanks.

StefanoFalsetto-CKHIOD commented 1 year ago

Hi @ToshiWakayama-KDDI, we agreed not to include the matching score. But we also agreed that the score is something we need to "take into consideration in some way", since we will work on it as soon as the first version of these specifications is released. Hence, I think it's important to do something now to enable future improvements.

@GillesInnov35, I have worked out my proposal: to find a "middle way" between future developments and Toshi's push for the upcoming first milestone, we can still use the "_match" suffix on response attributes. That way, our future discussions can focus on modifying just the "Y" response. Could it perhaps be "Y-nn", where nn is the score? Let's keep these proposals for future discussions.
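To make the "Y-nn" idea concrete, here is a minimal sketch of how a client might parse such a value while still accepting the plain codes. The function name `parse_result` and the 0 to 100 score range are assumptions for illustration; the thread only floats "Y-nn" as a possible future format.

```python
import re
from typing import Optional, Tuple

# "Y" optionally followed by "-<score>", or one of the failure codes.
_RESULT_RE = re.compile(r"(Y)(?:-(\d{1,3}))?|(N-NA|N-AV|N-AD)")


def parse_result(value: str) -> Tuple[str, Optional[int]]:
    """Split a result into (code, score); score is None when absent."""
    m = _RESULT_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"unrecognised match result: {value!r}")
    if m.group(3) is not None:
        return m.group(3), None
    score = int(m.group(2)) if m.group(2) is not None else None
    if score is not None and score > 100:  # assumed upper bound
        raise ValueError(f"score out of range: {score}")
    return "Y", score
```

A first-version client that only understands "Y" would still need to tolerate the suffixed form later, which is the compatibility point behind keeping the "_match" suffix.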

GillesInnov35 commented 1 year ago

Hi @StefanoFalsetto-CKHIOD, @ToshiWakayama-KDDI, thanks a lot for your comments. I'll try to explain the proposal.

Phone number rather than msisdn

As discussed with Ludovic ROBERT, who is involved in a few CAMARA API projects, "phone number" is commonly used rather than "msisdn" (see the Number Verify API definition). I think we'll have to be consistent with all the other CAMARA API design projects.

use of address

I agree with you, Stefano, regarding the address attribute. It appears in the GSMA Mobile Connect KYC Match API definition, which is why I mentioned it for discussion. We could limit the attributes to the detailed address components, as you propose, which would be more precise.

GSMA Mobile Connect KYC Match

Toshi, to answer your question, GSMA has published a Mobile Connect KYC Match definition and technical requirements (Feb. 2022). I had a look at it, and most of the attributes are similar to the CAMARA requirements. I thought it would be interesting to compare it with the other proposals.

fernandopradocabrillo commented 1 year ago

Hi @GillesInnov35, I think this table is also missing Telefonica's proposal, including some of our fields (like idDocument) and our vision for the properties. Could you please update it accordingly? Thanks!

Regarding use of address

In our proposal, the address field is composed of the different parts it can have. We consider that a single field in which the postal address can be included in such a generic way adds complexity and, as @StefanoFalsetto-CKHIOD said, is very country-dependent. So we support having separate fields for its representation.

GillesInnov35 commented 1 year ago

Hi @fernandopradocabrillo, yes, sure.

Could you send me the list of attributes Telefonica proposes in its solution? Thanks.

javier-carrocalabor commented 1 year ago

Hi @GillesInnov35, here is the Telefonica proposal mentioned by @fernandopradocabrillo: https://github.com/camaraproject/KnowYourCustomer/blob/f153a4799213fc4b0474d156c7b10b490015439e/code/API_definitions/kyc-match.yaml#L143 which can be summarized as: phoneNumber, idDocument, identity (composed of firstName and lastName), address (composed of postalCode, streetName and streetNumber), and birthdate. The responses would be xxxx_response for each of them.

ToshiWakayama-KDDI commented 1 year ago

Hi all,

Please find the revised shortlist table below. I have added our proposed parameters/attributes, which are included in our YAML file, to the shortlist table. Also I have added Telefonica's parameters/attributes as well. Hope it is correct. Also I have changed GSMA Match to MobileConnect Match and moved it to the right as MobileConnect is not our proposal.

I have one question for you at the moment: our company differentiates between the Subscriber (who makes the contract with us) and the User (who actually uses the phone). For example, a parent is the Subscriber and their child is the User. Do you have the same kind of differentiation?

Match Request Body

CAMARA KYC Match requirements/categories	KDDI KYC Match	Orange KYC Match	Telefonica KYC Match	GSMA KYC Match	Orange Proposal
Phone Number	subscriber_phone_number_match	msisdn	phoneNumber	phone_number	phoneNumber
(special phone number)	main_subscriber_phone_number_match
ID Document			idDocument
Subscriber name	user_name_match	name	identity (composed of firstName and lastName)	name	name
(name reading)	subscriber_name_kana_hankaku_match
(name reading)	subscriber_name_kana_zenkaku_match
(given name)		given_name	(included in identity)	given_name	givenName
(family name)		family_name	(included in identity)	family_name	familyName
Subscriber Postal Code	subscriber_postal_code_match	postalCode	(included in address)	postal_code
Subscriber Address	subscriber_formatted_match		address (composed of postalCode, streetName and streetNumber)	address	address
(street name)		street_name	(included in address)	house_or_housename	streetName
(street number)			(included in address)
Subscriber Address-Region	subscriber_region_match
Subscriber Address-Town		locality		locality	locality
Subscriber Address-Country		country		country	country
Subscriber Birthdate	subscriber_birthdate_match	birthdate	birthdate	birthdate	birthdate
Subscriber Email Address		email			email
User Name	user_name_match
(user name reading)	user_name_kana_hankaku_match
(user name reading)	user_name_kana_zenkaku_match
User Birthdate	user_birthdate_match
3rd party ID	cp_id
	service_id

KYC Match Response

CAMARA KYC Match requirements/categories	KDDI KYC Match	Orange KYC Match	Telefonica KYC Match	GSMA KYC Match
Phone Number	subscriber_phone_number_match	msisdn	phoneNumber_response	phone_number
(special phone number)	main_subscriber_phone_number_match
ID Document			idDocument_response
Subscriber name	subscriber_name_match	name_score	identity_response	name
(name reading)	subscriber_name_kana_hankaku_match
(name reading)	subscriber_name_kana_zenkaku_match
(given name)		given_name_score	(included in identity)	given_name
(family name)		family_name_score	(included in identity)	family_name
Subscriber Postal Code	subscriber_postal_code_match	postalCode_score	(included in address)	postal_code
Subscriber Address	subscriber_formatted_match		address_response	address
(street name)		street_name_score	(included in address)	house_or_housename
(street number)			(included in address)
Subscriber Address-Region	subscriber_region_match
Subscriber Address-Town		locality_score		locality
Subscriber Address-Country		country_score		country
Subscriber Birthdate	subscriber_birthdate_match	birthdate_score	birthdate_response	birthdate
Subscriber Email Address		email_score
User Name	user_name_match
(user name reading)	user_name_kana_hankaku_match
(user name reading)	user_name_kana_zenkaku_match
User Birthdate	user_birthdate_match

Many thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi all,

Please also find below a shortlist table for the KYC Fill-in attributes/parameters, based on our Fill-in YAML.

Any comments would be welcome.

Fill-in Request Body

CAMARA KYC Fill-in requirements/categories	KDDI KYC Fill-in	No other Fill-in proposals	Proposal
3rd party ID	cp_id

Fill-in Response

CAMARA KYC Fill-in requirements/categories	KDDI KYC Fill-in	No other Fill-in proposals	Proposal
Phone Number	subscriber_mobile_phone
Subscriber name	subscriber_name
(family name)	subscriber_name_family
(given name)	subscriber_name_first
(name reading)	subscriber_name_kana_hankaku
(family name reading)	subscriber_name_kana_hankaku_family
(given name reading)	subscriber_name_kana_hankaku_first
(name reading)	subscriber_name_kana_zenkakuku
(family name reading)	subscriber_name_kana_zenkaku_family
(given name reading)	subscriber_name_kana_zenkaku
Subscriber Postal Code	subscriber_postal_code
Subscriber Address	subscriber_formatted
Subscriber Address-Region	subscriber_region
Subscriber Birthdate	subscriber_birthdate
Subscriber Gender	subscriber_gender
Subscriber Email Address	subscriber_mail_address
User Name	user_name
(user family name)	user_name_family
(user given name)	user_name_first
(name reading)	user_name_kana_hankaku
(family name reading)	user_name_kana_hankaku_family
(given name reading)	user_name_kana_hankaku_first
(name reading)	user_name_kana_zenkakuku
(family name reading)	user_name_kana_zenkaku_family
(given name reading)	user_name_kana_zenkaku
User Birthdate	user_birthdate

Many thanks, Toshi

GillesInnov35 commented 1 year ago

@ToshiWakayama-KDDI, the Orange KYC offers also differentiate between subscriber and user. The 3-legged authentication architecture is based on the user, who authenticates and should give consent, but the information returned by the service concerns the subscriber who signed the contract. @fernandopradocabrillo, idDocument is part of the TF API Match design. Is the type of document concerned never specified?

GillesInnov35 commented 1 year ago

I have a question regarding Toshi's proposal, which includes language-specific information (user_name and user_name_kana_hankaku). Does it mean we should introduce a dataType attribute with values such as InternationalUserClass, JapaneseUserClass, etc.? This kind of information for typing the data has, for example, been included in the DeviceLocation API definition. Gilles

ToshiWakayama-KDDI commented 1 year ago

> @ToshiWakayama-KDDI, the Orange KYC offers also differentiate between subscriber and user. The 3-legged authentication architecture is based on the user, who authenticates and should give consent, but the information returned by the service concerns the subscriber who signed the contract. @fernandopradocabrillo, idDocument is part of the TF API Match design. Is the type of document concerned never specified?

Hi @GillesInnov35, thanks. Just to double-check: are the address, name, email, etc. currently proposed by Orange all for Subscribers?

Thanks.

ToshiWakayama-KDDI commented 1 year ago

> I have a question regarding Toshi's proposal, which includes language-specific information (user_name and user_name_kana_hankaku). Does it mean we should introduce a dataType attribute with values such as InternationalUserClass, JapaneseUserClass, etc.? This kind of information for typing the data has, for example, been included in the DeviceLocation API definition. Gilles

Hi @GillesInnov35 , Thanks for the information! I have just looked at DeviceLocation YAMLs, but I could not find it (dataType). Could you advise me which YAML has it (dataType)?

Thanks

GillesInnov35 commented 1 year ago

Hi @ToshiWakayama-KDDI ,

Yes, the information returned or compared by the Orange Match ID API concerns only the subscriber's information.
In the DeviceLocation API definition, the attribute which specifies the type of the class is areaType (circle or polygon).

ToshiWakayama-KDDI commented 1 year ago

> Hi @ToshiWakayama-KDDI,
>
> Yes, the information returned or compared by the Orange Match ID API concerns only the subscriber's information.
>
> In the DeviceLocation API definition, the attribute which specifies the type of the class is areaType (circle or polygon).

Hi Gilles @GillesInnov35, thank you very much. I will look into it quickly, together with my internal colleagues.

fernandopradocabrillo commented 1 year ago

> @fernandopradocabrillo, idDocument is part of the TF API Match design. Is the type of document concerned never specified?

Hi @GillesInnov35, that's correct: we decided not to include the idDocument type in the proposal since it added unnecessary complexity. In the end, we want to check whether the idDocument provided matches the one stored by the MNO; the important thing here is the number itself.

@ToshiWakayama-KDDI From our side, we also do the match against subscriber's information only.

ToshiWakayama-KDDI commented 1 year ago

Hi @fernandopradocabrillo , Thank you.

Hi @GillesInnov35, I have quickly checked 'areaType' in the location-retrieval API and the location-verification API, but I am not yet sure whether we could introduce UserClass attributes in a similar way for our purpose. In any case, at the moment we are not considering introducing new attributes like UserClass, as it is better for us to keep our first version simple, with only the required attributes.

Many thanks,

GillesInnov35 commented 1 year ago

Thanks a lot, @ToshiWakayama-KDDI. As we are currently discussing which attributes should be mandatory, my question was: should the kana-specific attributes be part of the KYC-Match request definition?

If yes, a dataType used as a discriminator would be useful to avoid duplicating the attributes concerned.
If no, there's no need to differentiate between two schemas.
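A minimal sketch of the discriminator idea, assuming a `dataType` field whose value selects which name schema applies. The discriminator values (`INTERNATIONAL`, `JAPANESE`), the class names, and the field names are hypothetical; only `dataType` and the kana hankaku/zenkaku variants come from this thread.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class InternationalName:
    name: str


@dataclass
class JapaneseName:
    name: str
    name_kana_hankaku: str  # half-width kana reading
    name_kana_zenkaku: str  # full-width kana reading


# Hypothetical discriminator values mapped to their schema.
_SCHEMAS = {"INTERNATIONAL": InternationalName, "JAPANESE": JapaneseName}


def parse_name(payload: dict) -> Union[InternationalName, JapaneseName]:
    """Pick the schema named by payload['dataType'] and build it from the rest."""
    data = dict(payload)  # copy so the caller's dict is not modified
    schema = _SCHEMAS[data.pop("dataType")]
    return schema(**data)
```

With this shape, the common attribute `name` is declared once per schema rather than as parallel `*_kana_*` request fields, which is the duplication the discriminator is meant to avoid.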

HuubAppelboom commented 1 year ago

In the Netherlands, we currently have the following attribute list in use:

Given Name Initials (we either match on only the first initial or on all initials when available)
Family Name (which is stripped of any prefixes)
Postal code
House number
House number extension (not in the Mobile Connect standard but we need it in NL)
Date of Birth
E-mail address

We don't use street name, town etc because in the Netherlands postal code + house number + house number extension is very exact already.

We already have relatively high match rates (up to 80% for family name). Nevertheless, I think we can still improve as follows: instead of Given Name Initials, use the following attributes in parallel:

Initial of the first Given Name
All Initials of Given Names
The first Given Name
All Given Names

Often people only record their first Given Name or Initial (although many have multiple Given Names). The use of initials can help in cases where there are multiple ways to write a given name (for example Steve and Stephen).

In the Netherlands we have a list of prefixes that we usually strip from the family name. The reason we do this is that the prefixes can be abbreviated, which hinders the matching. What we can add is an extra attribute in which you compare these prefixes.
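The prefix handling described above could be sketched as follows. The prefix tuple is a short illustrative sample, not the actual list from the attached specification, and `split_family_name` is a hypothetical helper name.

```python
from typing import Tuple

# Illustrative sample only; the real list is in the NL Match specification.
PREFIXES = ("van der", "van den", "van de", "van", "den", "der", "de", "ter", "te")


def split_family_name(family_name: str) -> Tuple[str, str]:
    """Return (prefix, remainder), lower-cased with whitespace collapsed."""
    normalized = " ".join(family_name.lower().split())
    # Try longer prefixes first so "van der" wins over "van".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if normalized.startswith(prefix + " "):
            return prefix, normalized[len(prefix) + 1:]
    return "", normalized
```

Returning the prefix separately, instead of discarding it, is what would allow the extra "prefixes" attribute to be compared on its own.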

For Family Name, I think we can improve by adding the Family Name at birth as a separate attribute. In the Netherlands, your family name can change when you get married, so it may change during your lifetime. Your Family Name at birth never changes, and when available, it is better for matching because it stays constant.

We do not use street name, because our postal code + house number + house number extension is very exact.

So, we would propose the following list (for NL):

Initial of the first Given Name
All Initials of Given Names
The first Given Name
All Given Names
Prefixes of the Current Family Name
Current Family Name
Prefixes of the Family Name at birth
Family Name at birth
Postal Code
House Number
House Number Extension
Date of Birth
E-mail address

HuubAppelboom commented 1 year ago

Annex B - MC Product Specification - Match, v1.4.xlsx

Also attached is the list of specs we currently use for Match in NL. It includes the list of prefixes we strip from the family name.

GillesInnov35 commented 1 year ago

Thanks @HuubAppelboom. I think we should be able to identify a short list of attributes common to all designs and propose a first draft.

javier-carrocalabor commented 1 year ago

I agree with @GillesInnov35 in the sense that I think we should see it from the perspective of a Service Provider that is asking a user for some contact information and shows a form to collect several fields of data. Then, IMHO, and recognizing that I don't know the habits in the Netherlands, I don't think the Service Provider is going to ask the user for, for example, all potential ways of expressing their name, but will ask for the most common way to express the name in that country.

HuubAppelboom commented 1 year ago

> I agree with @GillesInnov35 in the sense that I think we should see it from the perspective of a Service Provider that is asking a user for some contact information and shows a form to collect several fields of data. Then, IMHO, and recognizing that I don't know the habits in the Netherlands, I don't think the Service Provider is going to ask the user for, for example, all potential ways of expressing their name, but will ask for the most common way to express the name in that country.

The issue is not that we think Service Providers should ask end users for all the possible variations, but that MNOs and Service Providers have a history and an established way of collecting the data. For example, in the Netherlands we have a couple of MNOs which have only collected initials. Making Given Name(s) the only option will not work in this case (that's why we have chosen initials only in the Netherlands, deviating from the Mobile Connect standard).

The other issue is that when you ask for a match on all initials (or given names) and provide that as the only option, you will often find that 2nd and 3rd initials are missing in current databases (at least we have seen that), which results in a lower match rate than you could have. That's why we propose to make several attribute fields available in the standard, and to match on all the fields that you have available. The same principle would apply to family name: if you also have the family name at birth available, you can also provide a match on it. In the end, you can safely reach a higher overall match rate this way, without needing more complex solutions such as a match score based on whether the attributes are similar.

As far as the availability of data is concerned, in case the MNO does not have a specific attribute in their CRM system, you can always answer with "NA".
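A minimal sketch of "match on every field you have, and answer not-available for the rest". It reuses the GSMA-style codes mentioned earlier in this issue ("Y", "N-AV", "N-NA"); the function name, the attribute keys, and the case-insensitive comparison are assumptions for illustration.

```python
from typing import Dict, Optional


def match_attributes(request: Dict[str, str],
                     record: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Compare each requested attribute against the MNO record independently."""
    results = {}
    for attr, supplied in request.items():
        stored = record.get(attr)
        if stored is None:
            results[attr] = "N-NA"   # MNO holds no data for this attribute
        elif stored.strip().lower() == supplied.strip().lower():
            results[attr] = "Y"
        else:
            results[attr] = "N-AV"   # data exists but does not match
    return results
```

For example, a record holding only the first initial still yields a positive result on that attribute even when the "all initials" comparison fails:

```python
match_attributes(
    {"first_initial": "J", "all_initials": "JPM", "family_name_at_birth": "Smith"},
    {"first_initial": "j", "all_initials": "J", "family_name_at_birth": None},
)
```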

ToshiWakayama-KDDI commented 1 year ago

> Thanks a lot, @ToshiWakayama-KDDI. As we are currently discussing which attributes should be mandatory, my question was: should the kana-specific attributes be part of the KYC-Match request definition?
>
> If yes, a dataType used as a discriminator would be useful to avoid duplicating the attributes concerned.
>
> If no, there's no need to differentiate between two schemas.

Hi @GillesInnov35 ,

Thanks very much. First of all, my understanding is that we are not discussing mandatory attributes, but that all attributes should be optional, as I shared on Tuesday. Of course, we need a mandatory requirement like 'at least one attribute should be included in an API match request'.

So, to answer your question, we would like to have the kana-specific attributes etc. as part of the KYC-Match request definition, as one of the options.

Then I understand your point that dataType used as a discriminator would be useful to avoid duplication of concerned attributes, and I think I need to look into it.

Many thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi @HuubAppelboom , @javier-carrocalabor , @GillesInnov35 ,

Thank you all for your comments. Now I understand that the Netherlands needs some specific attributes. As I shared on Tuesday, I would propose to include all the required attributes in our first version, both commonly used attributes and country/market-specific attributes, if we categorise them. I think that all the attributes should be optional, as there seem to be many ways to use this API/KYC-Match functionality, so it is difficult to identify mandatory ones. Of course, we need a mandatory requirement like 'there should be at least one attribute included in an API request'.
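The "all optional, but at least one" rule could be checked along these lines. This is only a sketch: the attribute names are borrowed from the Proposal column earlier in this issue, and treating `phoneNumber` as an identifier that does not count towards the minimum is an assumption, not an agreed rule.

```python
from typing import List

# Attribute names taken from the earlier proposal table; illustrative only.
OPTIONAL_ATTRIBUTES = frozenset({
    "name", "givenName", "familyName", "address", "streetName",
    "postalCode", "locality", "country", "birthdate", "email",
})


def validate_request(body: dict) -> List[str]:
    """Return the supplied match attributes; raise ValueError if the body is invalid."""
    unknown = set(body) - OPTIONAL_ATTRIBUTES - {"phoneNumber"}
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    supplied = sorted(k for k in OPTIONAL_ATTRIBUTES if body.get(k) not in (None, ""))
    if not supplied:
        raise ValueError("at least one attribute to match is required")
    return supplied
```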

If you think we may need categorisation of Common attributes and Country/Market specific attributes, we could write it down somewhere in YAML or in API documentation.

What do you think?

Many thanks, Toshi

HuubAppelboom commented 1 year ago

Hi @ToshiWakayama-KDDI,

I would indeed support including all attributes, both commonly used and country/market-specific. As a rule, I would suggest that, when you can, you support all attributes for which you have data.

For example, for NL we currently do not support streetname (because it is not necessary here), but for the sake of international compatibility we will implement it.

On the customer side, the customer can always choose which attributes will be asked to be matched (with the minimum of one of course). For example, for some cases we only need address verification and nothing else, because the customer is already using a different source for the name, date of birth, email etc.

What should also be prevented is customers submitting data they don't actually have, because this will give you wrong match rate statistics. For example, we had one customer that did not have Date of Birth data, so instead they always submitted "YYYY-MM-DD" as a hashed string, which of course never matches, or a dummy date like "1900-01-01". You will get low match rates, and it really takes some time to find out what is going wrong. So in any case, customers must always submit valid data, and not dummy data.
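Since a hashed value can no longer be inspected, a check like the following would have to run on the customer side before hashing. It is a minimal sketch: the placeholder set, the 130-year bound, and the function name are assumptions, not part of any specification discussed here.

```python
from datetime import date
from typing import Optional

# Known dummy values; "1900-01-01" is the example from this thread.
PLACEHOLDER_DATES = {"1900-01-01"}


def is_plausible_birthdate(value: str, today: Optional[date] = None) -> bool:
    """Reject unparsable strings, known placeholders, and implausible dates."""
    if value in PLACEHOLDER_DATES:
        return False
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return False  # e.g. the literal string "YYYY-MM-DD"
    today = today or date.today()
    oldest = date(today.year - 130, 1, 1)  # assumed sanity bound
    return oldest <= parsed <= today
```

An attribute that fails this check is better left out of the request than submitted as a dummy value.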

With kind regards Huub

GillesInnov35 commented 1 year ago

Hi @ToshiWakayama-KDDI, the term "mandatory" was not appropriate because, as you say, all attributes should of course be optional (except phone number). I meant the attributes we'd like to see in the API design (the common attributes). Thanks a lot

StefanoFalsetto-CKHIOD commented 1 year ago

As I said in some other comments, I will be happy to discuss deprecating the address attribute. It is far better (for many countries around the world) to use separate attributes for the individual address components.

StefanoFalsetto-CKHIOD commented 1 year ago

To help find the right initial set of attributes, I am sharing the full set of attributes that CKH (and hence all its affiliated operators) offers to its Partners. As you can see, we support all the attributes defined in the GSMA IDY.28 specification plus some custom ones (e.g., age verification). Some of the address-related attributes, such as houseno_or_housename_hash, are used for historical reasons but will be deprecated in the future. Moreover, some of the custom attributes are calculated on the fly from atomic data obtained from MNOs (e.g., age and age_is_greater_than are calculated using the birthdate).

Requested Attribute	Returned value
`account_state`	Active/Inactive
`age_hash`	True/False
`age_is_greater_than`	True/False
`address_line1_hash`	Y/N-NA/N-AV
`address_line2_hash`	Y/N-NA/N-AV
`billing_segment`	PAYM/PAYG
`birthdate_hash`	Y/N-NA/N-AV
`city_or_province_hash`	Y/N-NA/N-AV
`country_hash`	Y/N-NA/N-AV
`email_hash`	Y/N-NA/N-AV
`family_name_hash`	Y/N-NA/N-AV
`flat_number_hash`	Y/N-NA/N-AV
`gender_hash`	Y/N-NA/N-AV
`given_name_hash`	Y/N-NA/N-AV
`house_name_hash`	Y/N-NA/N-AV
`house_number_hash`	Y/N-NA/N-AV
`houseno_or_housename_hash`	Y/N-NA/N-AV
`is_adult`	True/False
`is_age_verified`	True/False
`is_email_verified`	True/False
`is_lost_stolen`	True/False
`middle_name_hash`	Y/N-NA/N-AV
`postal_code_hash`	Y/N-NA/N-AV
`title_hash`	Y/N-NA/N-AV
`town_hash`	Y/N-NA/N-AV
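As a rough illustration of how the derived attributes in the table could be computed on the fly from a birthdate: the exact semantics (strict "greater than", an adult age of 18) are assumptions here, and the function signatures are hypothetical rather than CKH's implementation.

```python
from datetime import date


def age_on(birthdate: date, today: date) -> int:
    """Completed years between birthdate and today."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def age_is_greater_than(birthdate: date, threshold: int, today: date) -> bool:
    """True when the age strictly exceeds the threshold (assumed semantics)."""
    return age_on(birthdate, today) > threshold


def is_adult(birthdate: date, today: date, adult_age: int = 18) -> bool:
    """True when the age has reached adult_age (18 is an assumed default)."""
    return age_on(birthdate, today) >= adult_age
```

Because only True/False leaves the operator, the partner never sees the birthdate itself.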

ToshiWakayama-KDDI commented 1 year ago

> As I said in some other comments, I will be happy to discuss deprecating the address attribute. It is far better (for many countries around the world) to use separate attributes for the individual address components.

Hi @StefanoFalsetto-CKHIOD ,

Thank you for the comment, but I think the address attribute is required. As you pointed out in your previous comment, the aggregated field depends on country rules, which I think is true, and in some countries, like Japan, customers need the aggregated address field, mainly because it is difficult to split our addresses into separate fields.

I think both the aggregated address field and the split address fields can exist as optional fields. If an MNO does not support a specific attribute and is asked about it, it can answer with Not_Available or something similar. It may be better to share which attributes are supported by an MNO and which are not, but this would be a business matter or could be a future topic.

Thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi @GillesInnov35 , @fernandopradocabrillo , @javier-carrocalabor , @HuubAppelboom , @StefanoFalsetto-CKHIOD ,

Thank you for your comments. I feel our discussion is spreading and exploding (sorry, I don't know the proper word), and we have to start converging, considering our target time.

I have some suggestions for converging our discussion, as below:

1. Regarding Age attributes, they need some calculation and are also related to the new API 'Age Verification', so I would suggest delaying them for future enhancement.
2. Regarding attributes requiring calculation or processing, I would suggest delaying them for future enhancement. We have agreed to delay Match Scoring for future enhancement, and Hashing and Age are the same. (Because solution discussion is needed and it would take time.)
3. Regarding attributes not related to subscribers/users, e.g. account active/inactive, I would suggest delaying them for future enhancement. (Because we need to discuss whether they are required or not, as it is unclear whether they are KYC information.)
4. Regarding any attributes requiring complex and deep discussion, I would suggest delaying them for future enhancement. (Because of our short time.)

Any views?

Considering No. 4 above, we can agree to delay User information attributes (separate from Subscriber/Contractor information) for future enhancement.

Thanks, Toshi

GillesInnov35 commented 1 year ago

Hi, I agree with @ToshiWakayama-KDDI's proposal to target a limited list of attributes in this first version, even if it does not cover the full scope of existing offers. If we look at what TM Forum (which is a major standard) proposes for a party/individual resource, the list of attributes which define a person is limited to a few fields. This means that such a list already exists in other specifications. It could be a good example, right?

For information, see below some of the fields in the TMF 632 party (individual) specification:

 "givenName": "Jane",
 "familyName": "Lamborgizzia",
 "legalName": "Smith",
 "middleName": "JL",
 "fullName": "Jane Smith ep Lamborgizzia",
 "formattedName": "Jane Smith ep Lamborgizzia",
 "birthDate": "1967-09-26T05:00:00.246Z",

  Geographic address
         "city": "Morristown",
         "country": "USA",
         "postCode": 7960,
         "stateOrProvince": "New Jersey",
         "street1": "240 Headquarters Plazza",
         "street2": "East Tower - 10th Floor"
  ContactMedium  
     "emailAddress": "jane.lamborgizzia@gmail.com"
     "phoneNumber": "+112785426565"

Just as formattedName exists for the name, a formattedAddress could be added to aggregate the address fields.

javier-carrocalabor commented 1 year ago

I agree with @ToshiWakayama-KDDI's points for the sake of simplification and, at the same time, of finding a common base that can cover most of the needs. In particular, I see @GillesInnov35's proposal (https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1835672991) as a good starting point to achieve this.

Regarding the "name" scope, I think that this proposal can also fit @HuubAppelboom's requirements. But I consider it redundant to have both 'fullName' and 'formattedName'. I would keep just one of them.
Regarding the "address" scope, I find 'street1' and 'street2' too unspecific, and I would go for something like 'streetName' and 'streetNumber', perhaps taking into account that 'streetNumber' (or any other name for such a parameter) could mean 'street number' as well as 'house number' or 'house name'.

HuubAppelboom commented 1 year ago

In my experience, you will need a minimum list of attributes to properly identify a person.
Unfortunately, what is needed varies per country. So, if you want to work cross-border, you will always end up with a longer list than the minimum for any single country.

In addition, for a matching process based on hashes, some attributes cause problems that make them less suitable. For example, in the TMF 632 list above, "fullName": "Jane Smith ep Lamborgizzia" will cause problems, because it uses the abbreviation "ep", which is probably language-dependent. In the Netherlands we have, for example, "ev" or "wv", or we use a "-" symbol, where it can be Smith-Lamborgizzia or Lamborgizzia-Smith. Other markets have their own habits, which are different again. This is exactly why we would like to have the family name at birth as an extra attribute. If the current fullName does not match but the family name at birth matches, you still know who it is with sufficient precision.
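The hash sensitivity described above can be shown with a small sketch. SHA-256 and the lower-case/whitespace normalisation are assumptions chosen for illustration; this thread does not specify the hash algorithm or normalisation rules, and `normalized_hash` is a hypothetical helper.

```python
import hashlib


def normalized_hash(value: str) -> str:
    """Lower-case, collapse whitespace, then hash (assumed normalisation)."""
    canonical = " ".join(value.split()).lower()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two renderings of the same married name never produce the same hash...
full_a = normalized_hash("Jane Smith ep Lamborgizzia")
full_b = normalized_hash("Jane Smith-Lamborgizzia")

# ...whereas the family name at birth is stable across both.
birth_a = normalized_hash("Smith")
birth_b = normalized_hash(" smith ")
```

Here `full_a` and `full_b` differ while `birth_a` and `birth_b` are equal, because exact-hash comparison cannot absorb a change of joining convention but can absorb the case and spacing differences that the normalisation removes.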

I don't think it is wise to make the list of attributes as small as possible, because you run the risk that it becomes too small to be of any use. And for markets where Match is already being used, it makes no sense to come up with an API which is less effective.

What may be more pragmatic is to start in one or two countries and define the API there, by defining the minimum that is needed in these markets (and having a better offering than the current Match product). Then do a market-by-market introduction, adding in each market the attributes needed to reach the minimum set for that market as well. This way you will have a list of attributes that grows over time.

HuubAppelboom commented 1 year ago

PS. In the current EIDAS2 wallet standardisation process in Europe there is also a PID being defined (a list of attributes), which may be worth a look.

GillesInnov35 commented 1 year ago

Hi @HuubAppelboom, I understand your point of view given your experience; however, in my opinion the CAMARA approach is to aim for a global solution which could be adopted by many operators and partners. If we think "country" from the start, I'm not sure that will be the case. I don't clearly understand why we could not start with a limited list of attributes for which an MNO should be able to compare information and return a match result, even if I agree with you that in some use cases the match result would not be so helpful, depending on the expected level of trust. If we think in code terms, polymorphism should help us to define new specific schemas inheriting from this first base, perhaps targeting specific countries' requirements. I don't know if my vision is clear enough. I'll also discuss this internally with my colleagues. Thanks a lot

HuubAppelboom commented 1 year ago

Hi @GillesInnov35 , For example, for the Netherlands I don't think any telco would start introducing a CAMARA version of KYC Match that offers significantly less than what is already available today.
Kind regards Huub

ToshiWakayama-KDDI commented 1 year ago

Hi all,

Based on our discussions, I have created a compromise proposal by updating our initial proposed table (Gilles on 16th Nov and me on 20th Nov), as below. Parameters/attributes in the rightmost columns are my proposal.

Please note that each of the proposed parameters/attributes has a Match suffix, but this is just my proposal and we have to discuss the suffix for Request and Response separately. So please check which parameters/attributes we need for our initial version.

I think we have to conclude our parameter/attribute discussion within this week, so any comments are welcome.

Match Request Body

CAMARA KYC Match requirements/categories	KDDI KYC Match	Orange KYC Match	Telefonica KYC Match	GSMA KYC Match	Orange Proposal	KPN	Hutchison	Compromised Proposal
Phone Number	subscriber_phone_number_match	msisdn	phoneNumber	phone_number	phoneNumber			phoneNumberMatch
(special phone number)	main_subscriber_phone_number_match							mainPhoneNumberMatch
ID Document			idDocument					idDocumentMatch
Subscriber name	user_name_match	name	identity (composed of firstName and lastName)	name	name			nameMatch
(name reading)	subscriber_name_kana_hankaku_match							nameKanaHankakuMatch
(name reading)	subscriber_name_kana_zenkaku_match							nameKanaZenkakuMatch
(given name)		given_name	(included in identity)	given_name	givenName			givenNameMatch
(family name)		family_name	(included in identity)	family_name	familyName			familyNameMatch
Subscriber Postal Code	subscriber_postal_code_match	postalCode	(included in address)	postal_code			postalCodeMatch
Subscriber Address	subscriber_formatted_match		address (composed of postalCode, streetName and streetNumber)	address	address			addressMatch
(street name)		street_name	(included in address)	house_or_housename	streetName			streetNameMatch
(street number)			(included in address)					streetNumberMatch
Subscriber Address-Region	subscriber_region_match						regionMatch
Subscriber Address-Town		locality		locality	locality			localityMatch
Subscriber Address-Country		country		country	country			countryMatch
Subscriber Birthdate	subscriber_birthdate_match	birthdate	birthdate	birthdate	birthdate			birthdateMatch
Subscriber Email Address		email			email			emailMatch
Subscriber name (Initial of the first Given Name)						(Initial of the first Given Name)		firstGivenNameMatch
(All initials of Given Names)						(All initials of Given Names)		allGivenNamesInitialsMatch
(The first Given Name)						(The first Given Name)		firstGivenNameMatch
(All Given Names)						(All Given Names)		allGivenNamesMatch
(Prefixes of the Current Family Name)						(Prefixes of the Current Family Name)		currentFamilyNamePrefixesMatch
(Family Name at birth)						(Family Name at birth)		familyNameAtBirthMatch
Subscriber Address (House Number Extension)						(House Number Extension)		houseNumberExtensionMatch
Subscriber Gender	subscriber_gender_match							genderMatch
~~User Name~~	~~user_name_match~~						~~userNameMatch~~
~~(user name reading)~~	~~user_name_kana_hankaku_match~~						~~userNameKanaHankakuMatch~~
~~(user name reading)~~	~~user_name_kana_zenkaku_match~~						~~userNameKanaZenkakuMatch~~
~~User Birthdate~~	~~user_birthdate_match~~						~~userBirthdateMatch~~
3rd party ID	cp_id						cp_id
	service_id						service_id

KYC Match Response

CAMARA KYC Match requirements/categories	KDDI KYC Match	Orange KYC Match	Telefonica KYC Match	GSMA KYC Match	KPN	Hutchison	Compromised Proposal
Phone Number	subscriber_phone_number_match	msisdn	phoneNumber_response	phone_number			phoneNumberMatch
(special phone number)	main_subscriber_phone_number_match						mainPhoneNumberMatch
ID Document			idDocument_response				idDocumentMatch
Subscriber name	subscriber_name_match	name_score	identity_response	name			nameMatch
(name reading)	subscriber_name_kana_hankaku_match						nameKanaHankakuMatch
(name reading)	subscriber_name_kana_zenkaku_match						nameKanaZenkakuMatch
(given name)		given_name_score	(included in identity)	given_name			givenNameMatch
(family name)		family_name_score	(included in identity)	family_name			familyNameMatch
Subscriber Postal Code	subscriber_postal_code_match	postalCode_score	(included in address)	postal_code			postalCodeMatch
Subscriber Address	subscriber_formatted_match		address_response	address			addressMatch
(street name)		street_name_score	(included in address)	house_or_housename			streetNameMatch
(street number)			(included in address)				streetNumberMatch
Subscriber Address-Region	subscriber_region_match						regionMatch
Subscriber Address-Town		locality_score		locality			localityMatch
Subscriber Address-Country		country_score		country			countryMatch
Subscriber Birthdate	subscriber_birthdate_match	birthdate_score	birthdate_response	birthdate			birthdateMatch
Subscriber Email Address		email_score					emailMatch
Subscriber name (Initial of the first Given Name)					(Initial of the first Given Name)		firstGivenNameMatch
(All initials of Given Names)					(All initials of Given Names)		allGivenNamesInitialsMatch
(The first Given Name)					(The first Given Name)		firstGivenNameMatch
(All Given Names)					(All Given Names)		allGivenNamesMatch
(Prefixes of the Current Family Name)					(Prefixes of the Current Family Name)		currentFamilyNamePrefixesMatch
(Family Name at birth)					(Family Name at birth)		familyNameAtBirthMatch
Subscriber Address (House Number Extension)					(House Number Extension)		houseNumberExtensionMatch
Subscriber Gender	subscriber_gender_match						genderMatch
~~User Name~~	~~user_name_match~~					~~userNameMatch~~
~~(user name reading)~~	~~user_name_kana_hankaku_match~~					~~userNameKanaHankakuMatch~~
~~(user name reading)~~	~~user_name_kana_zenkaku_match~~					~~userNameKanaZenkakuMatch~~
~~User Birthdate~~	~~user_birthdate_match~~						~~userBirthdateMatch~~

Thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi all, Toshi again.

I would also like to ask the team whether the number of proposed parameters/attributes is too large for the YAML definition. There are some country/market specific attributes already, and the number of such attributes may grow in future. Is there any good way (technically) to handle these kinds of country/market specific attributes?

For example, these attributes could be categorised as Extended attributes, with 'extended' added before the attribute name. Any attribute starting with 'extended' would be regarded as a country/market specific attribute; such attributes would not need to be listed in the YAML definition, but could be used flexibly for specific countries/markets.
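As a rough illustration of the 'extended' prefix idea, here is a minimal Python sketch of how a server might separate attributes listed in the definition from prefixed country/market specific ones. The `CORE_ATTRIBUTES` set, the function name and the example attribute `extendedFamilyNameAtBirth` are hypothetical and only serve the illustration; nothing here has been agreed in this issue.

```python
# Hypothetical subset of attributes that would be listed in the YAML definition.
CORE_ATTRIBUTES = {"phoneNumber", "name", "givenName", "familyName",
                   "postalCode", "birthdate"}

EXTENDED_PREFIX = "extended"


def split_attributes(request: dict):
    """Split a request body into core, extended and unknown attributes."""
    core, extended, unknown = {}, {}, {}
    for key, value in request.items():
        if key in CORE_ATTRIBUTES:
            core[key] = value
        elif key.startswith(EXTENDED_PREFIX):
            # Country/market specific attribute, not listed in the definition.
            extended[key] = value
        else:
            unknown[key] = value
    return core, extended, unknown
```

An operator could then match the core attributes as usual and handle, or simply ignore, the extended ones depending on what it supports.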

Perhaps, 'polymorphism' and 'schemas inheriting' Gilles pointed out could work for this matter?

I don't think we have to solve this matter for our initial version, though.

Thanks, Toshi

GillesInnov35 commented 12 months ago

Hi @ToshiWakayama-KDDI , I have a question on the partner information (cp_id, service_id) I see in the attribute list. In 3-legged or 2-legged authentication, consumer information (partner id) is commonly transmitted in the OAuth token. Could you explain why you think it should be part of the definition? Thanks a lot

HuubAppelboom commented 12 months ago

Hi @ToshiWakayama-KDDI , I have some suggestions for your proposal, to see whether it is possible to simplify the list. Regarding 2nd, 3rd or 4th Given Names, it may be better to introduce a Middle Name(s) attribute instead. The Given Name is then always a single name, and the Middle Name(s) are then the 2nd, 3rd, 4th etc. Given Names. This is especially important because people do not always provide all their given names (usually one or all).

For the cases where only initials are available, we would use only the initial of the Given Name and the initials of the Middle Names.

Prefixes is something we can omit from the matching process, as long as it is defined that prefixes are always omitted from the Family Name. For a given area / country, we can define lists of what commonly used prefixes are (for the Netherlands such a list is already available).

If we do this, the list for a compromise can become somewhat shorter:

phoneNumberMatch
mainPhoneNumberMatch
idDocumentMatch
nameMatch
- nameKanaHankakuMatch
- nameKanaZenkakuMatch
givenNameMatch
middleNamesMatch
familyNameMatch
postalCodeMatch
addressMatch
streetNameMatch
streetNumberMatch
regionMatch
localityMatch
countryMatch
birthdateMatch
emailMatch
givenNameInitialMatch
middleNamesInitialsMatch
familyNameAtBirthMatch
houseNumberExtensionMatch
genderMatch

with kind regards Huub

javier-carrocalabor commented 12 months ago

Hi, Thank you all for the contributions to the debate.

I really think that the list in https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1840159874 is too long. The idea is that too many parameters lead the API clients to have unclear expectations about what is and what is not implemented.

I agree with @ToshiWakayama-KDDI about delaying for future versions parameters that are complex or not clear https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1835634611

Agree with @GillesInnov35 about getting inspiration from TMF 632 https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1835672991 and with @HuubAppelboom from EIDAS2 https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1837259897 I have found this reference for your consideration: https://github.com/eu-digital-identity-wallet/eudi-doc-architecture-and-reference-framework/blob/main/docs/arf.md#5111-pid-attributes-for-natural-persons

So, trying to follow these ideas and trying to think of a global solution according to https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1838719726, our proposal is to shorten the list of parameters as much as possible, as long as they carry enough semantics for the current requirements. In this sense, this is the example with which we would feel comfortable:

phoneNumber: phone number also identified in the 3-legged access token.
idDocument: official id card number.
firstName: it can be the first name or given name as is, or a compound first name or second/middle name, or even initials for them. The operator can perform the matching against the corresponding information available.
familyName: it can be the surname or family name, and can include a compound last name or additional last name, or initials for them.
streetName: the name of the street.
streetNumberOrName: a number or name that identifies the property of the user in the referred streetName. It can be just a number, a name of a house, or an alphanumeric code identifying the house.
postalCode: ZIP or postal code of the address.
birthdate

Having said that, I think that, in any case, too many parameters in a plain list may confuse API clients about what can be used in each country, or with each operator, and what is really implemented in each of those cases. If clients really need so many options, perhaps Gilles is right in his comment (https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1838719726) and we need to exploit the potential of Inheritance/Polymorphism. I have found this useful reference about this: https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/ In this way, if needed, perhaps we could separate sets of parameters and specify when and where each set applies. I don't think the CAMARA guidelines (https://github.com/camaraproject/Commonalities/blob/main/documentation/API-design-guidelines.md) say anything about this. So, I think we are pushing the current limits of the CAMARA guidelines. But let me insist that a plain list of too many parameters around the same concept confuses clients and leaves them lost among many options, without certainty about what they will or won't get when making a call to the API.
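To make the inheritance idea a bit more concrete, here is a minimal Python sketch of a common base request that country-specific variants extend. The class names, and the choice of which attributes sit in the base versus the Netherlands variant, are assumptions for illustration only; in the API itself this would be expressed with the OpenAPI constructs described in the Swagger reference, not with Python classes.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BaseMatchRequest:
    """Hypothetical base schema shared by every market."""
    phoneNumber: str
    givenName: Optional[str] = None
    familyName: Optional[str] = None
    birthdate: Optional[str] = None


@dataclass
class NlMatchRequest(BaseMatchRequest):
    """Hypothetical variant adding attributes requested for the Netherlands."""
    familyNameAtBirth: Optional[str] = None
    houseNumberExtension: Optional[str] = None


def attribute_names(schema) -> List[str]:
    """List the attributes a given schema (class or instance) accepts."""
    return [f.name for f in fields(schema)]
```

With this shape, a client can look at one schema and see exactly which set of parameters applies to a given market, instead of picking from one long flat list.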

The last point is about particles, symbols, etc. referred to in https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1837259609 Regardless of the previous considerations, in order to maximize the matching results, I think we could consider having the operator apply some kind of normalization to the contents of the request parameters before matching them against the internal information. For example, general rules like lower-casing the characters, removing spaces, dots, hyphens, etc., and even the usual "stop words", will immediately improve the matching results, even though we can apply matching scores in later versions.
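A minimal Python sketch of this kind of normalization, assuming exact comparison after clean-up. The stop-word list and the function names are hypothetical; a real list would have to be agreed per language area.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"ep", "ev", "wv"}

# Spaces, dots, hyphens, commas and apostrophes are treated as separators.
_SEPARATORS = re.compile(r"[\s.\-,']+")


def normalize(value: str) -> str:
    """Lower-case the value, then drop separators and stop words."""
    tokens = _SEPARATORS.split(value.lower())
    kept = [t for t in tokens if t and t not in STOP_WORDS]
    return "".join(kept)


def is_match(request_value: str, stored_value: str) -> bool:
    """Compare two values after applying the same normalization to both."""
    return normalize(request_value) == normalize(stored_value)
```

For example, `normalize("Jane Smith ep Lamborgizzia")` gives `"janesmithlamborgizzia"`, and `is_match("Smith-Lamborgizzia", "smith lamborgizzia")` is true. Note that this sketch does not handle a different order of name parts (Smith-Lamborgizzia versus Lamborgizzia-Smith).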

HuubAppelboom commented 12 months ago

Regarding what will be used in the eIDAS2 wallet, with ARF version 1.2, there will be a detailed PID Rule Book published, which will be of interest. ARF 1.2 is unfortunately not published yet, but is expected soon.

HuubAppelboom commented 12 months ago

Regarding the matching process, what we do in the Netherlands is also normalize special characters which are not very commonly used in our area, also because these special characters are often not supported by the CRM systems. Also, for example, in the German language there are specific mappings for special characters. It may be best to define these as part of instructions on how to normalize in a specific country or language area. If both parties apply these rules, you are rewarded with a much higher matching rate; if either party does not, you will get a lower matching rate. It will be very difficult to set rules for this on a global scale, which is why we propose to do this per area (perhaps per country code would be a good thing).
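As an illustration of per-area rules, here is a minimal Python sketch keyed by country code. The rule table and function name are hypothetical; the German entries are the conventional ä/ö/ü/ß transliterations, and the fallback of stripping accents for other areas is an assumption, not something agreed in this thread.

```python
import unicodedata

# Hypothetical per-country mapping table, keyed by ISO 3166-1 alpha-2 code.
COUNTRY_MAPPINGS = {
    "DE": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}


def normalize_for_country(value: str, country_code: str) -> str:
    """Apply country-specific character mappings, then strip remaining accents."""
    # Compose first so that mappings match regardless of the input encoding form.
    text = unicodedata.normalize("NFC", value).lower()
    for source, target in COUNTRY_MAPPINGS.get(country_code, {}).items():
        text = text.replace(source, target)
    # Fallback: decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With these rules, "Müller" becomes "mueller" under "DE" but "muller" under a country code without a specific mapping, which shows why both parties need to apply the same rule set to get matching results.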

HuubAppelboom commented 12 months ago

Regarding idCardnumber: in most markets you can have several IDs (for example we have driving license, passport, ID card). In order to make sense of the matching result, you should communicate back what kind of ID has been matched against. One issue with the idCardnumber is that as soon as you renew an ID, the number changes, so I doubt whether you will get a high match rate in practice.

HuubAppelboom commented 12 months ago

In general, I am not too worried about the attribute list being a bit long, but more worried about trying to put too many flavours in a single attribute. For example, we tried working with all initials available for the given name, but this resulted in too low a match rate, simply because either side (MNO or Relying party) did not have all initials at their disposal. The same will be the case if you do this with given names, or for example with an attribute with all the address details in it. The more you try to push into a single match result, the higher the chance of a mismatch, and that is why we propose to split the 1st given name from middle names, street name from street number, street number extension from street number, etc.

GillesInnov35 commented 12 months ago

Hello all, this is a very interesting discussion, and it is good that we are converging on a solution.

@HuubAppelboom could you complete your proposal with some example attribute values, so we can see what kind of information is expected? I don't clearly see how middleNamesInitialsMatch and middleNamesMatch will be valued (array or single value). Thanks a lot.

Concerning idDocument, if we keep it, I think a structure individualIdentification: {name, value} might be used, for example [{"national ID card", "124587652"}]. The objective is to be as clear as possible about what the id refers to.

Regards

ToshiWakayama-KDDI commented 12 months ago

Hi @Javier, Hi Huub, Hi Gilles,

Thank you for your further comments. I share Huub's view: I am not worried about the length of the currently proposed attribute list (mine and Huub's). So, Huub's proposed list (plus cp_id/service_id) would be pretty much fine with me.

I can understand the view of making the attribute list as short and simple as possible. However, the currently proposed attributes are required by operators and their customers, so I think there is no point in deleting required attributes in order to make the list simple. (For example, we are providing Matching for the single 'name' attribute and the single/formatted 'address' attribute, which our customers need.)

API clients can use the attributes they need and just ignore the attributes they do not need. To avoid confusion, we can prepare a proper description and explanation for each API, and we could further prepare some typical examples of attribute sets for typical use cases.

For the operators, they can just ignore requests for attributes they do not have.

So, it is kind of 'the greater embraces the less', and I don't believe Huub's proposed list (plus cp_id/service_id) is too long. Could we accept it for our first version?

Thanks, Toshi

HuubAppelboom commented 12 months ago

Regarding the middleNames attribute, there are two ways we can do this in case there is more than one middle name.

Take for example Robertus Mattheus Franciscus Janssen. Here, Robertus is the given name (always the first one), Mattheus Franciscus are the middle names, and Janssen is the family name.

For Mattheus Franciscus, we could either choose to make it one long string, with everything lowercase, without spaces etc., and hash the result. So in the end you will receive a hash of "mattheusfranciscus"

The alternative would be to make it a list of middle names, and make a hash of each middle name separately (after making everything lowercase). So then you receive a list of two hashes (of "mattheus" and "franciscus"), and for each hash you will provide a Y/N whether you also have that in your list. (in this I assume the order of the middle names is not that relevant).

The alternative will probably give a higher match rate: if only one of the middle names mismatches, you still have a partial match. What do you think?
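To compare the two options, here is a minimal Python sketch. SHA-256 is an assumption (no hash function has been chosen in this thread), and the function names are hypothetical.

```python
import hashlib
from typing import List


def _sha256(text: str) -> str:
    # SHA-256 is assumed here purely for illustration.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_concatenated(middle_names: List[str]) -> str:
    """Option 1: one hash over all middle names, lower-cased and joined."""
    joined = "".join(name.lower().replace(" ", "") for name in middle_names)
    return _sha256(joined)


def hash_each(middle_names: List[str]) -> List[str]:
    """Option 2: one hash per middle name, lower-cased."""
    return [_sha256(name.lower()) for name in middle_names]


def match_each(request_hashes: List[str], stored_names: List[str]) -> List[str]:
    """Return Y/N per requested hash, ignoring the order of the stored names."""
    stored = set(hash_each(stored_names))
    return ["Y" if h in stored else "N" for h in request_hashes]
```

If the relying party sends hashes for "Mattheus" and "Franciscus" and the operator only holds "Franciscus" and "Johannes", option 2 returns `["N", "Y"]`, a partial match, whereas option 1 would simply fail.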

HuubAppelboom commented 12 months ago

Adding the type of ID document may be a good idea, but I know this can also become a bit complex, with a long list. For example, in the Netherlands we also have special ID documents like a permit for fugitives, an ID card for embassy staff, etc. Can we agree on a short list of the most common types, and one category Other? For example Passport, Driving License, IDCard and Other?
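A minimal Python sketch of how a short list of document types could be combined with a typed idDocument value. The class names, the enum values and the `matchedType` field in the result are hypothetical, chosen only to illustrate reporting back which kind of ID was matched against.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class IdDocumentType(Enum):
    """Short list of common document types plus a catch-all."""
    PASSPORT = "Passport"
    DRIVING_LICENSE = "DrivingLicense"
    ID_CARD = "IDCard"
    OTHER = "Other"


@dataclass(frozen=True)
class IdDocument:
    type: IdDocumentType
    value: str


def match_id_document(requested: IdDocument, on_file: List[IdDocument]) -> dict:
    """Compare the requested document only against stored documents of the same type."""
    same_type = [doc for doc in on_file if doc.type == requested.type]
    if not same_type:
        # Nothing of this type is held, so there is nothing to compare against.
        return {"idDocumentMatch": False, "matchedType": None}
    matched = any(doc.value == requested.value for doc in same_type)
    return {"idDocumentMatch": matched, "matchedType": requested.type.value}
```

Returning the type alongside the result lets the client distinguish "number differs" from "the operator holds no document of that type".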

camaraproject / KnowYourCustomer

KYC Match - Compare specifications #18
