camaraproject / KnowYourCustomer

Repository to describe, develop, document and test the KnowYourCustomer API family
Apache License 2.0

KYC Match - Compare specifications #18

Closed GillesInnov35 closed 5 months ago

GillesInnov35 commented 1 year ago

CAMARA KYC Match - Specifications


Below is a proposed comparison matrix between the specifications of the different offers and the initial CAMARA requirements proposal. Key points:

  1. The list of required KYC attributes must be reviewed; we should propose a short list for a first version, based on the existing proposals.
  2. The match result response must be defined: either a match score (matching percentage) or a match result value such as what GSMA returns (“Y”: match is successful; “N-NA”: match failed, data is not available; “N-AV”: match failed, data is available; “N-AD”: match failed, data is available, but access is denied).
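
The four GSMA-style result values can be captured in a small enumeration. The sketch below is illustrative only: the names `MatchResult` and `match_attribute`, the `access_allowed` flag and the plain equality comparison are assumptions, not part of the GSMA or CAMARA specifications.

```python
from enum import Enum
from typing import Optional


class MatchResult(str, Enum):
    """Result values quoted in the issue (hypothetical Python names)."""
    MATCH = "Y"              # match is successful
    NOT_AVAILABLE = "N-NA"   # match failed, data is not available
    MISMATCH = "N-AV"        # match failed, data is available
    ACCESS_DENIED = "N-AD"   # match failed, data is available, but access is denied


def match_attribute(requested: str,
                    stored: Optional[str],
                    access_allowed: bool = True) -> MatchResult:
    """Compare one requested value with the value held by the operator."""
    if stored is None:
        return MatchResult.NOT_AVAILABLE
    if not access_allowed:
        return MatchResult.ACCESS_DENIED
    # Assumption: exact string equality; a real service would normalise first.
    return MatchResult.MATCH if requested == stored else MatchResult.MISMATCH
```

A match score, by contrast, would replace this closed set of values with a number, which is why the two options need to be decided explicitly.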

Request Specifications

CAMARA KYC Match requirements GSMA KYC Match KDDI KYC Match Orange KYC Match Proposal
msisdn phone_number subscriber_phone_number_match msisdn phoneNumber
name name user_name_match name name
given name given_name given_name givenName
family name family_name family_name familyName
address subscriber_formatted_match address
street name house_or_housename street_name streetName
region subscriber_region_match
postal code postal_code subscriber_postal_code_match postalCode
town locality locality locality
country country country country
birthdate birthdate subscriber_birthdate_match birthdate birthdate
email address email email



Response Specifications

CAMARA KYC Match requirements GSMA KYC Match KDDI KYC Match Orange KYC Match Proposal
msisdn phone_number subscriber_phone_number_match msisdn
name name user_name_match name_score
given name given_name given_name_score
family name family_name family_name_score
address subscriber_formatted_match
street name house_or_housename street_name_score
region subscriber_region_match
postal code postal_code subscriber_postal_code_match postalCode_score
town locality locality_score
country country country_score
birthdate birthdate subscriber_birthdate_match birthdate_score
email address email_score
StefanoFalsetto-CKHIOD commented 1 year ago

Hi Gilles, I have some feedback.

Request Specifications

  1. Why do you want to change the name from MSISDN to phoneNumber? The term MSISDN refers directly to the standard way of representing a phone number.
  2. I would like to avoid the use of the "address" attribute. This aggregated field depends not only on country rules but also on each MNO's internal BSS implementation. Since we are rebooting this service from scratch, we can build on previous experience and ask MNOs to expose the individual components of the address, such as:
     • street_name: the name of the street where the end customer resides. Just the street name, nothing else (i.e., no house number, no zip/postal code, etc.)
     • town
     • province
     • region
     • house_number: the number of the building where the end customer resides
     • postal_code

We can still include the "address" attribute but discourage its use in some way.

Response Specifications

Since the answer will be Y, N-NA, N-AV or N-AD, the term "score" could be misleading. I am still trying to think of a valid alternative to propose, but I can't come up with one right now.

ToshiWakayama-KDDI commented 1 year ago

Thanks very much, @GillesInnov35, for creating this issue.

May I ask questions for clarification?

I have one comment: we have agreed that calculating a matching score is for future releases, so it should not be included in our initial release.

Thanks.

StefanoFalsetto-CKHIOD commented 1 year ago

Hi @ToshiWakayama-KDDI, we agreed not to include the matching score. But we also agreed that the score is something we need to "take into consideration in some way", since we will work on it as soon as the first version of these specifications is released. Hence, I think it’s important to do something now to enable future improvements.

@GillesInnov35, I have worked out my proposal: to find a "middle way" between future developments and Toshi's push for the upcoming first milestone, we can still use the "_match" suffix on response attributes. That way, our future discussions only need to address how to modify the "Y" response. Could it perhaps be "Y-nn", where nn is the score? Let's keep these proposals for future discussions.
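
To illustrate how a "Y-nn" value could stay backward compatible with the plain "Y" response, here is a minimal parsing sketch. Everything in it is hypothetical: the function name `parse_match_value`, the 0 to 100 score range and the decision to reject anything else are assumptions for discussion, not an agreed format.

```python
import re
from typing import Optional, Tuple

# "Y" optionally followed by "-nn" (assumed here to be a 0-100 score),
# or one of the three failure values quoted earlier in the issue.
_PATTERN = re.compile(r"(Y)(?:-(\d{1,3}))?|(N-NA|N-AV|N-AD)")


def parse_match_value(value: str) -> Tuple[str, Optional[int]]:
    """Split a response value into (result, score); score is None if absent."""
    m = _PATTERN.fullmatch(value)
    if m is None:
        raise ValueError(f"unknown match value: {value!r}")
    if m.group(3) is not None:
        return m.group(3), None
    if m.group(2) is None:
        return "Y", None  # first-version response without a score
    score = int(m.group(2))
    if score > 100:
        raise ValueError(f"score out of range: {score}")
    return "Y", score
```

A client written against the first version would only ever see the bare "Y", so adding the suffix later would not change the meaning of the existing values.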

GillesInnov35 commented 1 year ago

Hi @StefanoFalsetto-CKHIOD, @ToshiWakayama-KDDI, thanks a lot for your comments. I'll try to explain the proposal.

Phone number rather than msisdn

use of address

GSMA Mobile Connect KYC Match

fernandopradocabrillo commented 1 year ago

Hi @GillesInnov35, I think this table is also missing Telefonica's proposal and some of our fields (like idDocument), as well as our vision for the properties. Could you please update it accordingly? Thanks!

Regarding use of address

In our proposal, the address field is composed of the different parts it can have. We consider that having a single field in which the postal address can be included in such a generic way adds complexity and, as @StefanoFalsetto-CKHIOD said, is very country-dependent. So we support having different fields for its representation.

GillesInnov35 commented 1 year ago

Hi @fernandopradocabrillo, yes, sure.

Could you send me the list of attributes Telefonica proposes in its solution? Thanks.

javier-carrocalabor commented 1 year ago

Hi @GillesInnov35, here is Telefonica's proposal mentioned by @fernandopradocabrillo: https://github.com/camaraproject/KnowYourCustomer/blob/f153a4799213fc4b0474d156c7b10b490015439e/code/API_definitions/kyc-match.yaml#L143 which can be summarized as: phoneNumber, idDocument, identity (composed of firstName and lastName), address (composed of postalCode, streetName and streetNumber), and birthdate. The responses would be xxxx_response for each of them.

ToshiWakayama-KDDI commented 1 year ago

Hi all,

Please find the revised shortlist table below. I have added our proposed parameters/attributes, which are included in our YAML file, to the shortlist table. Also I have added Telefonica's parameters/attributes as well. Hope it is correct. Also I have changed GSMA Match to MobileConnect Match and moved it to the right as MobileConnect is not our proposal.

I have one point to ask you at the moment: our company differentiates between the Subscriber (who makes the contract with us) and the User (who actually uses the phone). For example, a parent is the Subscriber and their child is the User. Do you have the same kind of differentiation?

Match Request Body

CAMARA KYC Match requirements/categories KDDI KYC Match Orange KYC Match Telefonica KYC Match GSMA KYC Match Orange Proposal
Phone Number subscriber_phone_number_match msisdn phoneNumber phone_number phoneNumber
(special phone number) main_subscriber_phone_number_match
ID Document idDocument
Subscriber name user_name_match name identity (composed of firstName and lastName) name name
(name reading) subscriber_name_kana_hankaku_match
(name reading) subscriber_name_kana_zenkaku_match
(given name) given_name (included in identity) given_name givenName
(family name) family_name (included in identity) family_name familyName
Subscriber Postal Code subscriber_postal_code_match postalCode (included in address) postal_code
Subscriber Address subscriber_formatted_match address (composed of postalCode, streetName and streetNumber) address address
(street name) street_name (included in address) house_or_housename streetName
(street number) (included in address)
Subscriber Address-Region subscriber_region_match
Subscriber Address-Town locality locality locality
Subscriber Address-Country country country country
Subscriber Birthdate subscriber_birthdate_match birthdate birthdate birthdate birthdate
Subscriber Email Address email email
User Name user_name_match
(user name reading) user_name_kana_hankaku_match
(user name reading) user_name_kana_zenkaku_match
User Birthdate user_birthdate_match
3rd party ID cp_id
service_id



KYC Match Response

CAMARA KYC Match requirements/categories KDDI KYC Match Orange KYC Match Telefonica KYC Match GSMA KYC Match Proposal
Phone Number subscriber_phone_number_match msisdn phoneNumber_response phone_number
(special phone number) main_subscriber_phone_number_match
ID Document idDocument_response
Subscriber name subscriber_name_match name_score identity_response name
(name reading) subscriber_name_kana_hankaku_match
(name reading) subscriber_name_kana_zenkaku_match
(given name) given_name_score (included in identity) given_name
(family name) family_name_score (included in identity) family_name
Subscriber Postal Code subscriber_postal_code_match postalCode_score (included in address) postal_code
Subscriber Address subscriber_formatted_match address_response address
(street name) street_name_score (included in address) house_or_housename
(street number) (included in address)
Subscriber Address-Region subscriber_region_match
Subscriber Address-Town locality_score locality
Subscriber Address-Country country_score country
Subscriber Birthdate subscriber_birthdate_match birthdate_score birthdate_response birthdate
Subscriber Email Address email_score
User Name user_name_match
(user name reading) user_name_kana_hankaku_match
(user name reading) user_name_kana_zenkaku_match
User Birthdate user_birthdate_match

Many thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi all,

Please also find below a shortlist table for the KYC Fill-in attributes/parameters, based on our Fill-in YAML.

Any comments would be welcome.

Fill-in Request Body

CAMARA KYC Fill-in requirements/categories KDDI KYC Fill-in No other Fill-in proposals Proposal
3rd party ID cp_id



Fill-in Response

CAMARA KYC Fill-in requirements/categories KDDI KYC Fill-in No other Fill-in proposals Proposal
Phone Number subscriber_mobile_phone
Subscriber name subscriber_name
(family name) subscriber_name_family
(given name) subscriber_name_first
(name reading) subscriber_name_kana_hankaku
(family name reading) subscriber_name_kana_hankaku_family
(given name reading) subscriber_name_kana_hankaku_first
(name reading) subscriber_name_kana_zenkakuku
(family name reading) subscriber_name_kana_zenkaku_family
(given name reading) subscriber_name_kana_zenkaku
Subscriber Postal Code subscriber_postal_code
Subscriber Address subscriber_formatted
Subscriber Address-Region subscriber_region
Subscriber Birthdate subscriber_birthdate
Subscriber Gender subscriber_gender
Subscriber Email Address subscriber_mail_address
User Name user_name
(user family name) user_name_family
(user given name) user_name_first
(name reading) user_name_kana_hankaku
(family name reading) user_name_kana_hankaku_family
(given name reading) user_name_kana_hankaku_first
(name reading) user_name_kana_zenkakuku
(family name reading) user_name_kana_zenkaku_family
(given name reading) user_name_kana_zenkaku
User Birthdate user_birthdate

Many thanks, Toshi

GillesInnov35 commented 1 year ago

@ToshiWakayama-KDDI, Orange's KYC offers also differentiate between subscriber and user. The 3-legged authentication architecture is based on the user, who authenticates and should give consent, but the information returned by the service concerns the subscriber who signed the contract.

@fernandopradocabrillo, idDocument is part of the TF Match API design. Is the type of document concerned never specified?

GillesInnov35 commented 1 year ago

I have a question regarding Toshi's proposal, which includes language-specific information (user_name and user_name_kana_hankaku). Does it mean we should introduce a dataType attribute with values such as InternationUserClass, JapaneseUserClass, etc.? This kind of type information has, for example, been included in the DeviceLocation API definition. Gilles

ToshiWakayama-KDDI commented 1 year ago

> @ToshiWakayama-KDDI, Orange's KYC offers also differentiate between subscriber and user. The 3-legged authentication architecture is based on the user, who authenticates and should give consent, but the information returned by the service concerns the subscriber who signed the contract. @fernandopradocabrillo, idDocument is part of the TF Match API design. Is the type of document concerned never specified?

Hi @GillesInnov35, thanks. Just to double-check: are the address, name, email, etc. currently proposed by Orange all for Subscribers?

Thanks.

ToshiWakayama-KDDI commented 1 year ago

> I have a question regarding Toshi's proposal, which includes language-specific information (user_name and user_name_kana_hankaku). Does it mean we should introduce a dataType attribute with values such as InternationUserClass, JapaneseUserClass, etc.? This kind of type information has, for example, been included in the DeviceLocation API definition. Gilles

Hi @GillesInnov35 , Thanks for the information! I have just looked at DeviceLocation YAMLs, but I could not find it (dataType). Could you advise me which YAML has it (dataType)?

Thanks

GillesInnov35 commented 1 year ago

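Hi @ToshiWakayama-KDDI ,

  • yes, the information returned or compared by the Orange Match ID API concerns only the subscriber's information.
  • in the DeviceLocation API definition, the attribute which specifies the type of the class is areaType (circle or polygon).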

ToshiWakayama-KDDI commented 1 year ago

> Hi @ToshiWakayama-KDDI ,
>
>   • yes, the information returned or compared by the Orange Match ID API concerns only the subscriber's information.
>   • in the DeviceLocation API definition, the attribute which specifies the type of the class is areaType (circle or polygon).

Hi Gilles @GillesInnov35, thank you very much. I will look into it quickly, together with my internal colleagues.

fernandopradocabrillo commented 1 year ago

> @fernandopradocabrillo, idDocument is part of the TF Match API design. Is the type of document concerned never specified?

Hi @GillesInnov35, that's correct: we decided not to include the idDocument type in the proposal since it added unnecessary complexity. In the end, we want to check whether the idDocument provided matches the one stored by the MNO; the important thing here is the number itself.

@ToshiWakayama-KDDI From our side, we also do the match against subscriber's information only.

ToshiWakayama-KDDI commented 1 year ago

Hi @fernandopradocabrillo , Thank you.

Hi @GillesInnov35, I have quickly checked 'areaType' in the location-retrieval API and the location-verification API, but I am not yet sure whether we could introduce UserClass attributes in a similar way for our purpose. In any case, at the moment we are not considering introducing new attributes like UserClass, as it is better for us to keep our first version simple, with only the required attributes.

Many thanks,

GillesInnov35 commented 1 year ago

Thanks a lot @ToshiWakayama-KDDI. As we are currently discussing which attributes should be mandatory, my question was: should the kana-specific attributes be part of the KYC-Match request definition?

HuubAppelboom commented 1 year ago

In the Netherlands, we currently have the following attribute list in use:

We don't use street name, town etc because in the Netherlands postal code + house number + house number extension is very exact already.

We already have relatively high match rates (up to 80% for family name). Nevertheless, I think we can still improve on this: instead of Given Name Initials, use the following attributes in parallel:

Often people only record their first Given Name or Initial (although many have multiple Given Names). The use of initials can help for cases where there are multiple ways how to write a given name (for example Steve and Stephen).

In the Netherlands we have a list of prefixes that we usually strip from the family name. The reason we do this is that the prefixes can be abbreviated, which hinders the matching. What we can add is an extra attribute in which you compare these prefixes.
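
A minimal sketch of this prefix handling follows. The prefix list is a small illustrative subset chosen for the example, not the actual list used by the Dutch operators, and the function name `split_family_name` and the lower-casing are assumptions.

```python
from typing import Tuple

# Illustrative subset only (assumption); a real deployment would use the
# operators' agreed prefix list.
DUTCH_PREFIXES = ("van der", "van den", "van de", "van", "de", "den", "ter", "ten")


def split_family_name(family_name: str) -> Tuple[str, str]:
    """Return (prefix, remainder) so each part can be matched separately."""
    normalized = " ".join(family_name.lower().split())
    # Try longer prefixes first so "van der" wins over "van".
    for prefix in sorted(DUTCH_PREFIXES, key=len, reverse=True):
        if normalized.startswith(prefix + " "):
            return prefix, normalized[len(prefix) + 1:]
    return "", normalized
```

Matching the remainder on its own avoids failures caused by abbreviated prefixes, while the prefix can still be compared through a separate attribute.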

For Family Name, I think we can improve by adding the Family Name at birth as a separate attribute. In the Netherlands, your family name can change when you get married, so it may change during your lifetime. Your Family Name at birth never changes and, when available, it is better for matching because it stays constant.

Streetname we do not use, because our postal code + house number + housenumber extension is very exact.

So, we would propose the following list (for NL):

HuubAppelboom commented 1 year ago

Annex B - MC Product Specification - Match, v1.4.xlsx

Also attached is the list of specs we currently use for Match in NL. It includes the list of prefixes we strip from the family name.

GillesInnov35 commented 1 year ago

Thanks @HuubAppelboom. I think we should be able to identify a short list of attributes common to all designs and propose a first draft.

javier-carrocalabor commented 1 year ago

I agree with @GillesInnov35 in the sense that I think we should see it from the perspective of a Service Provider that is asking a user for some contact information and shows a form to collect several fields of data. Then, IMHO, and recognizing that I don't know the habits in the Netherlands, I don't think the Service Provider is going to ask the user for, for example, all potential ways of expressing their name, but will ask for the most common way to express the name in that country.

HuubAppelboom commented 1 year ago

> I agree with @GillesInnov35 in the sense that I think we should see it from the perspective of a Service Provider that is asking a user for some contact information and shows a form to collect several fields of data. Then, IMHO, and recognizing that I don't know the habits in the Netherlands, I don't think the Service Provider is going to ask the user for, for example, all potential ways of expressing their name, but will ask for the most common way to express the name in that country.

The issue is not that we think Service Providers should ask end users for all the possible variations, but that MNOs and Service Providers each have a history and a way of working when collecting data. For example, in the Netherlands we have a couple of MNOs which have only collected initials. Making Given Name(s) the only option will not work in this case (that's why we have chosen initials only in the Netherlands, deviating from the Mobile Connect standard).

The other issue arises when you ask for a match on all initials (or given names) and provide that as the only option: you will see that the 2nd and 3rd initials are often missing in current databases (at least we have seen that), which results in a lower match rate than you could have. That's why we propose to make several attribute fields available in the standard, and to match on all the fields that you have available. The same principle would apply to the family name: if you also have the family name at birth available, you can also provide a match on it. In the end, you can safely get to a higher overall match rate this way, without needing more complex solutions like a match score based on whether the attributes are similar.

As far as the availability of data is concerned, in case the MNO does not have a specific attribute in their CRM system, you can always answer with "NA".
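
The "match on every field you hold, answer NA otherwise" idea can be sketched as below. The attribute names, the simplified result labels ("Y", "N", "NA") and the helper `any_match` are assumptions made for this example; they are not part of any agreed specification.

```python
from typing import Dict, Iterable, Optional


def match_available(requested: Dict[str, str],
                    crm_record: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Match every requested attribute the operator holds; answer "NA" otherwise."""
    results = {}
    for attribute, value in requested.items():
        stored = crm_record.get(attribute)
        if stored is None:
            results[attribute] = "NA"  # attribute not held in the CRM system
        else:
            results[attribute] = "Y" if value == stored else "N"
    return results


def any_match(results: Dict[str, str], attributes: Iterable[str]) -> bool:
    """True if at least one of the parallel attributes matched."""
    return any(results.get(name) == "Y" for name in attributes)


# Hypothetical CRM record that only holds initials and the family name at birth.
crm = {"givenNameInitials": "J", "familyNameAtBirth": "Lamborgizzia"}
request = {
    "allGivenNames": "Jane Louise",
    "givenNameInitials": "J",
    "familyName": "Smith",
    "familyNameAtBirth": "Lamborgizzia",
}
results = match_available(request, crm)
```

Here the given name still counts as matched through its initials even though the operator never collected full given names, which is the effect described above.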

ToshiWakayama-KDDI commented 1 year ago

> Thanks a lot @ToshiWakayama-KDDI. As we are currently discussing which attributes should be mandatory, my question was: should the kana-specific attributes be part of the KYC-Match request definition?
>
>   • If yes, a dataType used as a discriminator would be useful to avoid duplicating the attributes concerned
>   • If no, there is no need to differentiate two schemas

Hi @GillesInnov35 ,

Thanks very much. First of all, my understanding is that we are not discussing mandatory attributes, but that all attributes should be optional, as I shared on Tuesday. Of course, we need a mandatory requirement like 'at least one attribute should be included in an API match request'.

So, to answer your question, we would like to have the kana-specific attributes etc. as part of the KYC-Match request definition, as one of the options.

Then I understand your point that dataType used as a discriminator would be useful to avoid duplication of concerned attributes, and I think I need to look into it.

Many thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi @HuubAppelboom , @javier-carrocalabor , @GillesInnov35 ,

Thank you, all, for your comments. Now I understand that the Netherlands needs some specific attributes. As I shared on Tuesday, I would propose to include all the required attributes in our first version, both the commonly used attributes and the country/market-specific ones, if we categorise them. I think that all the attributes should be Optional, as there seem to be many ways to use this API/KYC-Match functionality, so it is difficult to identify mandatory ones. Of course, we need a mandatory requirement like 'there should be at least one attribute included in an API request'.
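
The "all optional, but at least one attribute" rule could be checked as in the sketch below. The function name `validate_match_request` and the choice to treat `None` and empty strings as "not supplied" are assumptions for illustration.

```python
from typing import Any, Dict


def validate_match_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the supplied attributes; reject a request with none."""
    # Assumption: None and "" both mean the attribute was not supplied.
    supplied = {key: value for key, value in body.items() if value not in (None, "")}
    if not supplied:
        raise ValueError("a match request must include at least one attribute")
    return supplied
```

For example, `validate_match_request({"familyName": "Smith", "email": None})` keeps only `familyName`, while an empty body raises `ValueError`.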

If you think we may need categorisation of Common attributes and Country/Market specific attributes, we could write it down somewhere in YAML or in API documentation.

What do you think?

Many thanks, Toshi

HuubAppelboom commented 1 year ago

Hi @ToshiWakayama-KDDI,

I would indeed support including all attributes, both commonly used and country/market-specific. As a rule, I would suggest that, whenever you can, you support all attributes for which you have data.

For example, for NL we currently do not support streetname (because it is not necessary here), but for the sake of international compatibility we will implement it.

On the customer side, the customer can always choose which attributes will be asked to be matched (with the minimum of one of course). For example, for some cases we only need address verification and nothing else, because the customer is already using a different source for the name, date of birth, email etc.

What should also be prevented is customers submitting data they do not actually have, because this distorts the match rate statistics. For example, we had one customer that did not have Date of Birth data, so instead they always submitted "YYYY-MM-DD" as a hashed string, which of course never matches, or a dummy date like "1900-01-01". You get low match rates, and it really takes some time to find out what is going wrong. So in any case, customers must always submit valid data, not dummy data.
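
Because hashed values cannot be inspected by the operator, a check like this would have to run on the customer side before hashing. The sketch below is only an illustration: the function name `parse_birthdate` and the placeholder set (seeded with the example date from this comment) are assumptions.

```python
from datetime import date
from typing import Optional

# Dummy values seen in practice; "1900-01-01" is the example from this thread.
PLACEHOLDER_BIRTHDATES = {"1900-01-01"}


def parse_birthdate(value: str, today: Optional[date] = None) -> date:
    """Parse an ISO date and reject obvious dummy data before it is hashed."""
    if value in PLACEHOLDER_BIRTHDATES:
        raise ValueError(f"placeholder birthdate: {value!r}")
    try:
        parsed = date.fromisoformat(value)
    except ValueError as exc:
        # Catches literal format strings such as "YYYY-MM-DD".
        raise ValueError(f"not a valid ISO date: {value!r}") from exc
    if parsed > (today or date.today()):
        raise ValueError(f"birthdate is in the future: {value!r}")
    return parsed
```

If the value fails the check, the attribute should simply be left out of the request rather than replaced by a dummy value.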

With kind regards Huub

GillesInnov35 commented 1 year ago

Hi @ToshiWakayama-KDDI, the term "mandatory" was not appropriate because, as you say, all attributes should of course be optional (except phone number). I meant the attributes we would like to see in the API design (i.e., the common attributes). Thanks a lot

StefanoFalsetto-CKHIOD commented 1 year ago

As I said in some other comments, I will be happy to discuss deprecating the address attribute. It is far better (for many countries around the world) to use separate attributes for the individual address components.

StefanoFalsetto-CKHIOD commented 1 year ago

To help find the right initial set of attributes, I am sharing the full set of attributes that CKH (and hence all its affiliate operators) offers to its Partners. As you can see, we support all the attributes defined in the GSMA IDY.28 specifications plus some custom ones (e.g., age verification). Some of the address-related attributes, such as houseno_or_housename_hash, are used for historical reasons but will be deprecated in future. Moreover, some of the custom attributes are calculated on the fly from atomic data obtained from MNOs (e.g., age and age_is_greater_than are calculated from the birthdate).

Requested Attribute Returned value
account_state Active/Inactive
age_hash True/False
age_is_greater_than True/False
address_line1_hash Y/N-NA/N-AV
address_line2_hash Y/N-NA/N-AV
billing_segment PAYM/PAYG
birthdate_hash Y/N-NA/N-AV
city_or_province_hash Y/N-NA/N-AV
country_hash Y/N-NA/N-AV
email_hash Y/N-NA/N-AV
family_name_hash Y/N-NA/N-AV
flat_number_hash Y/N-NA/N-AV
gender_hash Y/N-NA/N-AV
given_name_hash Y/N-NA/N-AV
house_name_hash Y/N-NA/N-AV
house_number_hash Y/N-NA/N-AV
houseno_or_housename_hash Y/N-NA/N-AV
is_adult True/False
is_age_verified True/False
is_email_verified True/False
is_lost_stolen True/False
middle_name_hash Y/N-NA/N-AV
postal_code_hash Y/N-NA/N-AV
title_hash Y/N-NA/N-AV
town_hash Y/N-NA/N-AV
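
As an illustration of how the age-related attributes could be derived on the fly from the birthdate, here is a minimal sketch. The function names and the adult threshold of 18 are assumptions for the example and do not describe the actual CKH implementation.

```python
from datetime import date


def age_on(birthdate: date, today: date) -> int:
    """Completed years between birthdate and today."""
    years = today.year - birthdate.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def age_is_greater_than(birthdate: date, threshold: int, today: date) -> bool:
    return age_on(birthdate, today) > threshold


def is_adult(birthdate: date, today: date, adult_age: int = 18) -> bool:
    # Assumption: adulthood at 18; the legal age varies by country.
    return age_on(birthdate, today) >= adult_age
```

Only the True/False result leaves the operator, so the birthdate itself is never disclosed to the partner.
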
ToshiWakayama-KDDI commented 1 year ago

> As I said in some other comments, I will be happy to discuss deprecating the address attribute. It is far better (for many countries around the world) to use separate attributes for the individual address components.

Hi @StefanoFalsetto-CKHIOD ,

Thank you for the comment, but I think the address attribute is required. As you pointed out in your previous comment, the aggregated field depends on country rules, which I think is true, and in some countries like Japan customers need the aggregated address field, mainly because it is difficult to split our addresses into separate fields.

I think the aggregated address field and the split address fields can both exist as optional fields. If an MNO does not support a specific attribute and is asked about it, it can answer with Not_Available or something similar. It may be better to share which attributes are supported by an MNO and which are not, but this would be a business matter or could be a future topic.

Thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi @GillesInnov35 , @fernandopradocabrillo , @javier-carrocalabor , @HuubAppelboom , @StefanoFalsetto-CKHIOD ,

Thank you for your comments. I feel our discussion is spreading and exploding (sorry I don't know the proper word) and we have to start converging our discussion, considering our target time.

I have some suggestions for converging our discussion, as below:

  1. Regarding Age attributes, it needs some calculation and also it is related to the new API 'Age Verification', so, I would suggest to delay it for future enhancement.

  2. Regarding attributes requiring calculation or processing, I would suggest to delay them for future enhancement. We have agreed to delay Match Scoring for future enhancement, and Hashing and Age are the same. (Because solution discussion is needed and it would take time.)

  3. Regarding attributes not related to subscribers/users, e.g. account active/inactive, I would suggest to delay them for future enhancement. (Because we need to discuss it is required or not, as it is unclear whether it is KYC information.)

  4. Regarding any attributes requiring complex and deep discussion, I would suggest to delay them for future enhancement. (Because of our short time.)

Any views?

Considering No. 4 above, we can agree to delay User information attributes (separate from Subscriber/Contractor information) for future enhancement.

Thanks, Toshi

GillesInnov35 commented 1 year ago

Hi, I agree with @ToshiWakayama-KDDI's proposal to target a limited list of attributes in this first version, even if it does not cover the full scope of existing offers. If we look at what TMForum (a major standard) proposes for a party/individual resource, the list of attributes that define a person is limited to a few fields. This means that such a list already exists in other specifications. It could be a good example, right?

For information, see below some of the fields in the TMF 632 party (individual) specifications:

 "givenName": "Jane",
 "familyName": "Lamborgizzia",
 "legalName": "Smith",
 "middleName": "JL",
 "fullName": "Jane Smith ep Lamborgizzia",
 "formattedName": "Jane Smith ep Lamborgizzia",
 "birthDate": "1967-09-26T05:00:00.246Z",

  Geographic address
         "city": "Morristown",
         "country": "USA",
         "postCode": 7960,
         "stateOrProvince": "New Jersey",
         "street1": "240 Headquarters Plazza",
         "street2": "East Tower - 10th Floor"
  ContactMedium  
     "emailAddress": "jane.lamborgizzia@gmail.com"
     "phoneNumber": "+112785426565"

Just as formattedName exists for the name, a formattedAddress could be added to aggregate the address fields.
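
A minimal sketch of how such a formattedAddress might be derived from the separate fields is shown below. The field order, the comma separator and the name `formatted_address` are assumptions; real address formatting is country-dependent, as discussed earlier in this thread.

```python
from typing import Mapping

# Assumed ordering of the TMF 632 address fields quoted above.
ADDRESS_ORDER = ("street1", "street2", "city", "stateOrProvince", "postCode", "country")


def formatted_address(address: Mapping[str, object], separator: str = ", ") -> str:
    """Join the address components that are present into one string."""
    parts = [
        str(address[key]).strip()
        for key in ADDRESS_ORDER
        if address.get(key) not in (None, "")
    ]
    return separator.join(parts)


example = {
    "city": "Morristown",
    "country": "USA",
    "postCode": 7960,
    "stateOrProvince": "New Jersey",
    "street1": "240 Headquarters Plazza",
    "street2": "East Tower - 10th Floor",
}
print(formatted_address(example))
# 240 Headquarters Plazza, East Tower - 10th Floor, Morristown, New Jersey, 7960, USA
```

Going the other way, from an aggregated string back to separate components, is much harder, which is one reason to keep both forms optional.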

javier-carrocalabor commented 1 year ago

I agree with @ToshiWakayama-KDDI's points for the sake of simplification and, at the same time, to find a common base that can cover most of the needs. In particular, I see @GillesInnov35's proposal (https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1835672991) as a good starting point to achieve this.

HuubAppelboom commented 1 year ago

In my experience, you will need a minimum list of attributes that is needed to properly identify a person.
Unfortunately this list of what is needed varies per country. So, in case you want to work cross border, you will always see a longer list than what is the minimum for a country.

In addition, in a matching process based on hashes, some attributes cause problems that make them less suitable. For example, in the TMF 632 list above, "fullName": "Jane Smith ep Lamborgizzia" will cause problems, because it uses the abbreviation "ep", which is probably language-dependent. In the Netherlands we have, for example, "ev" or "wv", or we use a "-" symbol, where it can be Smith-Lamborgizzia or Lamborgizzia-Smith. Other markets have their own habits, which are different. This is exactly why we would like to have the family name at birth as an extra attribute. If the current fullName does not match but the family name at birth does, you still know who it is with sufficient precision.
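
The sensitivity of hash-based matching to formatting can be seen in the sketch below. It assumes SHA-256 and a simple normalisation (case folding, accent stripping, whitespace collapsing); the thread does not state which hash function or normalisation rules are actually used, so both are assumptions.

```python
import hashlib
import unicodedata


def normalize(value: str) -> str:
    """Case-fold, strip accents and collapse whitespace before hashing."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def attribute_hash(value: str) -> str:
    # Assumption: SHA-256 over the UTF-8 encoded, normalised value.
    return hashlib.sha256(normalize(value).encode("utf-8")).hexdigest()


# Two spellings of the same married name hash differently...
print(attribute_hash("Jane Smith ep Lamborgizzia") == attribute_hash("Jane Smith-Lamborgizzia"))
# ...while the family name at birth is stable and survives simple normalisation.
print(attribute_hash(" lamborgizzia ") == attribute_hash("Lamborgizzia"))
```

Because a hash either matches exactly or not at all, any variation that normalisation cannot remove turns into a failed match, which is the argument for a stable extra attribute.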

I don't think it is wise to make the list of attributes as small as possible, because you run the risk that it becomes too small to be of any use. And for markets where Match is already being used, it makes no sense to come up with an API which is less effective.

What may be more pragmatic is to start in one or two countries and define the API there, by defining the minimum that is needed in these markets (and having a better offering than the current Match product). Then do a market-by-market introduction, and in each market add the attributes needed to have a minimum set for that market as well. This way you will have a growing list of attributes over time.

HuubAppelboom commented 1 year ago

PS. In the current EIDAS2 wallet standardisation process in Europe there is also a PID being defined (a list of attributes), which may be worth a look.

GillesInnov35 commented 1 year ago

Hi @HuubAppelboom, I understand your point of view, given your experience. However, in my opinion, the CAMARA approach is to aim for a global solution that could be adopted by many operators and partners. If we think per country from the start, I'm not sure that will be the case. I don't clearly understand why we could not start with a limited list of attributes for which MNOs should be able to compare information and return a match result, even if I agree with you that in some use cases the match result would not be so helpful, depending on the expected level of trust. If we think in terms of code, polymorphism should help us define new specific schemas inheriting from this first base, perhaps targeting specific countries' requirements. I don't know if my vision is clear enough. I'll also discuss this internally with my colleagues. Thanks a lot
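
The polymorphism idea can be sketched with a base request type and a country-specific extension. The class names and the particular split of attributes between base and extension are hypothetical and only meant to show the shape of the approach; in an OpenAPI definition this would more likely be expressed with schema composition.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class BaseMatchRequest:
    """Hypothetical common base: a limited list of optional attributes."""
    phoneNumber: Optional[str] = None
    givenName: Optional[str] = None
    familyName: Optional[str] = None
    birthdate: Optional[str] = None
    email: Optional[str] = None


@dataclass
class NetherlandsMatchRequest(BaseMatchRequest):
    """Hypothetical country-specific schema inheriting from the base."""
    familyNameAtBirth: Optional[str] = None
    houseNumberExtension: Optional[str] = None


def supplied_attributes(request: BaseMatchRequest) -> Dict[str, str]:
    """Collect the attributes that were actually set, whatever the subclass."""
    return {
        f.name: getattr(request, f.name)
        for f in fields(request)
        if getattr(request, f.name) is not None
    }
```

Code written against `BaseMatchRequest` keeps working when given a `NetherlandsMatchRequest`, so a first version limited to the base would not block later country-specific additions.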

HuubAppelboom commented 1 year ago

Hi @GillesInnov35, for the Netherlands, for example, I don't think any telco would introduce a CAMARA version of KYC Match that offers significantly less than what is already available today.
Kind regards Huub

ToshiWakayama-KDDI commented 1 year ago

Hi all,

Based on our discussions, I have created a compromise proposal by updating our initial proposed table (Gilles on 16th Nov and me on 20th Nov), as below. The parameters/attributes in the rightmost column are my proposal.

Please note each of the proposed parameters/attributes has a Match suffix, but this is just my proposal and we have to discuss suffix for Request and Response separately, so, please check what parameters/attributes we need for our initial version.

I think we have to conclude our parameter/attribute discussion within this week, so any comments are welcome.

Match Request Body

CAMARA KYC Match requirements/categories KDDI KYC Match Orange KYC Match Telefonica KYC Match GSMA KYC Match Orange Proposal KPN Hutchison Compromised Proposal
Phone Number subscriber_phone_number_match msisdn phoneNumber phone_number phoneNumber phoneNumberMatch
(special phone number) main_subscriber_phone_number_match mainPhoneNumberMatch
ID Document idDocument idDocumentMatch
Subscriber name user_name_match name identity (composed of firstName and lastName) name name nameMatch
(name reading) subscriber_name_kana_hankaku_match nameKanaHankakuMatch
(name reading) subscriber_name_kana_zenkaku_match nameKanaZenkakuMatch
(given name) given_name (included in identity) given_name givenName givenNameMatch
(family name) family_name (included in identity) family_name familyName familyNameMatch
Subscriber Postal Code subscriber_postal_code_match postalCode (included in address) postal_code postalCodeMatch
Subscriber Address subscriber_formatted_match address (composed of postalCode, streetName and streetNumber) address address addressMatch
(street name) street_name (included in address) house_or_housename streetName streetNameMatch
(street number) (included in address) streetNumberMatch
Subscriber Address-Region subscriber_region_match regionMatch
Subscriber Address-Town locality locality locality localityMatch
Subscriber Address-Country country country country countryMatch
Subscriber Birthdate subscriber_birthdate_match birthdate birthdate birthdate birthdate birthdateMatch
Subscriber Email Address email email emailMatch
Subscriber name
(Initial of the first Given Name)
(Initial of the first Given Name) firstGivenNameMatch
(All initials of Given Names) (All initials of Given Names) allGivenNamesInitialsMatch
(The first Given Name) (The first Given Name) firstGivenNameMatch
(All Given Names) (All Given Names) allGivenNamesMatch
(Prefixes of the Current Family Name) (Prefixes of the Current Family Name) currentFamilyNamePrefixesMatch
(Family Name at birth) (Family Name at birth) familyNameAtBirthMatch
Subscriber Address
(House Number Extension)
(House Number Extension) houseNumberExtensionMatch
Subscriber Gender subscriber_gender_match genderMatch
User Name user_name_match userNameMatch
(user name reading) user_name_kana_hankaku_match userNameKanaHankakuMatch
(user name reading) user_name_kana_zenkaku_match userNameKanaZenkakuMatch
User Birthdate user_birthdate_match userBirthdateMatch
3rd party ID cp_id cp_id
service_id service_id



KYC Match Response

CAMARA KYC Match requirements/categories KDDI KYC Match Orange KYC Match Telefonica KYC Match GSMA KYC Match KPN Hutchison Compromised Proposal
Phone Number subscriber_phone_number_match msisdn phoneNumber_response phone_number phoneNumberMatch
(special phone number) main_subscriber_phone_number_match mainPhoneNumberMatch
ID Document idDocument_response idDocumentMatch
Subscriber name subscriber_name_match name_score identity_response name nameMatch
(name reading) subscriber_name_kana_hankaku_match nameKanaHankakuMatch
(name reading) subscriber_name_kana_zenkaku_match nameKanaZenkakuMatch
(given name) given_name_score (included in identity) given_name givenNameMatch
(family name) family_name_score (included in identity) family_name familyNameMatch
Subscriber Postal Code subscriber_postal_code_match postalCode_score (included in address) postal_code postalCodeMatch
Subscriber Address subscriber_formatted_match address_response address addressMatch
(street name) street_name_score (included in address) house_or_housename streetNameMatch
(street number) (included in address) streetNumberMatch
Subscriber Address-Region subscriber_region_match regionMatch
Subscriber Address-Town locality_score locality localityMatch
Subscriber Address-Country country_score country countryMatch
Subscriber Birthdate subscriber_birthdate_match birthdate_score birthdate_response birthdate birthdateMatch
Subscriber Email Address email_score emailMatch
Subscriber name
(Initial of the first Given Name)
(Initial of the first Given Name) firstGivenNameMatch
(All initials of Given Names) (All initials of Given Names) allGivenNamesInitialsMatch
(The first Given Name) (The first Given Name) firstGivenNameMatch
(All Given Names) (All Given Names) allGivenNamesMatch
(Prefixes of the Current Family Name) (Prefixes of the Current Family Name) currentFamilyNamePrefixesMatch
(Family Name at birth) (Family Name at birth) familyNameAtBirthMatch
Subscriber Address
(House Number Extension)
(House Number Extension) houseNumberExtensionMatch
Subscriber Gender subscriber_gender_match genderMatch
User Name user_name_match userNameMatch
(user name reading) user_name_kana_hankaku_match userNameKanaHankakuMatch
(user name reading) user_name_kana_zenkaku_match userNameKanaZenkakuMatch
User Birthdate user_birthdate_match userBirthdateMatch

Thanks, Toshi

ToshiWakayama-KDDI commented 1 year ago

Hi all, Toshi again.

I would also like to ask the team whether the number of proposed parameters/attributes is too large for the YAML definition. There are already some country/market-specific attributes, and their number may grow in future. Is there a good way, technically, to handle such country/market-specific attributes?

For example, these attributes could be categorised as Extended attributes, with 'extended' added before the attribute name. Any attribute starting with 'extended' would be regarded as a country/market-specific attribute; such attributes would not need to be listed in the YAML definition, but could be used flexibly for specific countries/markets.
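A minimal sketch of how a server might separate such attributes, assuming the 'extended' prefix convention described above. The core attribute set, the function name and the sample attribute `extendedNameKanaZenkaku` are hypothetical and only illustrate the idea; they are not part of any agreed definition.

```python
EXTENDED_PREFIX = "extended"  # naming convention suggested in the comment above

# Hypothetical set of attributes listed in the common YAML definition.
CORE_ATTRIBUTES = {"phoneNumber", "name", "givenName", "familyName", "birthdate"}


def split_attributes(request: dict) -> tuple[dict, dict, dict]:
    """Split a request body into core, extended and unknown attributes."""
    core, extended, unknown = {}, {}, {}
    for key, value in request.items():
        if key in CORE_ATTRIBUTES:
            core[key] = value
        elif key.startswith(EXTENDED_PREFIX):
            # Country/market-specific attribute, not listed in the definition.
            extended[key] = value
        else:
            unknown[key] = value
    return core, extended, unknown


request = {
    "phoneNumber": "+81312345678",
    "name": "Taro Yamada",
    "extendedNameKanaZenkaku": "ヤマダ タロウ",  # hypothetical extended attribute
    "nickname": "taro",
}
core, extended, unknown = split_attributes(request)
```

With this split, an operator could validate the core attributes strictly against the definition and handle the extended ones according to local rules.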

Perhaps, 'polymorphism' and 'schemas inheriting' Gilles pointed out could work for this matter?

I don't think we have to solve this matter for our initial version, though.

Thanks, Toshi

GillesInnov35 commented 12 months ago

Hi @ToshiWakayama-KDDI, I have a question about the partner information (cp_id, service_id) I see in the attribute list. In 3-legged or 2-legged authentication, consumer information (partner id) is commonly transmitted in the OAuth token. Could you explain why you think it should be part of the definition? Thanks a lot

HuubAppelboom commented 12 months ago

Hi @ToshiWakayama-KDDI, I have some suggestions for your proposal, to see whether it is possible to simplify the list. Regarding 2nd, 3rd or 4th Given Names, it may be better to introduce a Middle Name(s) attribute instead. The Given Name is then always a single name, and the Middle Name(s) are the 2nd, 3rd, 4th etc. Given Names. This is especially important because people do not always provide all their given names (usually one or all).

For the cases where only initials are available, we would use only the initial of the Given Name and the initials of the Middle Names.

Prefixes is something we can omit from the matching process, as long as it is defined that prefixes are always omitted from the Family Name. For a given area / country, we can define lists of what commonly used prefixes are (for the Netherlands such a list is already available).
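A minimal sketch of stripping prefixes from the Family Name before matching, as suggested above. The prefix list below is illustrative and incomplete, and the per-country registry is an assumption; a real list would have to be agreed per area.

```python
# Illustrative, incomplete prefix list keyed by country code (assumption).
FAMILY_NAME_PREFIXES = {
    "NL": ["van der", "van den", "van de", "van", "de", "den", "der", "ter", "ten"],
}


def strip_family_name_prefix(family_name: str, country: str) -> str:
    """Remove one leading prefix from a family name, trying the longest first."""
    # Lower-case and collapse repeated whitespace.
    cleaned = " ".join(family_name.lower().split())
    prefixes = sorted(FAMILY_NAME_PREFIXES.get(country, []), key=len, reverse=True)
    for prefix in prefixes:
        # Require a following space so that names such as "Dekker" are untouched.
        if cleaned.startswith(prefix + " "):
            return cleaned[len(prefix) + 1:]
    return cleaned
```

For example, `strip_family_name_prefix("van der Berg", "NL")` yields `"berg"`, while a country without a registered list leaves the name unchanged apart from lower-casing.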

If we do this, the list for a compromise can become somewhat shorter:

with kind regards Huub

javier-carrocalabor commented 12 months ago

Hi, Thank you all for the contributions to the debate.

I really think that the list in https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1840159874 is too long. The idea is that too many parameters lead the API clients to have unclear expectations about what is and what is not implemented.

I agree with @ToshiWakayama-KDDI about deferring complex or unclear parameters to future versions https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1835634611

I also agree with @GillesInnov35 about taking inspiration from TMF 632 https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1835672991 and with @HuubAppelboom about EIDAS2 https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1837259897. I have found this reference for your consideration: https://github.com/eu-digital-identity-wallet/eudi-doc-architecture-and-reference-framework/blob/main/docs/arf.md#5111-pid-attributes-for-natural-persons

So, following these ideas and thinking of a global solution according to https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1838719726, our proposal is to shorten the list of parameters as much as possible, as long as they carry enough meaning for the current requirements. In this sense, this is the example with which we would feel comfortable:

Having said that, I think that, in any case, too many parameters in a plain list may confuse API clients about what can be used in each country or by each operator, and about what is really implemented in each of those cases. If clients really need so many options, perhaps Gilles is right in his comment (https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1838719726) and we need to exploit the potential of inheritance/polymorphism. I have found this useful reference on the topic: https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/ In this way, if needed, we could separate sets of parameters and specify when and where each set applies. I don't think the CAMARA guidelines (https://github.com/camaraproject/Commonalities/blob/main/documentation/API-design-guidelines.md) say anything about this, so I think we are pushing their current limits. But let me insist that a plain list of too many parameters around the same concept confuses clients and leaves them lost among many options, without certainty about what they will or won't get when calling the API.

The last thing is about the particles, symbols, etc. referred to in https://github.com/camaraproject/KnowYourCustomer/issues/18#issuecomment-1837259609 Regardless of the previous considerations, in order to maximize the matching results, I think we could consider having the operator apply some kind of normalization to the request parameters before matching them against the internal information. For example, general rules like lower-casing the characters and removing spaces, dots, hyphens, etc., and even the usual "stop words", will immediately improve the matching results, even though we can apply matching scores in later versions.
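A minimal sketch of the kind of normalization described above, applied to both the request value and the stored value before comparison. The stop-word list and the exact set of removed characters are assumptions for illustration; the discussion does not define them.

```python
import re

# Hypothetical stop-word list; nothing in the discussion fixes its contents.
STOP_WORDS = {"mr", "mrs", "ms", "dr"}


def normalize(value: str) -> str:
    """Lower-case, drop punctuation and stop words, then remove spaces."""
    lowered = value.lower()
    # Turn dots, hyphens, apostrophes and commas into spaces so that
    # stop words remain separate tokens before they are filtered out.
    spaced = re.sub(r"[.\-',]", " ", lowered)
    tokens = [t for t in spaced.split() if t not in STOP_WORDS]
    return "".join(tokens)


def is_match(requested: str, stored: str) -> bool:
    """Exact comparison of the two normalized values."""
    return normalize(requested) == normalize(stored)
```

With these rules, `"Jean Pierre DUPONT"` and `"dr. jean-pierre dupont"` both normalize to `"jeanpierredupont"` and therefore match.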

HuubAppelboom commented 12 months ago

Regarding what will be used in the eIDAS2 wallet, with ARF version 1.2, there will be a detailed PID Rule Book published, which will be of interest. ARF 1.2 is unfortunately not published yet, but is expected soon.

HuubAppelboom commented 12 months ago

Regarding the matching process, what we do in the Netherlands is also normalize special characters that are not commonly used in our area, partly because these special characters are often not supported by CRM systems. Also, in the German language, for example, there are specific mappings for special characters. It may be best to define these as part of instructions on how to normalize in a specific country or language area. If both parties apply these rules, you are rewarded with a much higher matching rate; if either party does not, you will get a lower matching rate. It will be very difficult to set rules for this on a global scale, which is why we propose to do this per area (perhaps per country code would be a good thing).
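A minimal sketch of per-country character normalization, assuming rules are looked up by country code. The German replacements shown are the common transliteration convention; the registry structure and the fallback of stripping accents for other areas are assumptions, not an agreed rule set.

```python
import unicodedata

# Language-specific replacements applied before generic accent stripping.
# The registry and its keys are hypothetical.
CHAR_MAPS = {
    "DE": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}


def normalize_for_country(value: str, country: str) -> str:
    """Apply country-specific mappings, then strip remaining accents."""
    # Compose first so that the single-character mappings below can match.
    text = unicodedata.normalize("NFC", value).lower()
    for src, dst in CHAR_MAPS.get(country, {}).items():
        text = text.replace(src, dst)
    # Decompose remaining accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Under these assumptions, "Müller" becomes "mueller" with the German rules but "muller" elsewhere, which illustrates why both parties need to apply the same area-specific rules to obtain a match.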

HuubAppelboom commented 12 months ago

Regarding idCardnumber: in most markets you can have several IDs (for example, we have driving license, passport and ID card). In order to make sense of the matching result, you should communicate back what kind of ID has been matched against. One issue with idCardnumber is that as soon as you renew an ID, the number changes, so I doubt whether you will get a high match rate in practice.

HuubAppelboom commented 12 months ago

In general, I am not too worried about the attribute list being a bit long, but more worried about trying to put too many flavours into a single attribute. For example, we tried working with all initials available for the given name, but this resulted in too low a match rate, simply because either side (MNO or Relying Party) did not have all initials at their disposal. The same will happen if you do this with given names, or with an attribute holding all the address details. The more you try to push into a single match result, the higher the chance of a mismatch, and that is why we propose to split the 1st given name from the middle names, the street name from the street number, the street number extension from the street number, etc.

GillesInnov35 commented 12 months ago

Hello all, this is very interesting, and it is good that we are converging on a solution.

@HuubAppelboom, could you complete your proposal with some example attribute values, so that we can see what kind of information is expected? I don't clearly see how middleNamesInitialsMatch and middleNamesMatch will be valued (array or single value). Thanks a lot. Concerning idDocument, if we are to keep it, I think a structure individualIdentification: {name, value} might be used, for example [{"national ID card", "124587652"}]. The objective is to be as clear as possible about what the ID refers to.

Regards

ToshiWakayama-KDDI commented 12 months ago

Hi @Javier, Hi Huub, Hi Gilles,

Thank you for your further comments. I share Huub's view that the length of the currently proposed attribute list (mine and Huub's) is not a concern. So, Huub's proposed list (plus cp_id/service_id) would be pretty much fine with me.

I can understand the view of making the attribute list as short and simple as possible. However, the currently proposed attributes are required by operators and their customers, so I think there is no point in deleting required attributes just to make the list simple. (For example, we provide matching for the single 'name' attribute and the single/formatted 'address' attribute, which our customers need.)

For the API clients, they can use attributes they need and can just ignore attributes they do not need. To avoid their confusion, we can prepare proper description and explanation for each API and further we could prepare some typical examples of attributes set for some typical use cases.

For the operators, they can just ignore requests for attributes they do not have.

So, it is a case of 'the greater embraces the less', and I don't believe Huub's proposed list (plus cp_id/service_id) is too long. Could we accept it for our first version?

Thanks, Toshi

HuubAppelboom commented 12 months ago

Regarding the middleNames attribute, there are two ways we can do this when there is more than one middle name.

Take for example: Robertus Mattheus Franciscus Janssen. In this, Robertus is the given name (always the first one), Mattheus Franciscus are the middle names, and Janssen is the family name.

For Mattheus Franciscus, we could either choose to make it one long string, with everything lowercase, without spaces etc., and hash the result. So in the end you will receive a hash of "mattheusfranciscus".

The alternative would be to make it a list of middle names, and make a hash of each middle name separately (after making everything lowercase). So then you receive a list of two hashes (of "mattheus" and "franciscus"), and for each hash you will provide a Y/N whether you also have that in your list. (in this I assume the order of the middle names is not that relevant).

The alternative will probably give a higher match rate: if only one of the middle names mismatches, you still have a partial match. What do you think?

HuubAppelboom commented 12 months ago

Adding the type of ID document may be a good idea, but I know this can also become a bit complex, with a long list. For example, in the Netherlands we also have special ID documents like a permit for fugitives, an ID card for embassy staff, etc. Can we agree on a short list of the most common types, plus one category Other? For example Passport, Driving License, IDCard and Other?
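A minimal sketch of a typed ID document with the short list suggested above, where the result reports which kind of document was matched. The class names, field names and the rule of comparing only documents of the same type are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IdDocumentType(Enum):
    PASSPORT = "Passport"
    DRIVING_LICENSE = "Driving License"
    ID_CARD = "IDCard"
    OTHER = "Other"


@dataclass
class IdDocument:
    type: IdDocumentType
    number: str


@dataclass
class IdDocumentMatchResult:
    matched: bool
    matched_type: Optional[IdDocumentType]  # type of the stored document that matched


def match_id_document(requested: IdDocument, stored: list[IdDocument]) -> IdDocumentMatchResult:
    """Compare the requested document only against stored documents of the same type."""
    for doc in stored:
        if doc.type is requested.type and doc.number == requested.number:
            return IdDocumentMatchResult(True, doc.type)
    return IdDocumentMatchResult(False, None)
```

Comparing within a single type avoids a number that happens to coincide across document kinds being reported as a match.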