camaraproject / KnowYourCustomer

Repository to describe, develop, document and test the KnowYourCustomer API family
Apache License 2.0

Creation of a Pull Request for Age Verification API #46

Open GillesInnov35 opened 5 months ago

GillesInnov35 commented 5 months ago

To start the discussion

CAMARA Milestone: YAML files with supporting documentation for all drop 2 & 3 APIs (including Age Verification) completed by the end of March 2024. Thanks

GillesInnov35 commented 5 months ago

Below is a proposal: Age Verification API specifications for restricting access to age-restricted sites to adults.

Some functional restrictions may exist, for example:

Swagger view: image

Additional comments:

ToshiWakayama-KDDI commented 5 months ago

Hi @GillesInnov35 ,

I have just created PR #50 for this, with a kind of dummy YAML file. I hope you can update the YAML file or replace it with your initial proposal.

Please let me know if you have any problem.

Best regards, Toshi

GillesInnov35 commented 5 months ago

Thanks a lot @ToshiWakayama-KDDI. I'll complete the PR with a proposed design specification. Gilles

HuubAppelboom commented 4 months ago

Hi, in the Netherlands we only accept people of 18+ for a monthly subscription; this has to do with national law. You can make a contract with a minor, but such a contract can easily be cancelled by a parent without any financial consequence. As a result, none of the telcos accept minors as contract holders.

For the Netherlands, this means that for any customer who can be positively matched (for example, based on first name, middle name, last name and phone number), you can already safely assume they are 18+, without asking for their date of birth.

So for 18+, we don't need to have an extra specification. We do however have some other age categories, like 24+.

For these categories, and for cases where the relying party does not want to ask for date of birth (for privacy reasons), it may be good to add an attribute indicating whether the subject is over a certain age.

I suggest making the parameter a bit more descriptive than "age" in the above example, for example:

In the Request Body: { "phoneNumber": "+31612345678", "ageVerification": 24 }

In the Response, if the user is indeed older, repeat the request: { "phoneNumber": "+31612345678", "ageVerification": "Y" }

(Verification: Yes seems a bit ambiguous to me, especially if you combine it with other parameters). I would suggest to keep the answers the same as we do now with MC Match, ie. Y, N, N-NA.

HuubAppelboom commented 4 months ago

And as for the issue of whether the end user is the same as the contract holder, the simplest way to solve this is to respond to an age verification request only when, as a minimum, all the name attributes (First Name, Last Name etc.) have a perfect match. If the name does not match sufficiently, it makes no sense to report an age verification.
I know this kind of verification is not 100% watertight (you may have a parent and a minor with the same First Name), but this is probably the best you can do without over-asking someone for additional attributes, like passport number or address, when they are not needed.
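
A minimal sketch of the gating rule described above: an age result is reported only when the name attributes match the contract owner, and nothing is reported otherwise. The record layout, the `dateOfBirth` field and the function names are assumptions for illustration and are not taken from the draft YAML in PR #50. Comparing names after trimming and lowercasing is just one possible reading of "perfect match".

```python
from datetime import date
from typing import Optional

# Assumed attribute names; the real API may use different ones.
NAME_FIELDS = ("givenName", "familyName")


def age_on(day: date, date_of_birth: date) -> int:
    """Age in whole years on the given day."""
    had_birthday = (day.month, day.day) >= (date_of_birth.month, date_of_birth.day)
    return day.year - date_of_birth.year - (0 if had_birthday else 1)


def same(a: str, b: str) -> bool:
    """Exact comparison after trimming and lowercasing (an assumption)."""
    return a.strip().lower() == b.strip().lower()


def verify_age(request: dict, contract: dict, today: date) -> Optional[str]:
    """Return "Y" or "N" only if every name attribute matches the contract owner."""
    for field in NAME_FIELDS:
        if not same(request.get(field, ""), contract.get(field, "")):
            return None  # names differ: report no age result at all
    owner_age = age_on(today, contract["dateOfBirth"])
    return "Y" if owner_age >= request["ageVerification"] else "N"
```

With this shape, a minor using a parent's subscription gets no age result unless they also supply the parent's names, which is the residual weakness already noted in the comment.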

GillesInnov35 commented 4 months ago

hi @HuubAppelboom, thanks for your comment. In France we have the same restriction. I've proposed something in line with your suggestion in the PR https://github.com/camaraproject/KnowYourCustomer/pull/50 with a draft yaml. BR Gilles

GillesInnov35 commented 4 months ago

to your comment @HuubAppelboom

And as for the issue of whether the end user is the same as the contract holder, the simplest way to solve this is to respond to an age verification request only when, as a minimum, all the name attributes (First Name, Last Name etc.) have a perfect match. If the name does not match sufficiently, it makes no sense to report an age verification.

don't you think that, as the phone number has been retrieved, we should not have to compare other information?

HuubAppelboom commented 4 months ago

Hi @GillesInnov35 ,

In case a minor is using the phone and the contract owner is a parent, the minor will fill out the phone number, and that will verify as correct, so then the age verification would not work, because for the age you are checking the date of birth of the parent.

You need to test as a minimum that the user is the same individual as the contract owner, so by comparing the names it will show whether this is the case or not.

So, if you ask me, it makes sense to offer age verification only as part of the Match api, or to combine it at least with name verification. Otherwise you are assuming that the telco knows who the end user is, which is often not the case.

GillesInnov35 commented 4 months ago

hi @HuubAppelboom, one comment to

In case a minor is using the phone and the contract owner is a parent, the minor will fill out the phone number,

From what I know, in 3-legged authentication (which is a requirement for KYC APIs), the phone number is retrieved through the authentication journey and should not be filled out. BR Gilles

HuubAppelboom commented 4 months ago

@GillesInnov35 Correct, if you decide to use the front end flow. In that case you can retrieve the phone number from the authentication journey. For any backend flow (like CIBA) that will not work.

Anyway, the point is that it makes no sense to offer the Age Verification as a single attribute in the API without an additional identity check (for example based on a name match), because that will not work with most telcos. Moreover, if you were to provide it as such, it may lead to a confusing situation and wrong interpretation of the result.

GillesInnov35 commented 4 months ago

thanks a lot @HuubAppelboom, I totally agree with you that the more controls we put in, the more reliable the solution will be. My remark was just to clearly understand and not to limit attributes to a unique field of course. As you say, adding firstName and LastName match result is a first level of control. BR Gilles

HuubAppelboom commented 4 months ago

@GillesInnov35 An alternative control could also be the email address of the contract owner. If that matches, in most cases you can also assume that the end user is the contract owner, and return the age verification result based on the age of the contract owner.

In any case, I would strongly recommend to make the use of at least one of these checks mandatory within the API request. Most developers don't read documentation, and if you provide the API without these, you will see lots of cases where the developer assumes the age check is done against the date of birth of the end user.

GillesInnov35 commented 3 months ago

Thanks @HuubAppelboom for your comment. @ToshiWakayama-KDDI , @fernandopradocabrillo, GSMA is asking for an update on this API by the end of the week. Thanks BR Gilles

GillesInnov35 commented 3 months ago

to all, find a draft of the design in the PR https://github.com/camaraproject/KnowYourCustomer/pull/50 As proposed by @HuubAppelboom, I've added first name and family name to the request attributes. Those 2 attributes should be compared with the information held by the MNO to confirm the identity of the user. BR Gilles

HuubAppelboom commented 3 months ago

@GillesInnov35 Hi Gilles, in the list of attributes there is also the middle name, can you add it to the proposal as well?

GillesInnov35 commented 3 months ago

@HuubAppelboom , yes sure. I'll update the swagger design. Thanks BR Gilles

GillesInnov35 commented 3 months ago

Can we summarize the discussion like this ?

HuubAppelboom commented 3 months ago

@GillesInnov35 Hi Gilles, sounds like a good summary to me.

Regarding the tenure of the contract in case isVerified = False, how would that work and what would it prove?

We don't have many contracts which have existed longer than 18 years, and the phone number may also be passed along between family members. Moreover, we don't know what the relationship is between the contract owner and the user of the phone number, so we can't do much with that either. If isVerified returns false, I simply would not do any age verification based on what the telco knows.

javier-carrocalabor commented 3 months ago

Hi all, I've been out of this discussion so far. This is becoming a priority for us too. Sorry for entering late. I have read all your comments... The main problem we see is also that, usually, the known age is for the owner of the contract, and how to verify that the user at that moment is really the owner of the contract. It's true that a common practice in other services, or even in customer service given over the phone, is to ask for some other contact details (even recording the conversation), but the first/family/middleName check seems weak to me. In some countries the idDocument is also asked for, which seems stronger. What do you think?

HuubAppelboom commented 3 months ago

@javier-carrocalabor I agree that the first / family / middleName is relatively weak. However I see this as a minimum verification that should be done.

In case you want a higher level of assurance, you can also ask for other identifiers, like idDocument, email address, bank account number, home address etc. For these cases (where you want more assurance), you can also advise customers to use both the Age Verification API and the KYC Match API.

GillesInnov35 commented 3 months ago

hi @javier-carrocalabor , @HuubAppelboom , thanks for your contribution. I also think that if the channel partner or application developer wants to offer a high level of identity control of the user, it will perhaps have to subscribe to a bundle of APIs composed of KYC-Match and KYC-AgeVerify. As mentioned by @HuubAppelboom, a first level of control based on first name, family name and optionally middle name might be useful. As listed in the API documentation included in the PR https://github.com/camaraproject/KnowYourCustomer/pull/50 there are restrictions on AgeVerify API usage. BR Gilles

ToshiWakayama-KDDI commented 3 months ago

Hi @GillesInnov35 , @HuubAppelboom , @javier-carrocalabor ,

Thank you for your active discussions. As the KYC Match/Fill-in v0.1.0 release version is completed, we (KYC SP) can focus on this now.

I have read all your comments... The main problem we see is also that, usually, the known age is for the owner of the contract, and how to verify that the user at that moment is really the owner of the contract.

I agree with this. For this problem, one thought I have now is that people from OIDF are going to present their activity and views on Age Verification at our next meeting on 19th March, so we may be able to get some additional hints from them.

@GillesInnov35 , Thanks for drafting the API documentation in the PR #50. One scenario we are thinking to add would be: To verify the age of a user to buy online age-restricted goods e.g. cigarettes, alcohol.

Many thanks, Toshi

GillesInnov35 commented 3 months ago

@ToshiWakayama-KDDI Thanks a lot for the information. Yes, it'll be very interesting to see how OIDF manages to improve age verification without any upload of an identity document (passport, national ID card, etc.). Regarding the first design proposal and its main restriction, that the user's mobile phone number must be that of the owner of the contract, interest in the API may or may not be reduced. BR Gilles

GillesInnov35 commented 3 months ago

Hi all, we have discussed at Orange about the first proposition of an Age Verification API design. According to information held :

We think that the Age Verification API should expose a resource named /contactverify, which would better reflect what the MNO can definitely perform.

Nevertheless, if an age verification service /ageverify must be defined, for example to cover use cases such as verifying 18+, 21+ or 25+, the limitations due to the restrictions and the fact that the probability is lower than 100% must be clearly indicated (T&C) so as not to commit the MNO. In this case, perhaps a score could be returned, which partners could combine with other results to evaluate or estimate the age.

HuubAppelboom commented 3 months ago

@GillesInnov35 What do you mean by a resource named /contactverify ? I would not use this name, it creates confusion.

As for the minimum required inputs, we can also add other attributes to the input that can help to determine whether the end user in question and the contract owner are the same person. To avoid giving away too much information on the matching details, you would only report the result of the ageVerification request, not whether any of the attributes actually matches. And yes, you can combine this with a parameter that indicates how likely it is that the age verification is correct.

For example, this how this could work:

In the Request Body: { "phoneNumber": "+31612345678", "ageVerification": 24, "givenName": "John", "middleName": "William", "familyName": "Hutchinson", "email": "jwhutchinson@gmail.com" }

In the Response body: { "phoneNumber": "+31612345678", "ageVerification": "Y", "ageAssuranceLevel": "95%" }

This way you can also cater for scenarios where the MNO has validated the age of that particular user by other means (in which case you report the assuranceLevel at 100%), and you can also extend this approach with more input parameters in the request body that can help identify whether the user and contract owner are the same individual. The above example includes email, but this could be extended further, for example with bank account number etc.

The ageAssuranceLevel is something that the MNO determines, and it might be a good idea to draft some guidelines on this.

HuubAppelboom commented 3 months ago

@ToshiWakayama-KDDI @GillesInnov35 @javier-carrocalabor

Regarding assurance levels, the assurance level of the Age Verification API and also of the KYC Match API strongly depends on the assurance level of the identity data that the MNO holds. It may be a good idea to add the assurance level of the data that the MNO uses in the response of the API.

For Europe, a commonly used specification for identity assurance levels is that of eIDAS (see https://ec.europa.eu/digital-building-blocks/sites/display/DIGITAL/eIDAS+Levels+of+Assurance). For other regions other specifications may be used like that of NIST: https://csrc.nist.gov/glossary/term/identity_assurance_level

For eIDAS, I would put for example something in the response like: "identityAssuranceSpecification": "eIDAS", "identityAssuranceLevel": "substantial"

(the level can be low, substantial or high)

This way you can communicate how reliable the identity data is. This becomes especially relevant if there are telcos where no or hardly any identity verification is done during onboarding (level low). And when extra effort has been made (level high), this can also be reflected.

GillesInnov35 commented 3 months ago

Thanks @HuubAppelboom , a very interesting thought. Adding a level of assurance could be useful for a partner to decide how to understand the response. The difficulty will be to define and explain how the assurance level has been evaluated. BR Gilles

GillesInnov35 commented 3 months ago

@HuubAppelboom , to your comment

What do you mean by a resource named /contactverify ? I would not use this name, it is generating confusion,.

The objective was to differentiate the 2 operations, which do not have the same objective

I understand that what is expected is mainly the second one.

I've updated the PR https://github.com/camaraproject/KnowYourCustomer/pull/50 with Huub's proposition with an assurance level returned in the response to continue the discussion. Thanks a lot Gilles

ToshiWakayama-KDDI commented 3 months ago

Thanks @GillesInnov35 , @HuubAppelboom , for interesting discussions.

Adding a level of assurance could be useful for a partner to decide how to understand the response. The difficulty will be to define and explain how the assurance level has been evaluated.

I tend to agree that it will be difficult to define the level of assurance. Actually I am not quite sure how the level of assurance should be calculated. It can be defined in EU, but I feel that different areas, markets, and countries may have different requirements, so, it will be difficult to define how to calculate the level of assurance as a global API parameter.

Thanks.

HuubAppelboom commented 3 months ago

@GillesInnov35 @ToshiWakayama-KDDI @StefanoFalsetto-CKHIOD

For calculating a match rate (or whatever you want to call it) for Age Verification, we could use Fuzzy Name Matching logic as follows:

  1. First determine the match rate of each of the First Name, Middle Name, Last Name and Email attributes. This match rate can be calculated using Fuzzy Name Matching logic. See for example https://spotintelligence.com/2023/07/10/name-matching-algorithm/ for an overview of methods that you could use for this.
  2. Then combine the match rates of the input parameters into a total match rate, to give an indication of the total accuracy, for example by multiplying the individual match rates.

For example, if the match rate of the first name is 50%, the match rate of the family name is 100%, and the match rate of the email address is 80%, you would get a total match rate for the age verification of 40% (0.5 × 1.0 × 0.8 = 0.4).
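
The arithmetic in this example can be reproduced in a few lines. This is only an illustration of the multiplication rule; the function name `total_match_rate` and the attribute keys are hypothetical.

```python
from math import prod


def total_match_rate(rates: dict[str, float]) -> float:
    """Combine per-attribute match rates (0.0 to 1.0) by multiplying them."""
    return prod(rates.values())


rates = {"givenName": 0.5, "familyName": 1.0, "email": 0.8}
total = total_match_rate(rates)
print(f"{total:.0%}")  # prints "40%"
```

Note that a product is stricter than an arithmetic mean: the same three rates average to about 77%, while the product can never exceed the lowest individual rate.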

Note that the same Fuzzy Name Matching can be applied with KYC Match. Note also that there are several Fuzzy Name Matching methods available; it would be interesting to hear what experience others have in using these.

HuubAppelboom commented 3 months ago

@ToshiWakayama-KDDI @GillesInnov35 @StefanoFalsetto-CKHIOD

Regarding the Level of Assurance parameter (which indicates the quality of the data of the MNO), this is especially relevant if the MNO has for example Date of Birth data that the contract owner has filled out themselves, or whether a verification has been done against for example an identity document.

In the eIDAS definitions, it roughly works like this

Most telcos in Europe probably operate somewhere at level low or substantial, and which level applies is relevant information. We could also simplify this by making a distinction between just level low (no verification has been done) and level substantial or higher. Especially when it is level low, you should know this, because that may be too easy to circumvent (or maybe we should not offer Age Verify or Match in that case at all).
What do you think?

GillesInnov35 commented 2 months ago

Thanks @HuubAppelboom, I agree with you that we should settle on 2 or 3 levels of identity assurance returned to the consumer. I think that 3 levels could be kept. As you rightly said, the high level could only be returned where a Centralized Identity Control solution has been implemented, based on identity documents or other proof of identity. Even if it's not a target solution for Camara, it exists.

to your comment

Note that the same Fuzzy Name Matching can be applied with KYC Match

at Orange, the matching algorithm implemented by the MNOs in Match API is based on the Jaro–Winkler distance. Thanks a lot BR Gilles

HuubAppelboom commented 2 months ago

@GillesInnov35 What is your experience with Jaro-Winkler ? How does it compare to Levenshtein ? Any experience with match rates that you can obtain ?

GillesInnov35 commented 2 months ago

@HuubAppelboom, I don't know why this algorithm was chosen. What I know is that the 3 French operators Orange, SFR and Bouygues have implemented this algorithm for their Match Identity API. I'll try to find out the criteria behind the decision. BR Gilles

StefanoFalsetto-CKHIOD commented 2 months ago

@GillesInnov35 @HuubAppelboom, we used Jaro-Winkler for a PoC and the statistical results are better than Levenshtein. From what I understood, it depends on the nature of the compared strings. For example, first names are usually miswritten somewhere in the middle of the word (letter swap, or short name) and not at the beginning, and the Jaro-Winkler formula takes prefix similarity into account.
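
For readers unfamiliar with the metric, here is a minimal sketch of the textbook Jaro-Winkler similarity, showing how a shared prefix boosts the score of a mid-word letter swap. It is not the implementation used by any operator in this thread. The prefix scale of 0.1 and the 4-character prefix cap are the commonly used defaults and are assumptions here, and this version applies the prefix boost unconditionally.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


print(round(jaro_winkler("martha", "marhta"), 4))  # prints 0.9611
```

In the example, the swapped letters sit in the middle of the word, so the three-character common prefix lifts the plain Jaro score of about 0.944 to about 0.961.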

I think another interesting topic could be fuzzy comparison of hashed data. Does anyone have experience in fuzzy hash comparison like minhash/hyperminhash or MRSH-v2?

HuubAppelboom commented 2 months ago

@StefanoFalsetto-CKHIOD Hi Stefano, before applying Jaro-Winkler, do you do any data normalization (like removing non-alphanumeric characters, removing spaces, making all letters lowercase etc.)?

GillesInnov35 commented 2 months ago

hi @StefanoFalsetto-CKHIOD , @HuubAppelboom , good question, at Orange a normalization is applied before applying the algorithm (removing non-alphanumeric characters, removing spaces, lowercasing characters). BR Gilles

KevScarr commented 2 months ago

@StefanoFalsetto-CKHIOD @HuubAppelboom @GillesInnov35

New to the group so sharing the Vodafone experience (also now keen to assist to bring these scores into the product).

We did a live trial last year in the UK with Jaro-Winkler (JW) and Levenshtein (Leven) and selected JW.

When we inspected names that should match (either spelling errors or abbreviations) but didn't, Leven performed badly. We were concerned before the trial that JW would over score non-matches, however from the actual study this wasn't the case and we saw a boost of 5 score points when the first name was normalised before a comparison. We saw a good spread of scores for matches, close matches and non-matches (ie sufficient for a machine learning algorithm to find a cut off point).

Our recommendation to our product team was to perform basic normalisation and use JW and provide a score back to the caller. However, the MNO can apply the normalisation in this case where plain text values are passed in.

StefanoFalsetto-CKHIOD commented 2 months ago

@HuubAppelboom Yes. We use the string normalization even in our current KYC Hash Match version (derived from GSMA IDY.28 specifications) and this is helping us to improve the match rate. The normalisation process (valid for European languages) is as follows:

  1. Remove white spaces and punctuation characters, as defined in POSIX standard classes [:space:] and [:punct:]
  2. Convert the string to lowercase
  3. Substitute special characters (i.e., accented ones) with the corresponding "plain" version (we have defined a conversion table), converting from the source character encoding to latin1.
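
The three steps above can be sketched as follows. This is an approximation under stated assumptions: Python's `string.whitespace` and `string.punctuation` cover only the ASCII members of the POSIX classes, and Unicode NFKD decomposition is used as a stand-in for the conversion table, which is not published in this thread. Characters with no decomposition (for example "ß" or "ø") pass through unchanged.

```python
import string
import unicodedata

# Step 1: ASCII whitespace and punctuation to delete.
_STRIP_TABLE = str.maketrans("", "", string.whitespace + string.punctuation)


def normalise(value: str) -> str:
    """Normalise a name-like string before matching (illustrative only)."""
    stripped = value.translate(_STRIP_TABLE)             # step 1
    lowered = stripped.lower()                           # step 2
    decomposed = unicodedata.normalize("NFKD", lowered)  # step 3: split accents off
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalise(" Jean-François O'Neil "))  # prints "jeanfrancoisoneil"
```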

HuubAppelboom commented 2 months ago

@StefanoFalsetto-CKHIOD @KevScarr @GillesInnov35

One suggestion to think about: what I see that other suppliers of similar matching services do, is to provide the actual data that they have when the match is close enough.

For example, if the end user has entered Wlliams as a Family Name, and you have Williams on file, you return "Williams" with the answer, so that the customer can check with the end user whether they meant "Williams" instead of "Wlliams".

That way you can help users correct apparent mistakes.

StefanoFalsetto-CKHIOD commented 2 months ago

@HuubAppelboom it could be useful but:

  1. It forces the MNO agreements on privacy policies to be changed again. I found it quite hard to persuade MNOs to share their plain text data, hence the success of the KYC Match service: no MNO data sharing, just a match result. Given the "fear" that I perceived from our MNOs, I wonder how other suppliers managed to get this data shared.
  2. It is done in the hope that users will correct their data. I usually don't rely on good end-user behaviour.

KevScarr commented 2 months ago

@HuubAppelboom It's been our experience that our Privacy team is comfortable with a zero-knowledge approach to these services (i.e. a Yes or No for a match) but not with disclosing the held value, so you may struggle with MNO adoption.

javier-carrocalabor commented 2 months ago

Agree with @StefanoFalsetto-CKHIOD and @KevScarr: our Privacy Team (and, therefore, final customers; or the other way around) are more comfortable with just verification. I think we have to limit data exposure to use cases where it is really necessary and valuable.

GillesInnov35 commented 2 months ago

hi all, does it mean Fill-In API won't be implemented nor proposed on your side ?

HuubAppelboom commented 2 months ago

@GillesInnov35 We will only support verification as well. In our privacy policy it says we never share data, so we will not offer the Fill-In API.

KevScarr commented 2 months ago

@GillesInnov35 Same for us; Verification only, no plans for fill-in/share.

GillesInnov35 commented 2 months ago

ok thanks. BR

StefanoFalsetto-CKHIOD commented 2 months ago

Same here.

HuubAppelboom commented 2 months ago

@ToshiWakayama-KDDI @javier-carrocalabor @GillesInnov35 @StefanoFalsetto-CKHIOD

Below is a proposal which tries to cover all your preferences, and keeps fuzzy name matching logic optional. Please feel free to comment.

In the request body, you put the phone number (or identifier), and the age (in years) that needs to be verified.

In the request body, you can also put all the KYC Match attributes that can help to distinguish whether the contract owner is the end user (when available):

At least one of the following should be included which can sufficiently distinguish this:

For each attribute, the telco determines a match score (in percentage). To calculate the Match Score, the telco does the following for each attribute

  1. Normalize the data. The Telco can choose whichever normalisation gives the most accurate result. Typically it will include removal of spaces, removal of non-alphanumeric characters, and making all characters lowercase. Optionally this can also include removal of prefixes which are not considered relevant for the identification, mapping of characters etc.
  2. The Telco applies either a Yes/No match or Fuzzy Name Matching logic like Jaro-Winkler. Fuzzy Name Matching Logic is only to be used for name attributes where spelling mistakes can have happened:
    • name
    • givenName
    • familyName
    • middleNames
    • familyNameAtBirth
  3. For each attribute a score is calculated in 0-100% scale. Y=100%, N=0%
  4. An overall score is calculated by multiplying the scores of all provided attributes.
  5. To compensate for the fact that people may have multiple email accounts, you also calculate the overall score without the email address score. Whatever gives the highest score is the score that will be returned.

In the Response body, you put the following information: { "ageVerification": "Y", "ageVerificationScore": "95%" }

Normally, the telco can distinguish quite well what the cutoff level in ageVerificationScore is when the end user is the contract owner, and this is reflected in the ageVerification Y/N result. In case there is no fuzzy name matching applied, the score will always be 100% or 0%.
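
The scoring steps above can be sketched as follows, using a plain Yes/No match per attribute. Everything named here is hypothetical: the function names, the simplified normalisation, and in particular the `cutoff` of 0.9, which each telco would choose for itself. Treating "Y" as "score at or above the cutoff and contract owner old enough" is an assumed reading of the proposal.

```python
from math import prod


def normalise(value: str) -> str:
    """Step 1 (simplified): keep only alphanumeric characters, lowercased."""
    return "".join(ch for ch in value if ch.isalnum()).lower()


def yes_no_score(provided: str, on_file: str) -> float:
    """Steps 2-3 with a plain Yes/No match: Y = 1.0 (100%), N = 0.0 (0%)."""
    return 1.0 if normalise(provided) == normalise(on_file) else 0.0


def overall_score(scores: dict[str, float]) -> float:
    """Steps 4-5: product of all scores, or of all but email, whichever is higher."""
    if not scores:
        raise ValueError("at least one attribute score is required")
    with_email = prod(scores.values())
    others = [score for name, score in scores.items() if name != "email"]
    if "email" not in scores or not others:
        return with_email
    return max(with_email, prod(others))


def build_response(owner_age: int, requested_age: int,
                   scores: dict[str, float], cutoff: float = 0.9) -> dict:
    """Assemble the response body; cutoff is a telco-chosen assumption."""
    score = overall_score(scores)
    verified = score >= cutoff and owner_age >= requested_age
    return {
        "ageVerification": "Y" if verified else "N",
        "ageVerificationScore": f"{round(score * 100)}%",
    }
```

One consequence worth noting: because every score is at most 100%, leaving the email factor out can never lower the product, so under step 5 the email score only affects the result when email is the sole attribute provided.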

GillesInnov35 commented 2 months ago

@HuubAppelboom , thanks a lot for this clear and very interesting proposal based on your experience. I just have a question on the 2 returned attributes. If the API (MNO) returns a match score percentage, do you think a boolean Y/N is also necessary? The consumer (channel partner or B2B app) should have the responsibility to decide, shouldn't it? Gilles