camaraproject / KnowYourCustomer

Repository to describe, develop, document and test the KnowYourCustomer API family
Apache License 2.0

Creation of a Pull Request for Age Verification API #46

Open GillesInnov35 opened 8 months ago

GillesInnov35 commented 8 months ago

To start the discussion

CAMARA Milestone: YAML files with supporting documentation for all drop 2 & 3 APIs (including Age Verification) completed by the end of March 2024. Thanks

HuubAppelboom commented 5 months ago

@GillesInnov35 The Y/N is more of an advice from the MNO (because the MNO can see all the data). With the Fuzzy Name Logic, I expect that when you plot the cumulative distribution function of the score against the percentage of users, there will be sharp drop-offs in the data. The MNO can probably see this and decide below which percentage you should no longer assume that the end user is the contract owner. This cut-off probably varies per MNO, depending on the quality of the data the MNO has.

As an alternative, the MNO can also provide just the ageVerificationScore when it is above the cut-off point, and provide an N when it is below the cut-off.

KevScarr commented 5 months ago

@HuubAppelboom A question from me, relating to this concept: "Telco applies either a Yes/No match or Fuzzy Name Matching logic". 1) Is there a scenario where you wouldn't provide the 'score' if the response was an 'N' (i.e. so return both for the non-numeric fields and no-matches)? 2) And if you don't have the information available, what is the response value? (Apologies if you've answered this already.)

HuubAppelboom commented 5 months ago

@KevScarr @GillesInnov35

I think it is only a good idea to provide the score if it is above a certain level (to be determined by the MNO).

Come to think of it, we should also try to keep it in line with what we have already defined in the answer for KYC Match (so [ true, false, not_available ]).

So for example, let's assume the MNO has determined that the level should be at least 80% to provide an answer.

Let's assume the score is now 95%. The answer will be:

{ "ageVerification": true, "ageVerificationScore": 95 }

or

{ "ageVerification": false, "ageVerificationScore": 95 }

If the ageVerificationScore drops too low, or insufficient data is available, you provide

{ "ageVerification": "not_available" }

or, in case we must always provide the ageVerificationScore to avoid errors, we can also provide

{ "ageVerification": "not_available", "ageVerificationScore": 0 }
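The cut-off rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_response`, the constant `MIN_SCORE` and the `always_include_score` flag are hypothetical, and the value 80 is simply the example level mentioned in this comment, not an agreed default.

```python
from typing import Optional

# Hypothetical cut-off chosen by the MNO; 80 is the example value from this comment.
MIN_SCORE = 80


def build_response(is_over_age: Optional[bool], score: Optional[int],
                   always_include_score: bool = False) -> dict:
    """Build the response body for one age check.

    is_over_age: None when the MNO holds no usable age data.
    score: confidence (0-100) that the end user is the contract owner,
           or None when it cannot be computed.
    """
    if is_over_age is None or score is None or score < MIN_SCORE:
        # Score too low or data missing: do not reveal the age result.
        body = {"ageVerification": "not_available"}
        if always_include_score:
            # Variant where the score field must always be present.
            body["ageVerificationScore"] = 0
        return body
    return {"ageVerification": is_over_age, "ageVerificationScore": score}
```

With these assumptions, `build_response(True, 60)` yields only `not_available`, because 60 is below the cut-off, while `build_response(True, 95)` returns both the result and the score.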

HuubAppelboom commented 5 months ago

On second thought, the idDocument may also have several valid values. For example, in the Netherlands people can have a passport, a driver's license and an ID card, each of which is a valid identification document (and each with a different number).

If there is a match with a document number, it is a very strong indicator the end user is also the contract owner.

Given this, it may be better to compute the total score as a weighted average over all the inputs provided, and to compute it twice: once with idDocument and email included, and once with them excluded.
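A rough sketch of such a weighted average, in Python, might look like the following. The weights are purely hypothetical (nothing in this thread defines them); they only illustrate giving idDocument a much higher weight than the name attributes. The function name `weighted_score` is also an assumption.

```python
from typing import Optional

# Hypothetical weights, for illustration only; not defined anywhere in this thread.
WEIGHTS = {
    "idDocument": 5.0,
    "email": 2.0,
    "givenName": 1.0,
    "middleNames": 1.0,
}


def weighted_score(scores: dict, exclude: frozenset = frozenset()) -> Optional[float]:
    """Weighted average of per-attribute scores (0-100).

    Only attributes that were provided, have a weight, and are not excluded
    contribute. Returns None when nothing is left to average.
    """
    used = {k: v for k, v in scores.items() if k in WEIGHTS and k not in exclude}
    total_weight = sum(WEIGHTS[k] for k in used)
    if total_weight == 0:
        return None
    return sum(WEIGHTS[k] * v for k, v in used.items()) / total_weight


# Sample per-attribute scores (made up for the example).
scores = {"idDocument": 100, "givenName": 80, "middleNames": 90}
with_strong_ids = weighted_score(scores)
without_strong_ids = weighted_score(scores, exclude=frozenset({"idDocument", "email"}))
```

In this made-up case the score without idDocument and email is 85.0, and the matching document number pulls the combined score up to roughly 95.7.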

StefanoFalsetto-CKHIOD commented 5 months ago

I'm sorry I couldn't attend the latest call, so maybe I am asking questions that have already been answered. Are we talking about record linkage or a single-attribute match score? The use cases are quite different, and I think they are both useful.

HuubAppelboom commented 5 months ago

@StefanoFalsetto-CKHIOD What we are trying to do here is identify whether the end user is over a certain age, and part of the assessment is checking whether the end user is also the contract owner (otherwise the data does not make sense).

What are the other cases you are thinking about?

KevScarr commented 5 months ago

@HuubAppelboom Aligning to the KYC Match response values is a good idea; it would give consistency when we come to retrofit score attributes back into the KYC Match product. I'm not sure of the value of an overall score if you're providing back the individual scores: the Client and/or Aggregator consuming the service can use the individual attribute scores to make their Machine Learning decision. I'm not opposed to it, just not sure whether a customer of the API would use it.

HuubAppelboom commented 5 months ago

@KevScarr The advantage of replying with just the Age info and an overall score is that you can price this at a separate price point (compared with making the Age Verification check part of the KYC Match API), and that you do not need to expose additional information which may not be needed. If a customer would like to have the other data verified as well, and know the answer per attribute, they can run that through the KYC Match API.

GillesInnov35 commented 5 months ago

I'm not sure, but as I understand it you propose adding those attributes to the AgeVerification request so that they are verified before the age verification.

The Fuzzy Name Matching Logic will only be used for some of them.

Response will be

BR Gilles

KevScarr commented 5 months ago

@HuubAppelboom Thanks, super clear. So under AgeVerify we return a single score, whereas in the Match product we provide back the individual attribute scores. Makes sense; today in our journeys, we have our customers call the KYC-Match API first, then the Age-verify API.

ToshiWakayama-KDDI commented 5 months ago

@HuubAppelboom , @GillesInnov35 , @StefanoFalsetto-CKHIOD , @KevScarr , all,

Thank you so much for the proposal and further discussion.

Sorry for the late comment, but from KDDI side, I have discussed this internally with our product team, and for Age Verification, we don't think we need any additional attributes, i.e. we only need "phoneNumber: +31612345678 (e.g.)" and "ageVerification: 24 (e.g.)" in the Request Body and "ageVerification": True/False/not_available (e.g.) in the Response.

So, from KDDI side, we would request to make any additional attributes optional and to make the Age Verification API work without any additional attributes. Our customers like this type of Age Verification (no frills) and we don't have any problem in Japan (and some other countries, I believe).

We understand the EU case, so if the additional attributes are necessary, this can be documented properly in the API documentation or in the MNO's service document, e.g. in EU countries, attributes AAA, BBB, CCC, DDD are necessary. So, for the API, these attributes are optional, but for the service in the EU, they are mandatory.

Best regards, Toshi
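The idea of "optional in the API, mandatory per market" could be sketched as below, in Python. Everything here is hypothetical: the `REQUIRED_BY_MARKET` table, the market keys and the choice of givenName and email stand in for the AAA, BBB, CCC, DDD placeholders and do not reflect any agreed list.

```python
# Hypothetical per-market profiles; the API schema itself keeps these attributes optional.
REQUIRED_BY_MARKET = {
    "JP": set(),                   # no additional attributes needed
    "EU": {"givenName", "email"},  # placeholder for the AAA, BBB, ... list
}


def missing_attributes(request: dict, market: str) -> set:
    """Return the market-mandated attributes that are absent or empty in the request."""
    required = REQUIRED_BY_MARKET.get(market, set())
    return {name for name in required if request.get(name) in (None, "")}


# Sample requests (values made up for the example).
jp_missing = missing_attributes({"phoneNumber": "+31612345678"}, "JP")
eu_missing = missing_attributes({"phoneNumber": "+31612345678", "givenName": "Anna"}, "EU")
```

Here `jp_missing` is empty, while `eu_missing` contains `email`, which a service in that market could report back to the API consumer.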

HuubAppelboom commented 5 months ago

@ToshiWakayama-KDDI @GillesInnov35 @StefanoFalsetto-CKHIOD @KevScarr

Making the input attributes optional can be done. In some markets you may also want to make more inputs mandatory (for example email) to come to a reliable result. If we make all input parameters optional, I suggest that we also provide feedback when the API consumer has provided too little data to come to a reliable conclusion.

For example, the response could look like:

"ageVerification": true / false / not_available / insufficient input data provided
"ageVerificationScore": x percent

KevScarr commented 5 months ago

@HuubAppelboom If inputs are optional then it makes sense to support an "insufficient" response to me.

In the UK market, Pay-As-You-Go customers do not need to do a formal ID check and so can enter their own information on the MNO CRM system. So it may be appropriate to also have an 'unverified' response.

Today in our age product, if the account is PAYG, or has multiple numbers on it (ie user and owner may be different) and the adult-content-bar is on, we don't provide a response. So it may be appropriate to return an "unverified" response and no score?

HuubAppelboom commented 5 months ago

@KevScarr Hi Kevin, it may indeed be a good idea to add an unverified response, in case you want to share such data.

In this case we can have the following response options:

"ageVerification": true / false / unverified / not_available / insufficient input data provided
"ageVerificationScore": x percent

where unverified means true, but the data is not verified, and the ageVerificationScore is only provided in case of true or false.
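As a minimal sketch of these response options in Python: the outcome is modelled here as a single string enumeration, which is an assumption, and the literal `insufficient_input` is a hypothetical short form of "insufficient input data provided". The function name `response_body` is also illustrative.

```python
from enum import Enum
from typing import Optional


class AgeVerification(str, Enum):
    TRUE = "true"
    FALSE = "false"
    UNVERIFIED = "unverified"                  # true, but based on unverified data
    NOT_AVAILABLE = "not_available"
    INSUFFICIENT_INPUT = "insufficient_input"  # hypothetical literal


def response_body(result: AgeVerification, score: Optional[int] = None) -> dict:
    """Build the response; the score is only included for true or false."""
    body = {"ageVerification": result.value}
    if result in (AgeVerification.TRUE, AgeVerification.FALSE) and score is not None:
        body["ageVerificationScore"] = score
    return body
```

For example, `response_body(AgeVerification.UNVERIFIED, 95)` drops the score and returns only the `unverified` outcome.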

HuubAppelboom commented 5 months ago

@KevScarr In our case, we don't have any age data on prepaid users. We are trying to solve the end-user versus contract-owner issue through the additional input data (givenName, middleNames, email). But of course you can always also exclude the multiple-SIM group.

KevScarr commented 5 months ago

@HuubAppelboom Superb, agree with your points and thanks for including unverified.

ToshiWakayama-KDDI commented 4 months ago

Hi @GillesInnov35 , Hi all,

As we have had very long discussions, could you summarise the conclusion before moving on to the PR? It is difficult to see which proposals remain, I feel.

A basic question, please. We have started the Match Scoring discussion. As Age Verification has a scoring feature, should it be aligned with Match Scoring?

Thanks, Toshi

GillesInnov35 commented 4 months ago

hi @ToshiWakayama-KDDI, see below my understanding; I'll let @KevScarr and @HuubAppelboom update the summary

The request

The response

this is what has been modified in the first design proposal in the PR https://github.com/camaraproject/KnowYourCustomer/pull/50:

KevScarr commented 3 months ago

To add to the above: we would support the Telefonica view for these attributes to be optional. Some use cases don't have the full identity information available to check; where they do, I would agree that KYC Match is a better fit to do a thorough check before calling this API. One difference between the scoring in Match and AgeVerify: in Match each attribute receives a score when it's a non-match; under the proposal above, a single overall score is shared. Good summary above @GillesInnov35 .

HuubAppelboom commented 3 months ago

There seems to be a majority for not mandating any of the individual attributes; in this case, we can also omit the response "insufficient_data" and limit the response to "true/false/unverified/not_available".

I do believe, however, that it is still a good idea to ask for the attributes that can distinguish the end user from the contract owner in one and the same Age Verification API call. If you don't put it in one call, you will have to write an instruction like: first figure out whether the end user is the contract owner by calling KYC Match, and only if you think the end user is the contract owner does it make sense to call the Age Verification API (otherwise you shouldn't). That may lead to a lot of mistakes by developers (because they didn't read the instruction, or they try to cut costs by omitting the KYC Match call, etc.). Putting the distinguishing attributes in the same call will prevent these kinds of errors, will reduce the total processing time, and lets you position this call at a different price point than a full KYC Match call.

GillesInnov35 commented 3 months ago

hi @HuubAppelboom , @KevScarr , very interesting discussion. I agree that, from a technical flow point of view, adding identity attributes avoids having to consume two services (Match and AgeVerification), and in my opinion that's a good thing. From a business point of view, perhaps a bundle of APIs (KYC Match + KYC Age Verification) might be more interesting. Nevertheless I would prefer to see those optional individual attributes in the API design. If the attributes are not provided, the score result will be 0 or N/A. Thanks a lot BR Gilles

ToshiWakayama-KDDI commented 3 months ago

Hi @GillesInnov35 , all,

Thank you very much, Gilles. Good summary. Excellent.

May I ask just a few questions for clarification, please?

1.

"Main rule: MNO will be able to return a result only if the user is the owner/contact of the contract known by the MNO (so age greater than 18)"

'so age greater than 18' may not always be true, I think. It may differ from country to country, from operator to operator, so we do not need this phrase, do we?

2.

"by also returning the global score it allows consumer application to decide how to understand the response."

What do you mean by 'the global score'? Sorry, I just do not understand it.


In addition, we would also support the Telefonica view for these attributes to be optional, as I have mentioned and commented several times.

But Huub's latest comment is very interesting. Good point. I will share it internally with my colleagues.

Many thanks, Toshi KDDI

GillesInnov35 commented 3 months ago

hi @ToshiWakayama-KDDI , please find below my point of view and answers

1-

'so age greater than 18' may not always be true, I think. It may differ from country to country, from operator to operator, so we do not need this phrase, do we?

yes, you're right regarding countries' laws. "(so age greater than 18)" must be removed

2-

What do you mean by 'the global score'? Sorry, I just do not understand it.

AgeVerification might return an aggregated score based on the scores of all the provided attributes. "Aggregated" is a better word than "global".

BR Gilles

HuubAppelboom commented 3 months ago

@GillesInnov35 @KevScarr @ToshiWakayama-KDDI In case no additional attributes are provided to ascertain whether the end user is the contract owner, you could also provide as the score the chance that the end user is the contract owner and the age verification is correct, based on the statistics you may have accumulated with the KYC Match API. For example, if you know that on average the end user is the contract owner in 72% of cases, you can provide this as an aggregate score.

And of course, if additional attributes are provided, you can make an aggregate score with those attributes.
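This fallback could be sketched as follows in Python. The plain unweighted mean is only a placeholder for whatever aggregation the MNO actually uses, and the names `aggregate_score` and `owner_prior` are hypothetical; 72 is the example figure from this comment.

```python
def aggregate_score(attribute_scores: list, owner_prior: float) -> float:
    """Aggregate score (0-100) for one request.

    attribute_scores: per-attribute match scores for the attributes provided.
    owner_prior: historical share of cases (in percent) where the end user
                 turned out to be the contract owner, e.g. from KYC Match statistics.
    """
    if not attribute_scores:
        # No attributes provided: fall back to the accumulated statistic.
        return owner_prior
    # Placeholder aggregation: simple mean of the provided attribute scores.
    return sum(attribute_scores) / len(attribute_scores)
```

With no attributes, `aggregate_score([], 72.0)` simply returns the 72% prior.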

ToshiWakayama-KDDI commented 3 months ago

Hi @GillesInnov35 ,

I have one question from our internal discussion.

The request

contains the age value to be verified and attributes of identity which might be optional

I understand the attributes of identity will be optional. Under this assumption, the only mandatory attribute in the request body will be the age value. Is this correct? Or will phoneNumber be another mandatory attribute?

Many thanks.

GillesInnov35 commented 3 months ago

hi @ToshiWakayama-KDDI, yes, the age value should be mandatory, but as is the case for the other KYC APIs, the phone number will not be mandatory, because with 3-Legged Authentication it will not be provided in the request body. BR Gilles
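A small Python sketch of that validation rule follows. The field name `ageThreshold` is hypothetical (the thread does not fix a name for the age value), and requiring phoneNumber when the token is not 3-legged is an assumption added for illustration, not something stated above.

```python
def validate_request(body: dict, three_legged: bool) -> list:
    """Return a list of validation errors for an age verification request body."""
    errors = []
    age = body.get("ageThreshold")  # hypothetical field name for the age value
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or age < 0:
        errors.append("ageThreshold is required and must be a non-negative integer")
    # Assumption: without a 3-legged token the subscriber must be named in the body.
    if not three_legged and not body.get("phoneNumber"):
        errors.append("phoneNumber is required when the token does not identify the subscriber")
    return errors
```

Under these assumptions, `validate_request({"ageThreshold": 18}, three_legged=True)` returns no errors even though no phone number is present.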