Open ToshiWakayama-KDDI opened 1 month ago
Hi @ToshiWakayama-KDDI, linking out to a thread with a good discussion around the concepts for 'score': [#46].
I would summarise and propose the below, where 'attribute' below is a field in the existing KYC specification:-
Hi @KevScarr Why not provide the Score value as well when the "attributeMatch" is "true" but there is a small difference (probably a spelling mistake on either side)? Or do you propose to return "true" only when the Score is 100%?
@HuubAppelboom I would suggest that a true equates to an exact match, i.e. a score of 100. For close matches, i.e. when you return a score, allow the consuming service to judge whether it is close enough to proceed (their use cases will drive their error tolerance).
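The consumer-side judgement described here can be sketched as follows. This is only an illustration: the function name `accept_match` and the default threshold of 90 are assumptions, not part of the KYC Match specification, and each consuming service would pick its own tolerance.

```python
from typing import Optional


def accept_match(attribute_match: str, score: Optional[int], threshold: int = 90) -> bool:
    """Decide whether a consuming service proceeds for one attribute.

    attribute_match is one of "true", "false", "not_available".
    threshold is a hypothetical, use-case-specific tolerance.
    """
    if attribute_match == "true":
        # An exact match, equivalent to a score of 100.
        return True
    if attribute_match == "false" and score is not None:
        # A close match: the consumer applies its own tolerance.
        return score >= threshold
    # "not_available", or "false" without any score to judge by.
    return False
```

A service with strict requirements could pass `threshold=98`, while one that tolerates spelling variants might use a lower value.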
Hi @HuubAppelboom, @KevScarr, I understand that an optional score result might be added alongside the boolean attribute (True/False/Not-available), which is mandatory when the attribute is provided in the request. In this case, I wonder whether the boolean attribute is useful. At Orange the response contains only a score match result, and the consumer has to decide. Gilles
@GillesInnov35 @HuubAppelboom Fair point; purely thinking about when a customer of the service migrates from the previous version to this version so backward compatibility would be important. I'd say the score is only provided when a boolean: false is returned; outside of that condition it offers little value. For Orange: Do you still respond with a not-available indicator? and can you share which algorithm you're using (JW?)
Yes, sure Kevin, backward compatibility will be an important point, but as KYC Match version 1.0.0 has not been published I wonder whether it is a problem. But maybe it is. To answer your question:
Thanks a lot for your active contribution Regards Gilles
Makes sense. So you would return a '-1' when the attribute wasn't available for checking, hence no requirement to have the boolean field in your current response.
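A minimal sketch of why the boolean field becomes redundant in a score-only response: the three-state result can be derived from the score alone. It assumes the convention mentioned in this comment (-1 for "not available") together with the 0 to 100 range discussed elsewhere in the thread; the function name is hypothetical.

```python
def match_state_from_score(score: int) -> str:
    """Derive the three-state match result from a score-only response.

    Assumed convention: -1 means the attribute was not available for
    checking, 100 is an exact match, anything else in 0..99 is a non-match.
    """
    if score == -1:
        return "not_available"
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return "true" if score == 100 else "false"
```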
If no MNO has implemented the current version then it's a fair shout to move towards a score only approach.
@KevScarr @GillesInnov35 We may need to think of an approach that can be extended further. For example, I think it may be a good idea to provide feedback on whether the data is unverified or has been verified by the MNO. That way we can provide a larger market reach by also including unverified attributes, and the CSP can then decide whether to use that attribute or not.
Hi @GillesInnov35 , @HuubAppelboom , @KevScarr , all,
Thank you for your prompt comments/discussion, which I did not expect actually.
I should have informed you that there is KYC Match scoring enhancement proposal in the API Backlog WG, so, once we have received the proposal, we should proceed with our scoring discussion taking it into account. We should wait for it, but I don't think it will take long.
I will update the status.
Best regards, Toshi
Hi @GillesInnov35 , @HuubAppelboom , @KevScarr, all,
Our implementation is based on v0.1.0, and actually we do not need the scoring feature, so we would insist that the KYC Match API should work without scoring. That is the original OGW scope, as I understand it, and it is also important for an OGW global API. In addition, as we all know, we have already put our efforts into v0.1.0, so I believe we should use our initial design and consider backward compatibility as much as possible.
Thanks, Toshi
As a suggestion how to add score and other information to the API response, maintain backwards compatibility, and have something that can be expanded, we could add an extra string (when applicable) in the response for attributes where score is relevant.
For example, attributeMatch will have the values "true", "false", "not_available" (like today), and we add an extra answer "attributeMatchInfo" that contains items like "score=89 unverified" to signal that the Jaro-Winkler score is 89 but that the source data has not been verified by the MNO. When we have additional metadata, it can be added in future.
So for example you will get:
givenNameMatch : false
givenNameMatchInfo : score=95 verified
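A client could read such an info string along the following lines. This is a sketch only: the thread does not define a grammar for the string, so the space-separated tokens, the `score=` prefix and the `verified`/`unverified` flags are taken from the examples above, and the function name is hypothetical.

```python
def parse_match_info(info: str) -> dict:
    """Parse an info string such as "score=95 verified".

    Unknown tokens are ignored so that metadata added in future
    does not break older clients.
    """
    result = {"score": None, "verified": None}
    for token in info.split():
        if token.startswith("score="):
            result["score"] = int(token[len("score="):])
        elif token == "verified":
            result["verified"] = True
        elif token == "unverified":
            result["verified"] = False
    return result
```

Ignoring unrecognised tokens is what keeps the string extensible, at the cost of clients having to parse free text rather than typed fields.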
hi @ToshiWakayama-KDDI, all, thanks for your comment.
I had a look at the API Backlog issue/PR opened by @jgarciahospital on the API Enhancement Proposal KYC-Match Scoring. It is in line with our current discussion on how to add match score information, so it is interesting.
I'm afraid it will be difficult to offer backward compatibility if we have to replace a simple attribute with an object structure after version 0.1.0.
This is just my point of view to be discussed.
For example:
BR Gilles
Hi all,
As mentioned in last week's meeting:
1) Telefonica has implemented v0.1.0, therefore we would need backwards compatibility in v0.2.0
2) This would be in line with the proposal of maintaining current true/false/not_available response and in the case of false, adding a score. For example:
• Keep current attributes -> "attributeMatch": true/false/not_available
• If false, add additional parameters -> "attributeScore": X%
From the technical perspective, this should keep backwards compatibility as, based on OAS3, there is a parameter called “additionalProperties” which indicates if the object (our answer in this case) can have additional parameters not documented or not. The default value of “additionalProperties” is true, therefore in CAMARA we assume it is true. So the customer should be ready to receive additional parameters. It would be worth it to check this.
3) However, the proposal of changing a simple attribute to an object structure would not be an option for backwards compatibility, therefore not possible for us
4) Ok to proceed with the following rules proposed for the score:
• Numeric attributes are not checked: ie birthdate
• The response "attributeMatch" must be 'false'
• The Score value is a whole number (%): 0 to 100 (0 = no match, 100 = exact match)
• Using Jaro-Winkler distance algorithm (after normalisation has been applied).
Regards, Clara
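The scoring rules in point 4 can be illustrated with a small Jaro-Winkler sketch that returns a whole number from 0 to 100. Several things here are assumptions rather than agreed rules: the thread does not define the normalisation step, so accent stripping, case folding and whitespace collapsing are placeholders; the prefix scale of 0.1 and the prefix cap of 4 characters are the commonly used Winkler defaults; and rounding to the nearest integer is one possible choice.

```python
import unicodedata


def normalise(text: str) -> str:
    """Assumed normalisation: strip accents, fold case, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def score(requested: str, stored: str) -> int:
    """Whole-number percentage: 0 = no match, 100 = exact match."""
    return round(100 * jaro_winkler(normalise(requested), normalise(stored)))
```

For instance, `score("MARTHA", "MARHTA")` gives 96, and `score("José", "Jose")` gives 100 under the assumed normalisation. Because of rounding, a very long pair differing by one character could also round up to 100, so an implementation may want to cap non-identical strings at 99.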
Hi all, thanks Clara for this detailed summary. If we must address backward compatibility because v0.1.0 is already deployed, I agree with you that we should add new optional score attributes. Do you think we have time to work out a design based on the OAS3 specification that avoids a long list of attributes? BR Gilles
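The backward-compatibility argument based on "additionalProperties" relies on existing clients tolerating fields they do not know. A minimal sketch of such a tolerant v0.1.0 client is shown below; the field name `givenNameScore` is hypothetical, and `KNOWN_FIELDS` lists only a subset of the real response for brevity.

```python
import json

# Subset of the v0.1.0 response fields, for illustration only.
KNOWN_FIELDS = {"givenNameMatch", "familyNameMatch"}


def read_v010_fields(body: str) -> dict:
    """Keep the fields a v0.1.0 client understands and ignore the rest."""
    payload = json.loads(body)
    return {key: value for key, value in payload.items() if key in KNOWN_FIELDS}
```

A client written this way keeps working when a newer server adds score fields; a client that rejects unknown fields would break, which is why it is worth checking how existing integrations behave.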
Building on Issue #96 / we should follow the same design convention (define once, use many):-
ScoreMatchResult:
  type: integer
  description: Attribute comparison score as a percentage for string comparisons
  example: 85
  minimum: 0
  maximum: 100
KYC_MatchResponse:
  type: object
  properties:
    idDocumentMatch:
      $ref: '#/components/schemas/MatchResult'
    nameMatch:
      $ref: '#/components/schemas/MatchResult'
      $ref: '#/components/schemas/ScoreMatchResult'
    givenNameMatch:
      $ref: '#/components/schemas/MatchResult'
      $ref: '#/components/schemas/ScoreMatchResult'
ScoreMatchResult to appear for all attribute fields, excluding the following fields as they are numeric/enum/ID based:-
When a field is numeric only in a particular country, as per the above summary, the score wouldn't be returned.
I've taken the attributes from the current version of the specification and, following the rules, given an initial view of which attributes can fully support a 'score' concept. It would be good to reach a common view across as many countries as possible; it will then make updating the YAML spec straightforward.
Attribute | Optional Score Available | Comment |
---|---|---|
idDocumentMatch | No | It’s an ID number. |
nameMatch | YES | |
givenNameMatch | YES | |
familyNameMatch | YES | |
nameKanaHankakuMatch | ??? | Are these fields in next release? |
nameKanaZenkakuMatch | ??? | Are these fields in next release? |
middleNamesMatch | YES | |
familyNameAtBirthMatch | YES | |
addressMatch | YES | |
streetNameMatch | YES | |
streetNumberMatch | YES | Is this houseName in some countries / assumption yes |
postalCodeMatch | No | Being out by one letter can be a different place. |
regionMatch | YES | |
localityMatch | YES | |
countryMatch | YES | |
houseNumberExtensionMatch | No | It’s numeric, not relevant. |
birthdateMatch | No | It’s numeric, not relevant. |
emailMatch | YES | |
genderMatch | No | It’s an enum type. |
Some fields will be all numeric in some countries and a mixture in others. The table above captures which match attributes in the “KYC_MatchResponse” can support a ScoreMatch.
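For an implementer, the table could be captured as a simple lookup. This sketch copies the "YES" rows as they stand in this initial view; the two nameKana attributes are kept separate because the table marks them as undecided, and the constant and function names are hypothetical.

```python
# Attributes marked "YES" in the table above.
SCORE_ELIGIBLE = frozenset({
    "nameMatch", "givenNameMatch", "familyNameMatch", "middleNamesMatch",
    "familyNameAtBirthMatch", "addressMatch", "streetNameMatch",
    "streetNumberMatch", "regionMatch", "localityMatch", "countryMatch",
    "emailMatch",
})

# Attributes marked "???" in the table: still an open question there.
UNDECIDED = frozenset({"nameKanaHankakuMatch", "nameKanaZenkakuMatch"})


def supports_score(attribute: str) -> bool:
    """True if the table lists the attribute as able to carry a score."""
    return attribute in SCORE_ELIGIBLE
```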
@ToshiWakayama-KDDI Should the nameKana*Match attributes also have scores in this next version of the specification (ie will these attributes remain here or be in an extension)?
Hi @KevScarr I agree with the proposal of creating a common schema for the response objects, but I don't fully understand what the final result is here. As far as I know, in OAS3 we cannot use two $ref objects at the same level.
From TEF, our proposal is mainly focused on not losing backward compatibility, as we are already integrated with clients, so the design could be simpler:
idDocumentMatch:
  $ref: '#/components/schemas/MatchResult'
idDocumentScoreMatch:
  $ref: '#/components/schemas/ScoreMatchResult'
We can document that the ScoreMatch properties will only be returned if the related property is false
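On the server side, the rule that a ScoreMatch property only accompanies a false result could look like the sketch below. The `<name>ScoreMatch` naming follows the example in this comment; the function itself and its parameters are hypothetical.

```python
from typing import Optional


def build_attribute_result(name: str, match: str, score: Optional[int]) -> dict:
    """Build the response fields for one attribute.

    The score field is only emitted when the match result is "false",
    so responses for "true" and "not_available" look as they did in v0.1.0.
    """
    result = {f"{name}Match": match}
    if match == "false" and score is not None:
        result[f"{name}ScoreMatch"] = score
    return result
```

With this shape, `build_attribute_result("givenName", "false", 85)` yields both `givenNameMatch` and `givenNameScoreMatch`, while a "true" result yields only `givenNameMatch`.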
Hi @fernandopradocabrillo, I think that it works well with the allOf keyword.
allOf:
  - $ref: '#/components/schemas/MatchResult'
  - $ref: '#/components/schemas/ScoreMatchResult'
to be confirmed I suppose BR Gilles
Hi @fernandopradocabrillo, you're right. My proposal below can't be applied.
allOf:
  - $ref: '#/components/schemas/MatchResult'
  - $ref: '#/components/schemas/ScoreMatchResult'
I agree with yours regarding backward compatibility which is expected. Gilles
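The thread does not spell out why the allOf variant fails, but one plausible reading is that allOf requires a single value to satisfy both schemas at once. The sketch below hand-rolls that check under the assumption that MatchResult is a string enum of "true", "false" and "not_available" (its definition is not shown in this thread) and that ScoreMatchResult is the integer schema quoted earlier.

```python
# Assumed definition of MatchResult: a string enum.
MATCH_RESULT_VALUES = {"true", "false", "not_available"}


def satisfies_match_result(value) -> bool:
    return isinstance(value, str) and value in MATCH_RESULT_VALUES


def satisfies_score_match_result(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    return is_integer and 0 <= value <= 100


def satisfies_all_of(value) -> bool:
    """allOf semantics: the same value must be valid against every schema."""
    return satisfies_match_result(value) and satisfies_score_match_result(value)


candidates = ["true", "false", "not_available", 0, 85, 100]
valid = [value for value in candidates if satisfies_all_of(value)]
```

Under these assumptions `valid` is empty, since no value is both a string and an integer, which is consistent with using two separate properties instead.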
Hi @KevScarr , all,
"@ToshiWakayama-KDDI Should the nameKana*Match attributes also have scores in this next version of the specification (ie will these attributes remain here or be in an extension)?"
Thank you for asking me about this. We would prefer to have scores for the nameKanaHankakuMatch and the nameKanaZenkakuMatch attributes in this next version.
Sorry for the late reply, as I needed to discuss this internally.
BR Toshi
Hi @KevScarr , @fernandopradocabrillo , @GillesInnov35 , @claraserranosolsona , all
I have a question to clarify the method of scoring.
It seems that the Jaro-Winkler distance algorithm will be used for scoring string-type attributes (after normalisation has been applied); however, I think it should be up to each operator to choose how to calculate the score.
The reason is that, even though the Jaro-Winkler distance algorithm could be used as the common method in Europe, it is unclear whether it can be used for other languages, or, if it can, whether it is the best-suited algorithm for them. That is my concern, and we ourselves are not sure about using the Jaro-Winkler distance algorithm for the Japanese language.
So, is it OK for each operator to choose how to calculate the score, or are there any other thoughts?
Thanks, Toshi KDDI
hi @ToshiWakayama-KDDI , all, I don't really know if this algorithm works for all languages but it should (to be confirmed). I think we should validate an unique algo to have the same specifications and the same rules for all KYC Match API providers and avoid specific implementation.
BR Gilles
Hi @Gilles, Thanks for your comments.
"I think we should validate an unique algo to have the same specifications and the same rules for all KYC Match API providers and avoid specific implementation."
I agree with this sentence; however, as the Jaro-Winkler algorithm has not been proven effective for languages other than European ones, it would be better not to specify it as the mandatory algorithm. If specific algorithms are needed in the KYC Match API spec, Jaro-Winkler could, for example, be the recommendation for European languages, while the algorithm for other languages would be TBD.
Would this be a possible way forward?
BR Toshi
Problem description
To consider Scoring feature for KYC Match. (Spin off from Issue #65, item No.1, as per Action Item #13.03)