Location Verification Implementation Guidelines

jlurien commented 1 year ago

Problem description: There is no clear guideline on how to implement the response to location-verification when the result is PARTIAL, and it is not obvious how to calculate the matchRate.

Expected action: Trigger discussion and agree on some guidelines for a common implementation.

An initial proposal is presented to cover several scenarios that may happen, together with comments and concerns.

Additional context

Word Document: Location Verification Implementation Guidelines.docx

alpaycetin74 commented 1 year ago

I was always thinking of this as a ratio based on the intersection. I think the match rate can be defined as: (intersection area) / (network provided area). Since we know the center coordinates and radiuses of both circles, it is mathematically possible to calculate the intersection area.
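To illustrate the calculation described above, here is a minimal sketch. It assumes both areas are circles on a flat plane and that the distance between the two centers has already been converted to the same length unit as the radii (converting latitude/longitude to a planar distance is not covered). The function names `circle_intersection_area` and `match_rate` are hypothetical and not part of the API.

```python
import math


def circle_intersection_area(d: float, r1: float, r2: float) -> float:
    """Area shared by two circles whose centers are d apart (same unit as radii)."""
    if d >= r1 + r2:
        return 0.0  # circles are disjoint (or touch in a single point)
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle lies fully inside the larger
    # Partial overlap: sum of two circular sectors minus the kite spanned by the centers
    cos1 = (d**2 + r1**2 - r2**2) / (2 * d * r1)
    cos2 = (d**2 + r2**2 - r1**2) / (2 * d * r2)
    # Clamp to [-1, 1] to guard against floating-point rounding
    sector1 = r1**2 * math.acos(max(-1.0, min(1.0, cos1)))
    sector2 = r2**2 * math.acos(max(-1.0, min(1.0, cos2)))
    kite = 0.5 * math.sqrt(
        max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    )
    return sector1 + sector2 - kite


def match_rate(d: float, r_request: float, r_network: float) -> float:
    """Ratio (intersection area) / (network provided area), between 0 and 1."""
    network_area = math.pi * r_network**2
    return circle_intersection_area(d, r_request, r_network) / network_area
```

With concentric circles where the request radius is half the network radius, `match_rate(0, 1, 2)` gives 0.25, since the request circle covers a quarter of the network circle.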

jlurien commented 1 year ago

@alpaycetin74 Yes, that is exactly the proposal.

There are 2 problems:

1) When the network-provided area is huge compared with the request area, the ratio may be very low (e.g. 1% or 2%) even when intersection area == request area.

2) How would the client distinguish these two cases, as both would return PARTIAL + a similar matchRate:

From a technical point of view, implementing this formula is feasible. The concern is more from a Product/UX perspective.

alpaycetin74 commented 1 year ago

In the figure on the left, the network provided area has a much larger radius than the request area, so there is a good chance the real location falls outside the requested area. I think it makes sense that the matchRate is calculated as a small value in this case. It is true the left and right figures look quite different, but maybe it doesn't matter much as far as accuracy is concerned. I don't know :)

jlurien commented 12 months ago

Comments raised during meeting on August 1st:

Akos commented that the ratio may not be enough and maybe we should add some text or rationale.
Cetin thinks that it makes sense to return a low percentage, since it is caused by bad behaviour of the network.
Telefónica agrees but raises the problem of the Product team's approach and how the quality of the product is perceived.
We will see the possibility to split the partial values into different values as proposed by Jose and Akos.

bigludo7 commented 11 months ago

Hello Thanks for the document - This is helpful.

If I got it right @alpaycetin74 the 'formulae' to calculate the matchRate should be a bit more complex - something like this: matchRate (between 0-1)= minimum [(intersection surface/requested surface), (intersection surface/networked checked surface)].

There is indeed a risk that the matchRate is often low. In your diagram on the left, @jlurien, when the requested surface is too small compared to the network surface checked, I guess we should ask for a larger surface in the request. This could be triggered when intersection surface = requested surface but intersection surface / networked checked surface < 0.4, for example.
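As a rough sketch of the two ideas above (taking the minimum of the two ratios, and flagging requests whose area is too small for the network accuracy), assuming the three surfaces have already been computed in the same unit. The function names are hypothetical, and the 0.4 threshold is only the example value mentioned above.

```python
import math


def min_ratio_match_rate(
    intersection_area: float, requested_area: float, network_area: float
) -> float:
    """matchRate in [0, 1] as the minimum of the two proposed ratios."""
    return min(intersection_area / requested_area, intersection_area / network_area)


def should_suggest_larger_area(
    intersection_area: float,
    requested_area: float,
    network_area: float,
    threshold: float = 0.4,  # example value from the discussion, not an agreed constant
) -> bool:
    """True when the requested area is fully covered but small relative to the network area."""
    fully_covered = math.isclose(intersection_area, requested_area)
    return fully_covered and intersection_area / network_area < threshold
```

For a requested surface of 1 fully inside a network surface of 4, the matchRate is 0.25 and the request would be flagged as too small.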

Alternatively, another option could be to calculate the matchRate internally and not provide it, but instead provide a 'confidenceRatio' alongside True/False in the response.

If matchRate is between 1 and 0.51 (for example), we answer True with confidence status = matchRate.
If matchRate is between 0 and 0.50 (for example), we answer False with confidence status = 1-matchRate.
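A minimal sketch of this mapping, assuming the False branch is meant to cover match rates from 0 up to and including the threshold, and that matchRate is expressed between 0 and 1. The function name `to_confidence_answer` and the default threshold of 0.5 are illustrative only.

```python
from typing import Tuple


def to_confidence_answer(match_rate: float, threshold: float = 0.5) -> Tuple[bool, float]:
    """Map an internal matchRate in [0, 1] to (answer, confidence)."""
    if not 0.0 <= match_rate <= 1.0:
        raise ValueError("match_rate must be between 0 and 1")
    if match_rate > threshold:
        return True, match_rate  # fairly sure the device is in the area
    return False, 1.0 - match_rate  # fairly sure the device is not in the area
```

One consequence of this design is that the reported confidence is never below 0.5, whichever answer is returned.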

Happy to discuss this.

jlurien commented 11 months ago

We may all agree that cases where network accuracy is low are the problematic ones. The proposal to introduce a confidence rate along with True/False is interesting. With bad accuracy, matchRate will be low (below 0.50) and the answer would be False in almost all cases.

However, with PARTIAL as it is now, there will be few cases where response is TRUE 100%, so we may have PARTIAL answers in most cases.

alpaycetin74 commented 11 months ago

If I got it right @alpaycetin74 the 'formulae' to calculate the matchRate should be a bit more complex - something like this: matchRate (between 0-1)= minimum [(intersection surface/requested surface), (intersection surface/networked checked surface)].

I tend to think API performance is more related to the intersection area / network provided area ratio. If the customer defines a requested area that is much larger than the network can provide (assuming the network can calculate accurately), the min of the 2 ratios will be intersection / requested, and it will look pessimistic.

Other than that, returning FALSE with 1-matchRate is an interesting approach. Instead of saying "we are slightly sure it matches", we say "we are pretty much sure it won't match". It is a nice trick to give the impression we are confident in what we are doing :)

JoachimDahlgren commented 10 months ago

From a pure application developer perspective I cannot help thinking that it would be easier to understand the response if we returned TRUE with a circle where the center is the same as used for the requested area but the radius is set to cover the network provided area. The circle should not be smaller than was provided in the requested area.

[Image: location response]
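One possible reading of this suggestion, sketched under the assumption that both areas are circles on a flat plane and that `d` is the distance between the requested center and the network-area center. The function name `covering_radius` is hypothetical.

```python
def covering_radius(d: float, r_request: float, r_network: float) -> float:
    """Radius of a circle, centered on the requested center, that covers the network area.

    The farthest point of the network circle from the requested center lies at
    distance d + r_network. The result is never smaller than the requested radius.
    """
    return max(r_request, d + r_network)
```

For example, with centers 300 m apart, a requested radius of 100 m and a network radius of 500 m, the returned circle would have a radius of 800 m.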

sfnuser commented 9 months ago

I feel the specification should also document the math behind the calculations used to arrive at the result, so that the response is uniform across implementations. As we see in this discussion, there is a lot of ambiguity, and every time I look at it my interpretation differs. @bigludo7 rightly started putting some math details here and it would be a good idea to document it in the spec for each of the cases.

jlurien commented 9 months ago

Agree that we should have the implementation guidelines documented, but first we have to agree on them. Definitely something required prior to releasing any v1.

Kevsy commented 9 months ago

I think the implementation guidelines document is very useful. But there is a problem: the criteria are worded differently to the YAML definition (02.0-wip). This leads to ambiguity, and at least in one case, a different decision.

Here's a comparison:

Criterion in YAML	Criterion in guidelines	Verification Result
"The network locates the device within the requested area"	"Network Area within Request Area"	TRUE
"The requested area may not match the area where the network locates the device"	"No overlap"	FALSE
"The requested area partially match the area where the network locates the device"	"Request Area within Network Area Low network accuracy" or "Requested accuracy similar to Network Accuracy" or "Partial overlap Low network accuracy" or "Partial overlap High network accuracy"	PARTIAL
"The network may not be able to locate the device"	"Network Area is not known"	UNKNOWN

Two concerns:

The scenario pictured here will return TRUE according to the YAML but PARTIAL according to the guidelines:

The YAML says:

The network locates the device within the requested area, the verification result is TRUE.

Since the request area is within the Network location area, the answer is TRUE – because the actual device location within the Request Area will always be part of - or 'within' - the superset of all locations in the Network location area.

But the guidelines document says:

“Request Area within Network Area” as “PARTIAL”.

“Request Area within Network Area” is the scenario in the diagram above, so according to the guidelines, the answer is PARTIAL.

Hence we have a different answer depending on the wording of the criterion (YAML vs Guidelines), and until both documents use the same wording we have ambiguity.

The YAML criterion for FALSE is itself ambiguous

"The requested area may not match the area where the network locates the device" - "may not" is not definitive. "The requested area is outside the area where the network locates the device" is clearer

Recommendation: we use one set of criteria for both the guidelines and the YAML, and include the guideline illustrations in the API document.

alpaycetin74 commented 9 months ago

Hello @Kevsy , the wording of the definitions in the release candidate is a bit different: https://github.com/camaraproject/DeviceLocation/pull/104 But I understand those may not be perfect, either. I'll try to propose a different wording.

jlurien commented 9 months ago

Thanks @Kevsy, as @alpaycetin74 mentions, the wording in the spec has been (hopefully) improved in the latest PR. Regarding your specific comments, please see inline:

I think the implementation guidelines document is very useful. But there is a problem: the criteria are worded differently to the YAML definition (02.0-wip). This leads to ambiguity, and at least in one case, a different decision.

Here's a comparison:

Criterion in YAML	Criterion in guidelines	Verification Result
"The network locates the device within the requested area"	"Network Area within Request Area"	TRUE
"The requested area may not match the area where the network locates the device"	"No overlap"	FALSE
"The requested area partially match the area where the network locates the device"	"Request Area within Network Area Low network accuracy" or "Requested accuracy similar to Network Accuracy" or "Partial overlap Low network accuracy" or "Partial overlap High network accuracy"	PARTIAL
"The network may not be able to locate the device"	"Network Area is not known"	UNKNOWN

Two concerns:

The scenario pictured here will return TRUE according to the YAML but PARTIAL according to the guidelines:

The YAML says:

The network locates the device within the requested area, the verification result is TRUE.

Since the request area is within the Network location area, the answer is TRUE – because the actual device location within the Request Area will always be part of - or 'within' - the superset of all locations in the Network location area.

But the guidelines document says:

“Request Area within Network Area” as “PARTIAL”.

“Request Area within Network Area” is the scenario in the diagram above, so according to the guidelines, the answer is PARTIAL.

Hence we have a different answer depending on the wording of the criterion (YAML vs Guidelines), and until both documents use the same wording we have ambiguity.

We may need to clarify the wording if it leads to confusion, but the scenario in the picture should be PARTIAL. The YAML says for TRUE: "the network locates the device within the requested area", which means that the "network area" (= the area where the network locates the device) is within the "requested area". It is therefore wrong to assume "Since the request area is within the Network location area", because the scenario is the opposite. It should not be TRUE but PARTIAL: the operator cannot assure that the device is exactly within the requested area, because the network area is bigger.
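To make the distinction concrete, here is a minimal sketch of the four outcomes, assuming both areas are circles on a flat plane, `d` is the distance between their centers, and a missing network area is passed as `None`. The names are hypothetical, and treating circles that touch in a single point as FALSE is an assumption not settled in this thread.

```python
from enum import Enum
from typing import Optional


class VerificationResult(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    PARTIAL = "PARTIAL"
    UNKNOWN = "UNKNOWN"


def classify(
    d: Optional[float], r_request: float, r_network: Optional[float]
) -> VerificationResult:
    """Classify the overlap between the requested circle and the network circle."""
    if d is None or r_network is None:
        return VerificationResult.UNKNOWN  # network cannot locate the device
    if d + r_network <= r_request:
        return VerificationResult.TRUE  # network area lies within the requested area
    if d >= r_request + r_network:
        return VerificationResult.FALSE  # no overlap (assumption: touching counts as none)
    # Covers both "requested area within network area" and plain partial overlap
    return VerificationResult.PARTIAL
```

Under these assumptions, `classify(0, 200, 1000)` (a small requested area inside a large network area) yields PARTIAL, while `classify(0, 1000, 200)` yields TRUE.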

The YAML criterion for FALSE is itself ambiguous

"The requested area may not match the area where the network locates the device" - "may not" is not definitive. "The requested area is outside the area where the network locates the device" is clearer

Recommendation: we use one set of criteria for both the guidelines and the YAML, and include the guideline illustrations in the API document.

The latest version of location-verification uses the wording:

When the requested area does not match the area where the network locates the device, the verification result is FALSE.

We may change that to "is outside" if it is clearer. As an AP from the last meeting, we are going to prepare a document for the implementation guidelines, trying to be more descriptive. Any suggestion to make it more understandable is very welcome, especially from native speakers.

Kevsy commented 9 months ago

Thanks - yes, my point was that the (main branch) wording led to a different interpretation from the Guidelines.

It may help to have a formal declaration at the end of the YAML, after the 'plain language' definition, e.g.

Let R = the set of all possible locations within the Requested Area
Let N = the set of all possible locations within the Network Area

The following conditions lead to the verification results stated:

Notation	Description	Verification result
R ⊆ N	R is a subset of, or equal to, N	TRUE
R ⊃ N	R is a superset of N	PARTIAL
R ∩ N and R ⊄ N	R and N intersect but R is not a subset of N	PARTIAL
R ∩ N = ∅	R and N are disjoint	FALSE

bigludo7 commented 9 months ago

@Kevsy and for PARTIAL result we use matchRate = Intersection (R,N) / N ? correct ?

Kevsy commented 9 months ago

@bigludo7

and for PARTIAL result we use matchRate = Intersection (R,N) / N ? correct ?

yes - see the earlier illustration from @alpaycetin74

alpaycetin74 commented 9 months ago

@bigludo7

and for PARTIAL result we use matchRate = Intersection (R,N) / N ? correct ?

yes - see the earlier illustration from @alpaycetin74

Hello, that was my opinion, but we had not reached a consensus back then. There is no description in the rc spec about calculating the matchRate yet.

jlurien commented 9 months ago

Yes, that formula is also what is proposed in the document attached to this issue (matchRate (%) = Intersection Area / Network Area). In the last meeting we concluded that it may not be perfect, but it is the one with the most consensus so far, so it makes sense to reflect this officially in order to have consistent implementations. If we design a better formula in the future we may adopt it.

jlurien commented 9 months ago

Thanks - yes, my point was that the (main branch) wording led to a different interpretation from the Guidelines.

It may help to have a formal declaration at the end of the YAML, after the 'plain language' definition, e.g.

Let R = the set of all possible locations within the Requested Area
Let N = the set of all possible locations within the Network Area

The following conditions lead to the verification results stated:

Notation	Description	Verification result
R ⊆ N	R is a subset of, or equal to, N	TRUE
R ⊃ N	R is a superset of N	PARTIAL
R ∩ N and R ⊄ N	R and N intersect but R is not a subset of N	PARTIAL
R ∩ N = ∅	R and N are disjoint	FALSE

Thanks. We recently rephrased the explanations in the spec, based on the suggestions by @alpaycetin74

Marcus-MMJ commented 9 months ago

Thanks - yes, my point was that the (main branch) wording led to a different interpretation from the Guidelines.

It may help to have a formal declaration at the end of the YAML, after the 'plain language' definition, e.g.

Let R = the set of all possible locations within the Requested Area
Let N = the set of all possible locations within the Network Area

The following conditions lead to the verification results stated:

Notation	Description	Verification result
R ⊆ N	R is a subset of, or equal to, N	TRUE
R ⊃ N	R is a superset of N	PARTIAL
R ∩ N and R ⊄ N	R and N intersect but R is not a subset of N	PARTIAL
R ∩ N = ∅	R and N are disjoint	FALSE

Thanks. We recently rephrased the explanations in the spec, based on the suggestions by @alpaycetin74

I think a formal description is really helpful. But I think as already agreed in issue#20, the first two are exactly switched:

Kevsy commented 8 months ago

@Marcus-MMJ Good catch! Thanks for the fix :)

jlurien commented 8 months ago

I think that the current phrasing in the PR consolidates all the discussion. Please take a final look to check that everything is fine. Thanks for the constructive feedback.

maxl2287 commented 2 months ago

In the case of:

R is a subset of N (R ⊂ N)

So that the Network Area fully includes the requested Area, what would then be the matchRate? (based on (R ∩ N) / N * 100) As of now it would be:

 {
  "verificationResult": "PARTIAL",
  "matchRate": 100,
  "lastLocationTime": "2023-09-07T10:40:52Z"
}

But that does not really make sense, as the device can be located on the "other side" of the circle. So you cannot guarantee a 100% matchRate.

wdyt @jlurien ?

bigludo7 commented 2 months ago

But in this case, @maxl2287, to the question "Is the device in the R area?" we could answer yes with 100% certainty, right? So for me matchRate=100 makes sense.

maxl2287 commented 2 months ago

Let's say the network provides a huge radius (like a cell) where the device is located. The verification-area is much smaller.

N = Location Area of the Device provided by the Network
V = requested verification area
Red crosses = possible real locations of the device

(1.) - here the device is located exactly in the verification area = 100%
(2.) & (3.) - It could also be that the device is located elsewhere inside the network area, but not exactly where the verification area is. So how can we say that we have a 100% match rate here, when there is a possibility that the device is not in the verification area?

bigludo7 commented 2 months ago

Sorry @maxl2287, I've misunderstood.

If the Requested Area is larger than the Network Area where we "find" the device, and this Network Area is fully within the Requested Area, we answer TRUE (meaning matchRate=100%).
If the Requested Area is smaller than the Network Area, as in your diagram, we have PARTIAL and the match rate will be the intersection surface / N.

Back to your example: for me, if you are asking for V for devices 1, 2 and 3 and your network indicates that devices 1, 2 and 3 are in the cell covering N, you will have exactly the same result for all 3: PARTIAL and matchRate: 3% (my guess at surface V / surface N).

jlurien commented 2 months ago

It's as @bigludo7 explains. Regarding your original assumption, (R ∩ N) / N * 100: in this case, if R is totally within N, then (R ∩ N) is R, so matchRate is the ratio R / N, and since R <<< N --> matchRate <<< 1
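As a small worked example of this contained case, assuming both areas are circles and the requested circle lies entirely inside the network circle, the ratio of areas reduces to the squared ratio of the radii. The function name and the radii used below are illustrative, not taken from the diagram.

```python
def contained_match_rate_percent(r_request: float, r_network: float) -> float:
    """matchRate (%) when the requested circle is fully inside the network circle.

    (R ∩ N) equals R, so matchRate = area(R) / area(N) * 100
    = (r_request / r_network) ** 2 * 100, since pi cancels out.
    """
    if r_request > r_network:
        raise ValueError("requested circle cannot fit inside the network circle")
    return (r_request / r_network) ** 2 * 100
```

With a hypothetical 350 m requested radius inside a 2000 m network radius, this gives roughly 3%, the same order of magnitude as the guess above, regardless of where inside N the requested circle sits.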

camaraproject / DeviceLocation

Location Verification Implementation Guidelines #85