camaraproject / WorkingGroups

Archived repository used previously by CAMARA Working Groups.
https://wiki.camaraproject.org/display/CAM/CAMARA+Working+Groups

No personal information as common API design criteria #101

Closed jordonezlucena closed 1 year ago

jordonezlucena commented 1 year ago

CAMARA APIs must not be designed to include personal information (e.g. phone number), as input or output, to identify the user. Where personal information is needed for API usage, anonymised/obfuscated identifiers must be used instead.

Problem statement and proposal (way forward) is captured here

eric-murray commented 1 year ago

I need more detail on the proposal. It appears to be that we replace the current approach of identifying the target subscription directly in the API body (by External Id, MSISDN or IP address) with a 2-step process whereby the API caller first requests an anonymised subscriber identifier using a new dedicated API and then passes that identifier to the API they really want to call.

I can see that can work, but the API caller still has to pass these subscriber identifiers to an API (to get the anonymised identifier), so I don't really see how it avoids the problem of an API caller requiring to pass the subscriber identifiers that they know (such as MSISDN) to an API.

You might argue that they only need to do this once, and can then remember the association between the subscriber and the anonymised identifier. But some identifiers (such as IPv4 address) are dynamic, and MSISDNs can be transferred to other subscribers thus invalidating the anonymised identifier. And maybe the API caller doesn't want to change their IT systems so that they can store this new identifier.

So my expectation would be that many API callers would use the 2-step process every time, and the issue of API callers passing PII in an API call would not be avoided at all.
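
As an illustration of the 2-step process discussed above, here is a minimal sketch. It is not the proposal's actual API: `MockOperator`, `get_anonymised_id`, `device_status` and `call_without_storing` are hypothetical names, and the operator is mocked in memory. It shows that a caller who does not store the mapping ends up sending the MSISDN on every call.

```python
import secrets


class MockOperator:
    """Hypothetical in-memory stand-in for an operator; not a CAMARA API."""

    def __init__(self) -> None:
        self._anon_to_msisdn: dict[str, str] = {}

    def get_anonymised_id(self, msisdn: str) -> str:
        # Step 1: the caller still has to send the MSISDN it knows.
        anon_id = secrets.token_urlsafe(16)
        self._anon_to_msisdn[anon_id] = msisdn
        return anon_id

    def device_status(self, anon_id: str) -> dict[str, str]:
        # Step 2: the service API accepts only the anonymised identifier.
        if anon_id not in self._anon_to_msisdn:
            raise KeyError("unknown anonymised identifier")
        return {"subscriber": anon_id, "status": "reachable"}


def call_without_storing(operator: MockOperator, msisdn: str) -> dict[str, str]:
    # A caller that does not persist the mapping repeats step 1 on every call,
    # so the MSISDN crosses the interface each time.
    return operator.device_status(operator.get_anonymised_id(msisdn))
```

A caller that does store the identifier would call `get_anonymised_id` once and reuse the result, which is exactly the IT change that some callers may not want to make.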

Also, the presentation talks about Telco Operators "sharing" this information with the API callers, but for all the APIs under discussion, this PII is known to the API caller and is passed to the Telco Operator in the API call. So the only PII being shared is information that the API caller already knows. It is unfortunate that the most sensitive identifier (MSISDN) is also the one that is most commonly shared by the subscriber themselves to 3rd parties, but that's the way it is.

patrice-conil commented 1 year ago

Where we have to be very careful is to ensure that we do not return more personal information than what has been provided to us.

jlurien commented 1 year ago

Thanks @eric-murray for your feedback.

I need more detail on the proposal. It appears to be that we replace the current approach of identifying the target subscription directly in the API body (by External Id, MSISDN or IP address) with a 2-step process whereby the API caller first requests an anonymised subscriber identifier using a new dedicated API and then passes that identifier to the API they really want to call.

Yes, the idea is to have a prior, separate step to obtain this anonymised subscriber id.

I can see that can work, but the API caller still has to pass these subscriber identifiers to an API (to get the anonymised identifier), so I don't really see how it avoids the problem of an API caller requiring to pass the subscriber identifiers that they know (such as MSISDN) to an API.

The scenario we want to avoid is one in which an API caller who doesn't know certain PI data, such as an MSISDN, has to obtain it or ask for it in order to invoke an API; for example, a Market Place using CAMARA APIs. We propose that no API requires an MSISDN as input in order to work, because otherwise the caller would have to know it. To that end, our proposal is to force these intermediaries to get the anonymised subscriber id from other information that is available to them, such as the IP and port of a user device connected to the Mobile Network. This step would be done within the AuthN/AuthZ process required to manage consents and get the access_token, prior to the use of any API, so this information does not need to be shared again in the APIs.

You might argue that they only need to do this once, and can then remember the association between the subscriber and the anonymised identifier. But some identifiers (such as IPv4 address) are dynamic, and MSISDNs can be transferred to other subscribers thus invalidating the anonymised identifier.

Once the anonymised subscriber id is generated, the original IP that was used to generate it can change but the anonymised identifier would still be valid. MSISDNs can indeed be recycled between customers, though not as dynamically. A client may hold an MSISDN that has already been recycled without knowing it, and having an alternate id is even better in this case, because the alternate id can expire within the reassignment period, requiring the client to get a new one, or it can be revoked in other scenarios that make it no longer valid (change of tariff, fraud detection, etc.).
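
The expiry and revocation behaviour described here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `AnonymisedIdRegistry` class, the `subscription_key` parameter and the TTL value are hypothetical, and nothing here is taken from an agreed CAMARA design.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnonymisedId:
    value: str
    subscription_key: str
    expires_at: float
    revoked: bool = False


class AnonymisedIdRegistry:
    """Hypothetical operator-side registry; names and TTL are illustrative."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._ids: dict[str, AnonymisedId] = {}

    def issue(self, subscription_key: str, now: Optional[float] = None) -> str:
        # The identifier is bound to the subscription, not to the IP address
        # seen at issuance, so a later IP change does not invalidate it.
        now = time.time() if now is None else now
        entry = AnonymisedId(secrets.token_urlsafe(16), subscription_key, now + self._ttl)
        self._ids[entry.value] = entry
        return entry.value

    def revoke(self, value: str) -> None:
        # E.g. MSISDN reassignment, change of tariff, fraud detection.
        if value in self._ids:
            self._ids[value].revoked = True

    def is_valid(self, value: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._ids.get(value)
        return entry is not None and not entry.revoked and now < entry.expires_at
```

Choosing a TTL shorter than the MSISDN reassignment period would mean that a stale identifier can never point at a new subscriber; the client simply has to request a fresh one.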

And maybe the API caller doesn't want to change their IT systems so that they can store this new identifier. So my expectation would be that many API callers would use the 2-step process every time, and the issue of API callers passing PII in an API call would not be avoided at all.

There may be resistance from certain clients to use an API in this way, but Telcos should protect their assets and customer information.

Also, the presentation talks about Telco Operators "sharing" this information with the API callers, but for all the APIs under discussion, this PII is known to the API caller and is passed to the Telco Operator in the API call. So the only PII being shared is information that the API caller already knows. It is unfortunate that the most sensitive identifier (MSISDN) is also the one that is most commonly shared by the subscriber themselves to 3rd parties, but that's the way it is.

We cannot assume that in all cases and for every API the caller already knows the PI. We are already dealing with integrations with third parties for APIs, like those involving Market Places, where the caller of the API does not know some PI, like MSISDNs. The anonymised id would allow the APIs to be used in the same way across all these scenarios, while keeping the sensitive data within the operator.

jordonezlucena commented 1 year ago

A summary of the discussions we had in the last Commonalities CC.

David Wroblewski, DT: We see the proposal quite similar to the PCR usage in Mobile Connect. We realised that there are large enterprise (B2B) customers which already have the MSISDN, and they want to use it.

David Wroblewski, DT: Don't be afraid of MNOs sending the MSISDN. If they come directly to the access GW, why be afraid?

David Wroblewski, DT: Because of the privacy and personal information handling rules captured in GDPR, Mobile Connect does not include this information as output params (it uses booleans, timestamps, ... instead), though it does allow this information as input params. The PCR is also personal information.

David Wroblewski, DT: He mentions scenarios involving family contracts.

Shilpa, DT: Fine with the proposal, but she doesn't see the need to restrict the proposal to this scenario only. She also thinks that if the B2B customer already has the MSISDN, it should be allowed to use it.

Sylvain, DT: PCR already exists. Good for "mass market" and small developers, but doesn't work well for banking institutions, which already have the MSISDN. He also mentions that it is unlikely that a banking institution wants a marketplace in the middle, mediating interactions between the banking institution and the MNO. Orange commits to study the proposal, and check whether _anonymous_subscriberid can coexist with MSISDN or not. Legal implications are important here, and thus deserve to be studied as well.

Eric Murray, Vodafone: Same comment as noted before. If a B2B customer already has the MSISDN, they will then be forced to make an extra association between the MSISDN and the _anonymous_subscriberid. They will be reluctant not to use the MSISDN.

Eric Murray, Vodafone: How do we manage scenarios without data connectivity (where there is no IP)? The proposal is only valid for IP Auth.

eric-murray commented 1 year ago

@jlurien So the anonymised identifiers will expire or can be revoked, and thus need to be periodically "refreshed"? This makes me think you are considering this as a use case for OIDC. Is it related to the proposal on user consent? If so, it might be better to combine them into a single proposal unless they are genuinely independent (i.e. CAMARA could adopt either or both without modification to either scheme).

@jordonezlucena I would prefer that the option to generate an anonymised identifier from MSISDN be retained, perhaps through a dedicated API where access could be restricted to those API clients who can show they need this functionality. But I agree that IP address / port or 3GPP External Identifier (i.e. GPSI) is a better option for those API clients who know or could determine it.

Related to user consent, if the API client only knows the MSISDN because the end user gave it to them (either verbally or through CLI) then I believe the API caller should still be permitted to request the anonymised subscriber identifier based on MSISDN without additional end user interaction if they agree contractually that they will record the consent (for audit purposes if consent is challenged) and bear any costs of invalid consent.

So a mechanism is required that allows this without requiring further input or action on the part of the end user, even if use of this mechanism is limited to certain API clients. If the end user needs to retrieve a OTP and communicate that to the API client, that could add a lot of friction to the transaction that would limit the usefulness of the APIs.

shilpa-padgaonkar commented 1 year ago

@jlurien : If privacy is the main reason to use this new suggested identifier, why is GPSI external identifier not used instead of bringing in yet another identifier? https://github.com/camaraproject/WorkingGroups/blob/main/Commonalities/documentation/UE-Identification.md

jpengar commented 1 year ago

@jlurien : If privacy is the main reason to use this new suggested identifier, why is GPSI external identifier not used instead of bringing in yet another identifier? https://github.com/camaraproject/WorkingGroups/blob/main/Commonalities/documentation/UE-Identification.md

@shilpa-padgaonkar Actually, we don't see the GPSI external identifier as incompatible with our anonymous id proposal. The concept behind it is quite similar: obtaining an id that does not include personal information by means of the UE IP address. However, we think that this id should not be restricted to the GPSI in terms of implementation, so we are proposing a simple REST API aligned with the CAMARA spirit of being developer-friendly and agnostic of the underlying implementation details. See linked sample.

We see the GPSI external id as a solution tied to the implementation of the corresponding 3GPP capabilities in the operator side. GPSI external id is introduced in 3GPP Rel 17 and, as far as we know, it is not yet widespread among vendors/operators. We also consider that 3GPP APIs are not as friendly as may be required to be exposed as-is to developers.

For these reasons we think that, in a potential CAMARA API, the GPSI external id would not be in scope on the northbound side (what is exposed to the developers) but on the southbound side (not visible to developers) instead. Basically we could offer to developers a simple API to obtain an anonymous id, and the Operator could implement it using GPSI external id or another alternative approach. And this way also, the solution would not be tied to the availability of some specific 3GPP release or capability in Operator side.
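
The northbound/southbound split described in this comment can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `anonymousId` field and the `operator.example` domain are hypothetical and are not taken from the linked sample or from any 3GPP API.

```python
import uuid
from abc import ABC, abstractmethod


class AnonymousIdBackend(ABC):
    """Southbound side: how an operator produces the id (hidden from developers)."""

    @abstractmethod
    def resolve(self, ip: str, port: int) -> str: ...


class OpaqueUuidBackend(AnonymousIdBackend):
    def __init__(self) -> None:
        self._by_endpoint: dict[tuple[str, int], str] = {}

    def resolve(self, ip: str, port: int) -> str:
        # Reuse the id already issued for this endpoint, otherwise mint one.
        return self._by_endpoint.setdefault((ip, port), str(uuid.uuid4()))


class ExternalIdBackend(AnonymousIdBackend):
    def __init__(self, domain: str) -> None:
        self._domain = domain
        self._local = OpaqueUuidBackend()

    def resolve(self, ip: str, port: int) -> str:
        # Same lookup, but rendered as <local identifier>@<domain>.
        return f"{self._local.resolve(ip, port)}@{self._domain}"


def get_anonymous_id(backend: AnonymousIdBackend, ip: str, port: int) -> dict[str, str]:
    # Northbound side: one response shape regardless of the backend in use.
    return {"anonymousId": backend.resolve(ip, port)}
```

The developer-facing function returns the same shape whichever backend the operator plugs in, which is the sense in which the API would be agnostic of the underlying implementation.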

Kevsy commented 1 year ago

We see the GPSI external id as a solution tied to the implementation of the corresponding 3GPP capabilities in the operator side

I don't think use of the GPSI external Id format has to be tied to an R17 implementation, because it's just a datatype.

We could adopt the External Identifier format of the GPSI now as a means to (1) represent an anonymous identifier for the user, meaningful only to the issuing operator and (2) identify the issuing operator, which then makes routing easier if the API cannot be fulfilled by the receiving operator.

Adopting just the external identifier format <local identifier>@<domain operator> at this stage means it can be supported by 4G and fixed cores when issuing an anonymous customer reference, and is 'future proofed' for 5G core R17 where it has native support. I think this meets your last paragraph above:

Basically we could offer to developers a simple API to obtain an anonymous id, and the Operator could implement it using GPSI external id or another alternative approach. And this way also, the solution would not be tied to the availability of some specific 3GPP release or capability in Operator side.
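
To illustrate how the `<local identifier>@<domain operator>` format supports both points (1) and (2), here is a minimal sketch. The helper names `parse_external_id` and `route`, the return strings and the example domains are hypothetical, and the parsing is deliberately simplified rather than a full validation of the 3GPP External Identifier syntax.

```python
from typing import NamedTuple


class ExternalId(NamedTuple):
    local_identifier: str
    domain: str


def parse_external_id(value: str) -> ExternalId:
    # Split on the last "@" so the domain is always the final component.
    local, sep, domain = value.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not in <local identifier>@<domain> form: {value!r}")
    return ExternalId(local, domain)


def route(value: str, own_domain: str) -> str:
    # The domain names the issuing operator, so a receiving operator can tell
    # whether to serve the request itself or forward it.
    ext = parse_external_id(value)
    return "serve-locally" if ext.domain == own_domain else f"forward-to:{ext.domain}"
```

The local part stays meaningful only to the issuing operator; a receiving operator needs nothing more than the domain to decide where the call belongs.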

jpengar commented 1 year ago

I don't think use of the GPSI external Id format has to be tied to an R17 implementation, because it's just a datatype.

We could adopt the External Identifier format of the GPSI now as a means to (1) represent an anonymous identifier for the user, meaningful only to the issuing operator and (2) identify the issuing operator, which then makes routing easier if the API cannot be fulfilled by the receiving operator.

Adopting just the external identifier format <local identifier>@<domain operator> at this stage means it can be supported by 4G and fixed cores when issuing an anonymous customer reference, and is 'future proofed' for 5G core R17 where it has native support.

I understand your point. The initial proposal was to leave the anonymous id even more generic, as just a field with string format, so each operator could choose whatever id suits them best, like an opaque uuid or similar, and would not necessarily return a GPSI-formatted id (which would be fine if the Operator wants to do so).

MarkCornall commented 1 year ago

I would believe that a GPSI type "external Identifier@domain" would be the most useful and allow both identity and correct routing of API calls

shilpa-padgaonkar commented 1 year ago

Overview:

  1. Existing identifiers can be kept, as some players/app service providers might already know other identifiers like MSISDN and will continue using them, though we don't recommend this in general.
  2. IP address is still an option that can be used as an identifier. Again, this is not a recommended option for CAMARA, and providers are recommended to discontinue its use as soon as possible.
  3. GPSI-type external id is the overall recommended generic ID to be used going forward.

shilpa-padgaonkar commented 1 year ago

Each API we work on will need to consider additional requirements, if any, to decide which identifiers make the most sense.

shilpa-padgaonkar commented 1 year ago

As agreed in commonalities meeting on 12th Jan, closing this issue. Feel free to reopen in case you come across an unaddressed point.