OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0

Support for Sensitive/PII/Personal Data #2190

Open galvo opened 4 years ago

galvo commented 4 years ago

It would be useful if you could tag schema properties and parameters as being sensitive or PII specific so that these could be tagged appropriately in API docs.

hkosova commented 4 years ago

You could use an extension such as x-pii: true.
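
As an illustration of how tooling might consume such a flag, here is a minimal Python sketch that walks a schema and lists the properties marked x-pii: true. The schema fragment, the property names and the pii_paths helper are all hypothetical; only the x-pii keyword comes from the suggestion above, and it is an extension, not part of the spec.

```python
from typing import Any, Iterator

# Hypothetical schema fragment, written as the dict a YAML/JSON parser would
# produce. Property names are illustrative only.
USER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "email": {"type": "string", "format": "email", "x-pii": True},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string", "x-pii": True},
                "country": {"type": "string"},
            },
        },
    },
}


def pii_paths(schema: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of properties whose schema sets x-pii: true."""
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        if sub.get("x-pii") is True:
            yield path
        # Recurse into nested object schemas.
        yield from pii_paths(sub, path)


print(sorted(pii_paths(USER_SCHEMA)))
```

A doc generator could use a list like this to badge the flagged fields; how each tool renders the flag is exactly the part that is not standardised.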

galvo commented 4 years ago

That's the exact extension we are going with, actually, but it would be ideal to have it incorporated into the spec and standardised, so that all API doc generators handle it the same way, e.g. swagger, widdershins, readme, apigee etc.

MikeRalphson commented 4 years ago

A good candidate for a ui / forms / documentation vocabulary to extend JSON Schema.

galvo commented 4 years ago

@MikeRalphson does such a vocabulary exist or is there a suitable place to get one started?

handrews commented 4 years ago

@galvo formal extension vocabularies will be a thing in OAS 3.1 (with JSON Schema draft 2019-09 or later). So there aren't any yet but there is a lot of interest.

We've designated https://github.com/json-schema-org/json-schema-vocabularies/issues as a place to hold ideas, although the JSON Schema organization will not be acting on those directly. It is more of a clearinghouse for others to figure out what might be useful.

It is not necessary for a vocabulary to be proposed there, it's just where we moved all of the keyword proposals from the JSON Schema spec repo that weren't going into the core or validation spec.

lpicquet commented 2 years ago

It would also be good for sensitive information to be kept out of what gets logged, notably by excluding it from the toString method of Java implementations, for example.
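
The logging concern can be sketched as follows, in Python rather than Java to keep it short. This assumes the x-pii extension suggested earlier in the thread; the schema, the payload and the redact helper are hypothetical, and a generated toString could apply the same idea.

```python
import json
from typing import Any

# Hypothetical schema; "x-pii" is the extension keyword suggested earlier in
# this thread, not a keyword defined by the OpenAPI Specification.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "email": {"type": "string", "format": "email", "x-pii": True},
        "phones": {"type": "array", "items": {"type": "string", "x-pii": True}},
    },
}


def redact(value: Any, schema: dict[str, Any], mask: str = "***") -> Any:
    """Return a copy of value with every x-pii-flagged part replaced by mask."""
    if schema.get("x-pii") is True:
        return mask
    if isinstance(value, dict):
        props = schema.get("properties", {})
        return {k: redact(v, props.get(k, {}), mask) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, schema.get("items", {}), mask) for v in value]
    return value


payload = {"id": "42", "email": "a@example.com", "phones": ["555-0100"]}
# Log the redacted copy; the original payload is left untouched.
print(json.dumps(redact(payload, SCHEMA)))
```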

Barna1234 commented 2 years ago

Has anything happened on this issue? It would be really useful.

LasneF commented 7 months ago

I agree this is a useful feature (especially in the EU, with all the GDPR regulation). It would mean that gateways and loggers do not have to decide for themselves whether a value should be logged or not, for instance. It also helps with documentation: it tells the party using the API that it is subject to regulation because it is manipulating sensitive data.

Note that it can be tricky, as the definition of PII can vary from one regulation to another.

Also, in some cases a single field is not PII on its own, but a combination of fields is.

Still, a simple true/false flag can usually handle most cases.

handrews commented 4 months ago

We already use format: password for obscuring things in UI. A format: pii that applies to any data type (because anything, including whole objects or arrays, might be sensitive) would be an easy solution and could go in the format registry (which can be updated at any time) instead of the spec itself.

Would this work? If so, could someone (@LasneF?) open a PR on the registry in the gh-pages branch of this repo?

AdamCoulterOz commented 4 months ago

For me it wouldn't, as PII represents a different dimension than format, i.e. any field which is PII would also have a format (e.g. a customer email address with format: email, or a date of birth with format: date).

LasneF commented 4 months ago

@AdamCoulterOz I agree with your statement: PII is not a format, it is more about data handling.

Note that password, too, is more about data handling than data format.

As you mentioned, a given field such as birthDate can have format date and still be considered PII, but swapping format date for format password looks wrong.

@handrews, in fact there is a gap around data handling information, i.e. how a consumer should handle a piece of data (which is distinct from its format). Data handling can be about display (like password, but it could also be a dropdown list or ...), or, in the PII context, about storage requirements and handling (removed from logs, not forwarded to third parties).

So this can be a bigger topic.

Note that PII is a bit difficult to define and is defined differently across regulations. To be deliberately vaguer, the term sensitive-data could be used; it would cover a larger spectrum.

It could be either something like x-sensitive: true

or, better if we consider extensibility, x-data-classification: an enum with sensitive, GDPR, CCPA, etc. Yet another registry?

Or just data-classification could be adopted. It might be a topic for the @moonwalk project, but for OAS 3.2, adding a basic data-classification extension would be great and would showcase OAI's concern for data security, data privacy and so on.
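
To make the x-data-classification idea concrete, here is a rough Python sketch. Nothing in it is standardised: the keyword is only the one floated above, the enum members mirror the examples given (sensitive, GDPR, CCPA), and representing the value as a list of labels is an assumption made here so that one field can fall under several regulations. Note that birthDate keeps format: date alongside its classification.

```python
from enum import Enum
from typing import Any


class DataClassification(Enum):
    """Hypothetical label set; a real registry would define the values."""
    SENSITIVE = "sensitive"
    GDPR = "GDPR"
    CCPA = "CCPA"


KEYWORD = "x-data-classification"  # proposed in this thread, not in the spec

# Hypothetical schema fragment. The list-of-labels shape is an assumption.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "birthDate": {"type": "string", "format": "date", KEYWORD: ["GDPR", "CCPA"]},
        "nickname": {"type": "string"},
        "apiKey": {"type": "string", KEYWORD: ["sensitive"]},
    },
}


def classifications(prop_schema: dict[str, Any]) -> set[DataClassification]:
    """Parse the labels on one property; unknown labels raise ValueError."""
    return {DataClassification(label) for label in prop_schema.get(KEYWORD, [])}


def classified_properties(schema: dict[str, Any]) -> list[str]:
    """Names of top-level properties that carry at least one classification."""
    return sorted(
        name
        for name, sub in schema.get("properties", {}).items()
        if classifications(sub)
    )


print(classified_properties(SCHEMA))
```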

github-actions[bot] commented 4 months ago

This issue has been labeled with No recent activity because there has been no recent activity. It will be closed if no further activity occurs within 28 days. Please re-open this issue or open a new one after this delay if you need to.

handrews commented 4 months ago

idk why the bot marked this "no recent activity" 2 hours after a new comment was added. I've removed the bot labels.

handrews commented 4 months ago

@LasneF the main problem here is that we're talking about the Schema Object, so this is a JSON Schema extension, not an OpenAPI extension. OpenAPI has retained a few JSON Schema extension keywords in 3.1, but we've generally tried to get out of the business of extending other specs. JSON Schema draft 2020-12 (used in OAS 3.1) allows defining extensions with its own mechanism that does not require OpenAPI involvement, which is preferable. Also, in JSON Schema 2020-12, unknown keywords SHOULD be treated as annotations (like title or readOnly), which is what you'd want for any PII keywords.
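
A toy Python illustration of the "unknown keywords as annotations" behaviour, under heavy assumptions: this is not a JSON Schema implementation, KNOWN_KEYWORDS is a deliberately tiny subset, the validity check only understands type: string, and x-pii stands in for whatever PII keyword might be chosen.

```python
from typing import Any

# Deliberately tiny subset of keywords this toy "implementation" understands.
KNOWN_KEYWORDS = {"type", "format", "properties", "required", "title", "readOnly"}


def is_valid(instance: Any, schema: dict[str, Any]) -> bool:
    """Toy check that only understands type: string; all else passes."""
    if schema.get("type") == "string":
        return isinstance(instance, str)
    return True


def unknown_annotations(schema: dict[str, Any]) -> dict[str, Any]:
    """Collect keywords the toy implementation does not know, as annotations."""
    return {k: v for k, v in schema.items() if k not in KNOWN_KEYWORDS}


schema = {"type": "string", "format": "date", "x-pii": True}

# The unknown keyword does not affect the validation outcome...
print(is_valid("1990-01-01", schema))
# ...but stays available to tools that care about it.
print(unknown_annotations(schema))
```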

LasneF commented 4 months ago

@handrews you are right, in the end that should fall within JSON Schema's scope. @jdesrosiers, would this be something ready to include?

=> we may follow up / create an item in https://github.com/json-schema-org/json-schema-spec and annoy our friends :)