Revised File Identifier Handling

brian-comply0 commented 2 months ago

User Story

As a tool developer, I want file identifiers to be more reliable and consistent across implementations, so that document references are more universal.

Description

Per a suggestion from @pkothare, file identifiers should:

better align with the OSCAL specification;
be embedded within the OSCAL document; and
be universal across implementations.

The draft specification currently allows each implementation to assign its own unique identifier. It is silent on how that identifier is managed, and it does not consider using the same identifier to reference the same document across multiple systems.

Further, while the OSCAL standard offers a document identifier field in metadata, this field is not required.

Acceptance Criteria

[ ] The OSCALRESTOpenAPI.json file is updated to reflect this change at the appropriate endpoints.
[ ] The OSCAL REST OpenAPI content is updated to reflect this change:
- [ ] Revise identifier explanation
- [ ] Revise endpoint references and identifier references throughout documentation
- [ ] Add usage scenarios

Proposed Solution

The OSCAL REST OpenAPI Specification should be revised such that:

OSCAL-compliant v4 or v5 UUIDs are used for API document identifiers
OSCAL REST OpenAPI clients MUST assign an immutable document identifier to OSCAL content prior to POSTing to an OSCAL REST OpenAPI server
OSCAL REST OpenAPI servers SHOULD/MUST do the following upon receipt of new OSCAL content via the POST method:
- SHOULD make a snapshot of the content in its unaltered form;
- MUST check for an OSCAL.io identifier in the metadata\document-ids array, and honor it as the unique identifier used in future API calls;
- if no OSCAL.io identifier is found, generate a v4 or v5 UUID value and insert it into the metadata\document-ids array as the OSCAL.io identifier.
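
The server-side steps above can be sketched roughly as follows. This is only an illustration of the proposal as first written: the scheme URI is a placeholder (no value has been agreed on at this point), `ingest` is a hypothetical function name, and the code assumes the document is a dict with a single root element such as `catalog`. It generates a v4 UUID only; v5 would need an agreed namespace.

```python
import copy
import uuid

# Placeholder scheme URI (assumption); the thread has not settled on a final value.
OSCAL_IO_SCHEME = "http://oscal.io/oscal/content-identifier"


def ingest(document: dict) -> tuple[dict, dict, str]:
    """Return (snapshot, stored_document, identifier) for newly POSTed content."""
    # SHOULD: keep a snapshot of the content in its unaltered form.
    snapshot = copy.deepcopy(document)
    stored = copy.deepcopy(document)

    # Assumes one root element (catalog, profile, ...) that holds "metadata".
    root = next(iter(stored.values()))
    doc_ids = root.setdefault("metadata", {}).setdefault("document-ids", [])

    # MUST: honor an existing OSCAL.io identifier if one is present.
    for entry in doc_ids:
        if entry.get("scheme") == OSCAL_IO_SCHEME:
            return snapshot, stored, entry["identifier"]

    # Otherwise generate a v4 UUID and record it in document-ids.
    identifier = str(uuid.uuid4())
    doc_ids.append({"scheme": OSCAL_IO_SCHEME, "identifier": identifier})
    return snapshot, stored, identifier
```

The snapshot is taken before any mutation, so the unaltered client content stays available even when the server inserts an identifier.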

pkothare commented 2 months ago

if no OSCAL.io identifier is found, generate a v4 or v5 UUID value and insert it into the metadata\document-ids array as the OSCAL.io identifier.

I think this goes back to making an identifier internally unique vs. unique across systems. I believe we should embrace a concept of using two IDs, similar to the approach described in https://www.rfc-editor.org/rfc/rfc7643#section-3.1: id and externalId.

To handle POST requests, we can derive a similar approach based on https://www.rfc-editor.org/rfc/rfc7644#section-3.3, which is the sister standard that defines the protocol for interactions. The statement above should then be revised to:

An internal identifier id is created and inserted into the metadata\document-ids array regardless of what the client has supplied for externalId.

The externalId should be mutable and consistent for the client that is POSTing the document, while the server should always create an id that allows for internal consistency/uniqueness.

Unwinding that approach, the other statements will look more like:

MUST check for an OSCAL.io external identifier in the metadata\document-ids array, and honor it as the unique identifier used in future API calls;

OSCAL REST OpenAPI clients MUST assign an immutable document identifier to OSCAL content prior to POSTing to an OSCAL REST OpenAPI server in the externalId field.

The language above is a little loose, and I used externalId and id because those are the names used in the SCIM standards. Feel free to replace them with more appropriate names for the OSCAL standard.

brian-comply0 commented 2 months ago

@pkothare I follow a little better now. I like this.

This raises two questions:

1. How do we best handle two identifiers from the same identification scheme when the current OSCAL syntax assumes only one identifier per identity scheme?

The current OSCAL syntax supports an array of document-ids that contain a scheme and identifier, where the scheme value is an RFC-3986 URI that identifies an identification scheme. (See the NIST OSCAL URI Description).
- The OSCAL design assumes one identifier per scheme. We have two under a single scheme.
- Before introducing the second identifier, I had intended to use something along the lines of http://oscal.io/oscal/content-identifier. Now we need two.
- We may need to do something along the lines of http://oscal.io/oscal/identifier/contentuuid and http://oscal.io/oscal/identifier/externaluuid. Thoughts?
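
To make the one-scheme-per-identifier idea concrete, here is a small Python sketch of looking up an identifier by scheme in the document-ids array. The two scheme URIs are the tentative ones floated above, and `find_identifier` is a hypothetical helper name, not part of OSCAL or the draft specification.

```python
from typing import Optional

# Tentative scheme URIs from this thread; neither is final.
CONTENT_UUID_SCHEME = "http://oscal.io/oscal/identifier/contentuuid"
EXTERNAL_UUID_SCHEME = "http://oscal.io/oscal/identifier/externaluuid"


def find_identifier(metadata: dict, scheme: str) -> Optional[str]:
    """Return the identifier of the first document-ids entry with this scheme, or None."""
    for entry in metadata.get("document-ids", []):
        if entry.get("scheme") == scheme:
            return entry.get("identifier")
    return None


metadata = {
    "document-ids": [
        {"scheme": CONTENT_UUID_SCHEME, "identifier": "8b291c95-22da-4ca1-b9ce-6dc782a417b1"},
    ]
}
content_id = find_identifier(metadata, CONTENT_UUID_SCHEME)
external_id = find_identifier(metadata, EXTERNAL_UUID_SCHEME)  # None: no such entry
```

With a distinct scheme per identifier, each lookup stays unambiguous and the existing document-ids structure does not need to change.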

2. Should a server allow the use of either identifier when both are available?

In other words, if an OSCAL Catalog has both a ContentUUID and an ExternalUUID, should we allow both GET /catalog/[ContentUUID] and GET /catalog/[ExternalUUID] to identify the same OSCAL Catalog?

pkothare commented 2 months ago

We may need to do something along the lines of http://oscal.io/oscal/identifier/contentuuid and http://oscal.io/oscal/identifier/externaluuid.

I concur with this approach. Distinguishing by scheme is concise enough to conform to the OSCAL standard and gives us the flexibility we need to uniquely identify a document both internally and externally.

Should a server allow the use of either identifier when both are available?

Borrowing again from https://www.rfc-editor.org/rfc/rfc7644#section-3.4, the GET operations should only be valid against ContentUUID. For example, for the given request:

GET /catalogs/8b291c95-22da-4ca1-b9ce-6dc782a417b1 HTTP/1.1
Host: example.com
Accept: application/json

The server responds with:

HTTP/1.1 200 OK
Content-Type: application/json
Location: https://example.com/catalogs/8b291c95-22da-4ca1-b9ce-6dc782a417b1

{
    "catalog": {
        "metadata": {
            "document-ids": [
                {
                    "scheme": "http://oscal.io/oscal/identifier/contentuuid",
                    "identifier": "8b291c95-22da-4ca1-b9ce-6dc782a417b1"
                },
                {
                    "scheme": "http://oscal.io/oscal/identifier/externaluuid",
                    "identifier": "..."
                }
            ]
        }
        ...
    }
}

which also means that when creating an OSCAL document (in this case a catalog) for a given request like:

POST /catalogs HTTP/1.1
Host: example.com
Content-Type: application/json
Content-Length: ...

{
    "catalog": {
        "metadata": {
            "document-ids": [
                {
                    "scheme": "http://oscal.io/oscal/identifier/externaluuid",
                    "identifier": "9c0a0e1c-9bdf-4cc4-acda-c32122525406"
                }
            ]
        }
        ...
    }
}

the server should respond with:

HTTP/1.1 201 Created
Content-Type: application/json
Location: https://example.com/catalogs/8b291c95-22da-4ca1-b9ce-6dc782a417b1

{
    "catalog": {
        "metadata": {
            "document-ids": [
                {
                    "scheme": "http://oscal.io/oscal/identifier/contentuuid",
                    "identifier": "8b291c95-22da-4ca1-b9ce-6dc782a417b1"
                },
                {
                    "scheme": "http://oscal.io/oscal/identifier/externaluuid",
                    "identifier": "9c0a0e1c-9bdf-4cc4-acda-c32122525406"
                }
            ]
        }
        ...
    }
}
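
On the client side, preparing the request body shown above might look like the following Python sketch. The helper name `with_external_uuid` is hypothetical, the scheme URI is the tentative one from this thread, and the code assumes the document root is `catalog`.

```python
import uuid

# Tentative scheme URI from this thread; not final.
EXTERNAL_UUID_SCHEME = "http://oscal.io/oscal/identifier/externaluuid"


def with_external_uuid(catalog_doc: dict) -> dict:
    """Ensure the catalog carries a client-issued external UUID; returns the same dict."""
    metadata = catalog_doc["catalog"].setdefault("metadata", {})
    doc_ids = metadata.setdefault("document-ids", [])
    # Only assign an identifier once so that it stays stable across re-sends.
    if not any(e.get("scheme") == EXTERNAL_UUID_SCHEME for e in doc_ids):
        doc_ids.append(
            {"scheme": EXTERNAL_UUID_SCHEME, "identifier": str(uuid.uuid4())}
        )
    return catalog_doc
```

Calling the helper a second time on the same document leaves the existing external UUID untouched.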

brian-comply0 commented 2 months ago

@pkothare Understood. I will revise accordingly

cgilbert328 commented 2 months ago

Relates to Issue

brian-comply0 commented 2 months ago

@pkothare after further review and consideration, I see your recommended approach as violating one of our specification design principles.

We have long maintained that the only requirement for OSCAL content to be passed via the API is that it passes the NIST OSCAL syntax validators. Requiring the assignment of an External ID in an otherwise optional OSCAL field is a departure from that principle.

I would like to suggest a hybrid solution.

The External Identifier approach described above should be held up as a best practice. It should be honored and used by the server when present in OSCAL content.

However, the server should not reject valid OSCAL content simply because it is missing this identifier. Instead, it should accept the content and assign the External ID itself. This includes adding the identifier to the server's copy of the OSCAL content for inclusion when served to other entities.

As mentioned in our verbal discussion, the POST return content includes the server's identifier for other API methods/endpoints. This allows the client to immediately validate if an included External Identifier was recognized and honored or if the server failed to find it and implemented its own identifier.
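
A client-side check along those lines could be sketched in Python as follows. The function names are hypothetical, the scheme URI is the tentative one discussed earlier, and the code assumes each document has a single root element holding `metadata`.

```python
# Tentative scheme URI from this thread; not final.
EXTERNAL_UUID_SCHEME = "http://oscal.io/oscal/identifier/externaluuid"


def identifiers_by_scheme(document: dict) -> dict:
    """Map scheme -> identifier for the document-ids of a single-root OSCAL document."""
    root = next(iter(document.values()))
    entries = root.get("metadata", {}).get("document-ids", [])
    return {e.get("scheme"): e.get("identifier") for e in entries}


def external_id_honored(sent: dict, returned: dict) -> bool:
    """True if the External ID the client sent appears unchanged in the POST response."""
    sent_id = identifiers_by_scheme(sent).get(EXTERNAL_UUID_SCHEME)
    returned_id = identifiers_by_scheme(returned).get(EXTERNAL_UUID_SCHEME)
    return sent_id is not None and sent_id == returned_id
```

A `False` result tells the client either that it sent no External ID or that the server did not return the one it was given.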

I realize this is a departure from the System for Cross-Domain Identity Management; however, I believe the sharing of OSCAL content is a sufficiently different use case with different design principles. Allowing and honoring the External ID when present addresses your concern, while not requiring it allows us to continue honoring the goal of accepting any valid OSCAL content.

pkothare commented 2 months ago

@brian-comply0

Requiring the assignment of an External ID in an otherwise optional OSCAL field is a departure from that principle.

That's fair, I incorrectly made the statement: OSCAL REST OpenAPI clients MUST assign an immutable document identifier to OSCAL content prior to POSTing to an OSCAL REST OpenAPI server in the externalId field.

It should have been stated as something along the lines of: OSCAL REST OpenAPI clients ~MUST~ SHOULD assign an immutable document identifier to OSCAL content prior to POSTing to an OSCAL REST OpenAPI server in the externalId field, and the externalId MUST always be issued by the client (if supplied).

The External Identifier approach described above should be held up as a best practice. It should be honored and used by the server when present in OSCAL content.

Agree, which changes the modal verb from MUST to SHOULD, per RFC 2119.

The server should not reject valid OSCAL content simply because it is missing this identifier.

Agree. As you stated, it's best practice but there is no hard requirement.

Instead it should accept the content and assign the External ID itself.

I don't believe this would work. By virtue of generating the External ID on the server, there may be collisions on the client's side. Moreover, it would couple the client's and server's implementation of an External ID. Lastly, if the server were to generate an External ID, the solution of using both an External ID and Content ID would not be succinct.

We should simply accept an External ID if supplied. The Content ID is sufficient to identify a document from the server's perspective. The client may choose to use the Content ID to establish uniqueness within its boundary, but that's really up to the client. The External ID should be treated as an opaque identifier provided by the client, i.e. it has no meaning to the server; the server just holds it as extra information on behalf of the client.

As mentioned in our verbal discussion, the POST return content includes the server's identifier for other API methods/endpoints. This allows the client to immediately validate if an included External Identifier was recognized and honored or if the server failed to find it and implemented its own identifier.

In accordance with what was mentioned above and your recommendation of making the External ID optional, I believe the following interaction should be valid. For the given request:

POST /catalogs HTTP/1.1
Host: example.com
Content-Type: application/json
Content-Length: ...

{
    "catalog": {
        "metadata": {
            "document-ids": [
                // no External ID or Content ID
            ]
        }
        ...
    }
}

the server should respond with:

HTTP/1.1 201 Created
Content-Type: application/json
Location: https://example.com/catalogs/8b291c95-22da-4ca1-b9ce-6dc782a417b1

{
    "catalog": {
        "metadata": {
            "document-ids": [
                {
                    "scheme": "http://oscal.io/oscal/identifier/contentuuid",
                    "identifier": "8b291c95-22da-4ca1-b9ce-6dc782a417b1"
                }
            ]
        }
        ...
    }
}
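
The server behavior in this exchange, together with the earlier one where an External ID was supplied, might be sketched in Python like this. `handle_post` and `base_url` are hypothetical names, and the scheme URIs are the tentative ones from this thread. Discarding a client-supplied Content ID is an assumption on my part; the thread only says the server always creates its own.

```python
import copy
import uuid

# Tentative scheme URIs from this thread; not final.
CONTENT_UUID_SCHEME = "http://oscal.io/oscal/identifier/contentuuid"
EXTERNAL_UUID_SCHEME = "http://oscal.io/oscal/identifier/externaluuid"


def handle_post(
    document: dict, base_url: str = "https://example.com/catalogs"
) -> tuple[dict, str]:
    """Return (stored_document, location) for a newly POSTed catalog."""
    stored = copy.deepcopy(document)
    metadata = stored["catalog"].setdefault("metadata", {})
    doc_ids = metadata.setdefault("document-ids", [])

    # Assumption: any client-supplied Content ID is dropped, because the
    # server always issues its own.
    doc_ids[:] = [e for e in doc_ids if e.get("scheme") != CONTENT_UUID_SCHEME]

    # The server always assigns a Content ID. It never generates an External ID;
    # one supplied by the client is kept untouched as opaque data.
    content_uuid = str(uuid.uuid4())
    doc_ids.insert(0, {"scheme": CONTENT_UUID_SCHEME, "identifier": content_uuid})
    return stored, f"{base_url}/{content_uuid}"
```

The returned location corresponds to the Location header in the 201 response, and the stored document is what the response body would carry.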

I realize this is a departure from the System for Cross-Domain Identity Management; however, I believe the sharing of OSCAL content is a sufficiently different use case with different design principles. Allowing and honoring the External ID when present addresses your concern, while not requiring it allows us to continue honoring the goal of accepting any valid OSCAL content.

In fact, the SCIM does not require it either. The language they use for externalId is as follows:

externalId A String that is an identifier for the resource as defined by the provisioning client. The "externalId" may simplify identification of a resource between the provisioning client and the service provider by allowing the client to use a filter to locate the resource with an identifier from the provisioning domain, obviating the need to store a local mapping between the provisioning domain's identifier of the resource and the identifier used by the service provider. Each resource MAY include a non-empty "externalId" value. The value of the "externalId" attribute is always issued by the provisioning client and MUST NOT be specified by the service provider. The service provider MUST always interpret the externalId as scoped to the provisioning domain. While the server does not enforce uniqueness, it is assumed that the value's uniqueness is controlled by the client setting the value.

Taken from: https://www.rfc-editor.org/rfc/rfc7643#section-3.1

pjavan commented 2 months ago

@brian-comply0 this issue sounds like a topic we should propose on the https://github.com/GSA/fedramp-automation repository. Thoughts?

brian-comply0 commented 2 months ago

@pjavan are you saying we should propose the PMO define what identifiers they expect in the content they receive?

brian-comply0 commented 2 months ago

@pkothare glad we are on the same page on "must" vs. "should". I guess I'm still confused on one point:

We'd encourage, but not require clients to assign an External ID before POSTing.
Whether the client assigns an External ID or not, the server would assign a Content ID.
If the client assigns an External ID, we'd require all clients to use that External ID in GET, PUT, DELETE actions.
If no External ID is present, we would have to require clients to use the Content ID for GET, PUT, DELETE actions.

Do I follow your response correctly?

pjavan commented 2 months ago

@pjavan are you saying we should propose the PMO define what identifiers they expect in the content they receive?

Assuming OSCAL file exchange will create reference problems for FedRAMP, I'm wondering whether the PMO has had to tackle this problem yet and whether they have developed a preferred approach.

brian-comply0 commented 2 months ago

Even in legacy packages they assign a Package ID. One per system. They require it to be in the SSP, AP, AR, and POA&M for a given system. It also appears on the Marketplace. When I wrote the FedRAMP OSCAL guides, I described this as being required content in the /metadata/document-ids array. I can't speak to what they may or may not have done after I left.

I'd prefer to socialize it with Dave before posting something to their repo.

pkothare commented 2 months ago

We'd encourage, but not require clients to assign an External ID before POSTing.

Agreed

Whether the client assigns an External ID or not, the server would assign a Content ID.

Agreed

If the client assigns an External ID, we'd require all clients to use that External ID in GET, PUT, DELETE actions.

For now, only the Content ID can be used in GET, PUT and DELETE actions. The External ID may be used in GET actions in the future if we want to support actions like GET /catalogs?$filter=externalId eq ..., but that's out of scope for this discussion. The External ID cannot be used to uniquely identify a single document entity on the server. Therefore the client MUST use:

PUT /<document-type>/<contentuuid>
DELETE /<document-type>/<contentuuid>

If no External ID is present, we would have to require clients to use the Content ID for GET, PUT, DELETE actions.

Yes, as stated above, from the server's perspective, the Content ID provides uniqueness for operations conducted against a single entity and MUST be used by the client to identify resources on the server; the External ID has no bearing on the server. If a server responds with a document that includes an External ID for a given GET request by the client, then the client may choose to use it at its own discretion, because the client supplied it during a POST request at some earlier point in time.
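
To illustrate the point that only the Content ID resolves a document, here is a minimal in-memory sketch in Python. `CatalogStore` and its method names are hypothetical and stand in for whatever persistence a real server uses.

```python
from typing import Optional


class CatalogStore:
    """Toy store keyed only by Content ID; the External ID is never a lookup key."""

    def __init__(self) -> None:
        self._by_content_uuid: dict[str, dict] = {}

    def put(self, content_uuid: str, document: dict) -> None:
        # Backs PUT /<document-type>/<contentuuid>
        self._by_content_uuid[content_uuid] = document

    def get(self, content_uuid: str) -> Optional[dict]:
        # Backs GET /<document-type>/<contentuuid>
        return self._by_content_uuid.get(content_uuid)

    def delete(self, content_uuid: str) -> bool:
        # Backs DELETE /<document-type>/<contentuuid>; True if something was removed.
        return self._by_content_uuid.pop(content_uuid, None) is not None


store = CatalogStore()
store.put("8b291c95-22da-4ca1-b9ce-6dc782a417b1", {"catalog": {"metadata": {}}})
by_content_id = store.get("8b291c95-22da-4ca1-b9ce-6dc782a417b1")
by_external_id = store.get("9c0a0e1c-9bdf-4cc4-acda-c32122525406")  # None
```

A lookup by External ID finds nothing because the server treats that value as opaque client data stored inside the document.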

brian-comply0 commented 2 months ago

@pkothare perfect! Thank you for clarifying!

EasyDynamics / oscal-rest