CycloneDX / transparency-exchange-api

A standard API specification for exchanging supply chain artifacts and intelligence
https://tc54.org/
Apache License 2.0

SPEC: `GET /product` What query parameters should be supported? #80

Open madpah opened 2 days ago

madpah commented 2 days ago

From https://github.com/CycloneDX/transparency-exchange-api/pull/77/files#r1850138397

@madpah:

Should we not have a more vague API here? The intent is to return a list of TEA Products that match some identifier. What if someone supplies a value for each parameter (ignoring pagination)? I'd collapse purl, ean, sku and vendor-uuid into a single product-identifier parameter.

@vpetersson:

To me these are different query methods. If you have the product identifier, you can hit /product/, but if you have one of the others it's more like a query. If we treat them as a single item, it becomes a generic search query rather than a search on a given key. While of course possible, that will complicate the implementation and might cause unexpected results (e.g. if you provide a partial uuid/sku, you might not get a unique result).

@madpah:

So from the implementation side I see your point - but from the consumer side, I might not know what type the product-identifier I have is.

So I might not know how to query this API...

vpetersson commented 2 days ago

Yeah, this is a good discussion. So here's how I envision the workflow:

While this might not be an easy 'browseable' way to discover the artifacts, it would be relatively straightforward to do programmatically. The key here is that we need to keep it universal. That's why I would suggest that the product-identifier is a server-side generated UUID, and everything else is metadata. That way you can discover a product using multiple approaches (e.g. purl, sku etc).

For instance, the above workflow works equally well with say a purl, where you just use purl=zyz.

oej commented 2 days ago

I think from the consumer side, they have a TEI and just want to reference that to find the product. Having the client understand different TEI types makes it too complex. Basically just treat the stuff after the host name as a string. The type is for syntax checking when needed and helpful.

vpetersson commented 2 days ago

The question is really where the TEI originates. In the current API spec it's a server-side generated UUID. In your vision, is the TEI provided by the user when the product is created?

madpah commented 2 days ago

Let's be clear: a TEI is the entire URN - urn:tei:cyclonedx.org:SHA256:fd44efd601f651c8865acf0dfeacb0df19a2b50ec69ead0262096fd2f67197b9

So the product-identifier in question would be fd44efd601f651c8865acf0dfeacb0df19a2b50ec69ead0262096fd2f67197b9

For the GET /product, from a Consumer's perspective, they can easily supply either the full TEI or the product-identifier.

Now, the TEI Specification does define a type for the product-identifier (SHA256 in the above example), so it would be possible for the Consumer to specify both the type and the product-identifier...
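To make the distinction between the full TEI and the product-identifier concrete, here is a minimal Python sketch that splits a TEI into its parts. The field order `urn:tei:<domain>:<type>:<product-identifier>` is inferred only from the example URN in this comment, and the names `Tei` and `parse_tei` are hypothetical, not part of any TEA specification.

```python
from typing import NamedTuple


class Tei(NamedTuple):
    """Parts of a TEI, using the layout of the example URN in this thread."""
    domain: str
    tei_type: str
    identifier: str


def parse_tei(tei: str) -> Tei:
    # Assumed layout: urn:tei:<domain>:<type>:<product-identifier>.
    # maxsplit=4 keeps any colons inside the identifier (e.g. a purl) intact.
    parts = tei.split(":", 4)
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "tei":
        raise ValueError(f"not a TEI URN: {tei!r}")
    _, _, domain, tei_type, identifier = parts
    if not (domain and tei_type and identifier):
        raise ValueError(f"TEI has an empty component: {tei!r}")
    return Tei(domain, tei_type, identifier)


example = parse_tei(
    "urn:tei:cyclonedx.org:SHA256:"
    "fd44efd601f651c8865acf0dfeacb0df19a2b50ec69ead0262096fd2f67197b9"
)
print(example.tei_type, example.identifier)
```

A consumer holding only the full TEI could use something like this to derive the type and product-identifier, if the API ends up asking for them separately.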

madpah commented 2 days ago

So IMO, we have 3 options:

  1. GET /product?product-identifier=<product-identifier>
  2. GET /product?type=<type>&product-identifier=<product-identifier>
  3. GET /product?tei=<tei>
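
For comparison, the three options above could be built by a client roughly as follows. This is only a sketch: the host `tea.example.com` and the function names are made up for illustration, and the parameter names are taken directly from the list above.

```python
from urllib.parse import urlencode

BASE_URL = "https://tea.example.com"  # hypothetical TEA server


def option_1(product_identifier: str) -> str:
    # Option 1: the bare product-identifier, no type information.
    return f"{BASE_URL}/product?" + urlencode({"product-identifier": product_identifier})


def option_2(tei_type: str, product_identifier: str) -> str:
    # Option 2: type and product-identifier as separate parameters.
    query = {"type": tei_type, "product-identifier": product_identifier}
    return f"{BASE_URL}/product?" + urlencode(query)


def option_3(tei: str) -> str:
    # Option 3: the full TEI URN; urlencode percent-encodes the colons.
    return f"{BASE_URL}/product?" + urlencode({"tei": tei})


print(option_1("abc"))
print(option_2("SHA256", "abc"))
print(option_3("urn:tei:cyclonedx.org:SHA256:abc"))
```

Option 3 is the only one where the consumer does not need to take the TEI apart before querying.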

vpetersson commented 2 days ago

I honestly missed the document you're referring to @madpah. Good pointer.

Having said that, I'm really confused by the SHA256 here. Is this a SHA of a collection object (e.g. a given SBOM)?

Other than that, I don't have a strong opinion on whether we use GET parameters or paths (e.g. /product/?product-identifier=foo or /product/foo). I suppose a GET parameter based approach does provide more flexibility.

ppkarwasz commented 2 days ago

I would prefer to use path segments instead of GET parameters, so a TEA server might be implemented using a static website. If you have only a few releases of an Open Source product, you might generate the data once and store it in a Git repo.

GET /product/<type>:<product-identifier>

sounds better to me.

Edit: Replaced <type>/<product-identifier> with <type>:<product-identifier>.
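
As a rough illustration of why path segments suit a static site, the sketch below maps a type and product-identifier to a fixed path that a generator could write a file to. The function name `product_path` is hypothetical, and percent-encoding everything except the colon is an assumption of this sketch, chosen so that identifiers containing `/` (such as a purl) stay within a single path segment.

```python
from urllib.parse import quote


def product_path(tei_type: str, product_identifier: str) -> str:
    # Build /product/<type>:<product-identifier>. Colons are kept readable,
    # while "/" and other reserved characters are percent-encoded so the
    # whole value remains one path segment.
    segment = quote(f"{tei_type}:{product_identifier}", safe=":")
    return f"/product/{segment}"


print(product_path("SHA256", "abc"))
print(product_path("purl", "pkg:npm/foo@1.0"))
```

Because the path depends only on the identifier, the full set of responses can be generated once and served from a Git repo or any static host.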

vpetersson commented 2 days ago

I like the idea of a static version. That's a good argument for paths to me.

oej commented 1 day ago

Do you see any usage for the TYPE in the API? A single product can have many TEIs both of the same type and different types. I think we should handle them as opaque strings, just identifiers, in the API.

oej commented 1 day ago

> The question is really where the TEI originates. In the current API spec it's a server-side generated UUID. In your vision, is the TEI provided by the user when the product is created?

Remember that a product can have multiple TEIs. The UUID of the product item is an identifier that is used after TEI resolution. Don't mix the UUID tei with the TEI in the product object. It can be the same, but doesn't have to be.

The TEI is either created somewhere else and registered in the TEA database or created in the TEA service. I suspect it to be both. If I want to use my own article numbers in my ordering system, those IDs are created in that system. We likely want an API endpoint to manage the TEIs for a given product.
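
The separation described here (a product UUID owned by the TEA service, plus any number of TEIs that resolve to it) could be sketched like this. The class `TeiRegistry` and its methods are hypothetical, and the rule that one TEI resolves to exactly one product is an assumption of this sketch rather than something the thread has settled.

```python
import uuid
from typing import Optional


class TeiRegistry:
    """In-memory sketch: many TEIs resolve to one service-generated product UUID."""

    def __init__(self) -> None:
        self._product_by_tei: dict[str, uuid.UUID] = {}

    def create_product(self) -> uuid.UUID:
        # The product object's UUID is generated by the service.
        return uuid.uuid4()

    def register_tei(self, product_uuid: uuid.UUID, tei: str) -> None:
        # TEIs may originate elsewhere (e.g. an ordering system); the registry
        # treats them as opaque strings and only checks for conflicts.
        existing = self._product_by_tei.get(tei)
        if existing is not None and existing != product_uuid:
            raise ValueError(f"TEI already registered to another product: {tei}")
        self._product_by_tei[tei] = product_uuid

    def resolve(self, tei: str) -> Optional[uuid.UUID]:
        return self._product_by_tei.get(tei)

    def teis_for(self, product_uuid: uuid.UUID) -> list[str]:
        return sorted(t for t, p in self._product_by_tei.items() if p == product_uuid)


registry = TeiRegistry()
product = registry.create_product()
registry.register_tei(product, "urn:tei:example.com:sku:A-100")
registry.register_tei(product, "urn:tei:example.com:uuid:" + str(uuid.uuid4()))
print(registry.resolve("urn:tei:example.com:sku:A-100") == product)
```

Note that the UUID inside the second TEI is unrelated to the product UUID, which is the point being made above.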

ppkarwasz commented 1 day ago

> Do you see any usage for the TYPE in the API? A single product can have many TEIs both of the same type and different types. I think we should handle them as opaque strings, just identifiers, in the API.

I edited the comment to prepend the <product-identifier> with the type.

vpetersson commented 1 day ago

> The question is really where the TEI originates. In the current API spec it's a server-side generated UUID. In your vision, is the TEI provided by the user when the product is created?

> Remember that a product can have multiple TEIs. The UUID of the product item is an identifier that is used after TEI resolution. Don't mix the UUID tei with the TEI in the product object. It can be the same, but doesn't have to be.

> The TEI is either created somewhere else and registered in the TEA database or created in the TEA service. I suspect it to be both. If I want to use my own article numbers in my ordering system, those IDs are created in that system. We likely want an API endpoint to manage the TEIs for a given product.

It would certainly be possible to allow for the UUID to be provided by the client, as long as there are server-side checks to ensure it's unique.

My suggestion would be that the UUID is always generated server-side, and then you can attach metadata to this. For instance, you might want to have both a SKU and barcode, as well as a UUID. This is how the current API draft has been designed:

[Image: current API draft]

In the real world, I don't think the UUID will be exposed to end users very frequently. You'd probably want to use one of the many other supported, more user-friendly ways to identify your product (sku, barcode, purl etc).

oej commented 1 day ago

I think you are mixing the UUID for the product index object and the one used in the TEI. Those are two different entities (but can be the same). I don't want to limit a manufacturer to use UUID from another system in the TEI.

I still think the way you have added stuff like "barcode" etc is wrong and not extensible. It needs to be simplified.

vpetersson commented 1 day ago

> Do you see any usage for the TYPE in the API? A single product can have many TEIs both of the same type and different types. I think we should handle them as opaque strings, just identifiers, in the API.

See comment above, but in the current implementation, the various TYPEs are implemented as metadata (e.g. you can query with /product?barcode=foobar)

oej commented 1 day ago

I still think that would lead to a massive API that always changes. We have to find another model for the metadata and not have it in the structure like that. Maybe key/value list if you persist in being able to query on PURL or other type values.

vpetersson commented 1 day ago

> I think you are mixing the UUID for the product index object and the one used in the TEI. Those are two different entities (but can be the same). I don't want to limit a manufacturer to use UUID from another system in the TEI.

Sure, we can make this user provided as an option -- but it still needs to be enforced to be unique.

> I still think that would lead to a massive API that always changes. We have to find another model for the metadata and not have it in the structure like that. Maybe key/value list if you persist in being able to query on PURL or other type values.

Aren't these keys derived from the tei-types anyway, and thus predefined?

oej commented 1 day ago

The UUID generated by the system for the product object needs to be enforced to be unique in the system, like the leaf, collection and other objects. But a single product can have multiple TEIs with different UUIDs. That's another namespace. We should not mix them.

If you create a field called "bar code" you are assuming exactly one bar code per product, which I think is wrong. And you have to keep defining new fields for every single type we define. I would like as much as possible to decouple defining TEI Types from the API. From the API point of view it's basically a string, an identifier without any other meaning.

If you want to be able to provide a lookup I suggest we create a key-value pair array with a key of "type" and then the "tei" as a value without parsing it further. The query would be ?type=hash&value=sha256:234234234234234234 and similar queries then. That would make the types transparent in the code, but still reachable for queries.
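
A minimal sketch of that key-value model, assuming each product carries a list of `{"type", "value"}` entries that are matched as opaque strings. The `Product` class, the sample data, and the `query_products` helper are all hypothetical; only the `?type=...&value=...` query shape comes from the comment above.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs


@dataclass
class Product:
    uuid: str
    name: str
    # One entry per identifier; the same type may appear more than once.
    identifiers: list[dict[str, str]] = field(default_factory=list)


def query_products(products: list[Product], query_string: str) -> list[Product]:
    # Expects a query such as "type=hash&value=sha256:234234".
    params = parse_qs(query_string)
    wanted_type = params["type"][0]
    wanted_value = params["value"][0]
    # Values are compared as opaque strings, with no type-specific parsing.
    return [
        p for p in products
        if any(i["type"] == wanted_type and i["value"] == wanted_value
               for i in p.identifiers)
    ]


catalog = [
    Product("p-1", "Widget", [
        {"type": "barcode", "value": "1111"},
        {"type": "barcode", "value": "2222"},
        {"type": "hash", "value": "sha256:234234"},
    ]),
    Product("p-2", "Gadget", [{"type": "barcode", "value": "3333"}]),
]
print([p.name for p in query_products(catalog, "type=hash&value=sha256:234234")])
```

With this shape, adding a new TEI type means adding data rather than adding a field or query parameter to the API.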

ppkarwasz commented 1 day ago

I just realized, we are talking about two different endpoints here:

oej commented 1 day ago

We have to separate a query for the product object UUID from a TEI of type UUID with a specific value. Those are two different things.

vpetersson commented 1 day ago

> If you create a field called "bar code" you are assuming exactly one bar code per product, which I think is wrong.

That's easily rectified by turning it into a dict. I have no objection to that.

> And you have to keep defining new fields for every single type we define. I would like as much as possible to decouple defining TEI Types from the API. From the API point of view it's basically a string, an identifier without any other meaning.

Yes, but isn't that a fairly standard way to do it? Just like CycloneDX, each version of TEA would have a specification that the API version would implement. You'd just have to version the API to align with the TEA standard.

> If you want to be able to provide a lookup I suggest we create a key-value pair array with a key of "type" and then the "tei" as a value without parsing it further. The query would be ?type=hash&value=sha256:234234234234234234 and similar queries then. That would make the types transparent in the code, but still reachable for queries.

This doesn't really align with modern RESTful API design though and to me is a less clean implementation.

We could go down the whole GraphQL route, but that would require a pretty big overhaul.

oej commented 1 day ago

How would a query based on type and value look in a modern API? There are certainly solutions for querying key/value pairs that would work here too.

oej commented 1 day ago

We can't release a new ECMA TEA version for each type that we add. That's not doable. We really need to try to decouple TEI types from the API. PURL is struggling with this as well and has decoupled PURL Core from PURL types for the same reason: not having to upgrade PURL Core for every new type invented.