Closed by Ndpnt 1 year ago
The GET /services/service1 example is different from the closing GET /services/service1 example at the end. I'm a bit confused about whether all the endpoints really return an array of objects with key type and string value?
Is sourceDocuments omitted from the other endpoints? Only from the examples? Omitted when empty?
Is there even a need for sub-endpoints when /services/:serviceId data is (presumably) small enough that it could always serve all terms with type and doc locations? Omitting them could simplify the interface and reduce upcoming maintenance if deemed viable. For consumers it'd mean fewer structure-parsing considerations too.
If deemed useful, query parameters could serve for filtering (?termtype=Privacy Policy), replacing this proposal's sub-endpoints and their different result structures.
Hi @Kissaki and thank you for taking the time to participate in this RFC 😃
The GET /services/service1 example is different from the closing GET /services/service1 example at the end. I'm a bit confused about whether all the endpoints really return an array of objects with key type and string value? Is sourceDocuments omitted from the other endpoints? Only from the examples? Omitted when empty?
The examples are different because in the first example I deliberately include only the minimal attributes of the services and terms objects, name for services and type for terms respectively, but we could add more attributes if it seems relevant.
In the last example, I added sourceDocuments to the terms object to give an example of the type of terms attributes that could be added to the response if necessary.
In the end, if we actually only keep type for terms in the response, it could be:
GET /services/service1
{
  "id": "service1",
  "name": "Service 1",
  "terms": ["Terms of Service", "Privacy Policy"]
}
Here is an example of a response with all the attributes, to show what is available.
GET /services/service1
{
  "id": "service1",
  "name": "Service 1",
  "terms": [
    {
      "type": "Terms of Service",
      "sourceDocuments": [
        {
          "location": "https://service1.com/tos-1",
          "executeClientScripts": false,
          "contentSelectors": "#main",
          "insignificantContentSelectors": ".returnToTop",
          "filters": ["cleanUrls"]
        },
        {
          "location": "https://service1.com/tos-2",
          "executeClientScripts": false,
          "contentSelectors": "#main",
          "insignificantContentSelectors": ".returnToTop",
          "filters": ["cleanUrls"]
        }
      ]
    },
    {
      "type": "Privacy Policy",
      "sourceDocuments": [
        {
          "location": "https://service1.com/privacy-policy",
          "executeClientScripts": true,
          "contentSelectors": "body",
          "insignificantContentSelectors": ".returnToTop",
          "filters": ["cleanUrls"]
        }
      ]
    }
  ],
  "filters": "function cleanUrls(document) {…}"
}
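As a rough sketch of how a consumer might read such a response, the Python snippet below collects the source document locations for each terms type. The embedded JSON is a trimmed copy of the example data, and the helper name `locations_by_terms_type` is hypothetical; it also assumes the minimal shape (a plain list of type strings) could be returned instead.

```python
import json

# Trimmed copy of the example response (illustrative data only).
RESPONSE = """
{
  "id": "service1",
  "name": "Service 1",
  "terms": [
    {
      "type": "Terms of Service",
      "sourceDocuments": [
        {"location": "https://service1.com/tos-1"},
        {"location": "https://service1.com/tos-2"}
      ]
    },
    {
      "type": "Privacy Policy",
      "sourceDocuments": [
        {"location": "https://service1.com/privacy-policy"}
      ]
    }
  ]
}
"""


def locations_by_terms_type(service: dict) -> dict:
    """Map each terms type to the locations of its source documents.

    Hypothetical helper: tolerates the minimal shape, where "terms" is a
    list of plain strings, and entries without "sourceDocuments".
    """
    result = {}
    for terms in service.get("terms", []):
        if isinstance(terms, str):
            # Minimal shape: only the type is known.
            result[terms] = []
            continue
        documents = terms.get("sourceDocuments", [])
        result[terms["type"]] = [doc["location"] for doc in documents]
    return result


service = json.loads(RESPONSE)
print(locations_by_terms_type(service))
```

Handling both shapes in one place keeps consumer code stable if more attributes are added to the response later.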
To be specific, the deadline is 24/04, end of day AoE (Anywhere on Earth).
Thank you very much @Ndpnt for opening this RFC and for this first clear proposal!
I believe we should allow mounting the route under any arbitrary path, since several collections might be made available on a single host.
- <collection host>/api/:version
+ <collection host>[/optional/path]/api/:version
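GET /services
- I understand we would like “fuzzy search”, but this makes it potentially very complicated as we'd need to specify how the “fuzziness” works. How do you view this fuzziness factor? 🙂
- Currently, the name constraints are very few. How would non-ASCII characters and URL-meaningful characters be handled?
- I prefer type over terms_type as a query term, since in this context it is probably clear enough.
- The result could end up being pretty big. What about adding pagination or, in a simpler way, simply returning a list of IDs and names, leaving it to the consumer to query the /services/:serviceId endpoint for additional information? 🙂
GET /services/:serviceId
- Currently, the serviceId constraints are based on filesystem constraints, not on URI constraints. How would non-ASCII characters and URL-meaningful characters be handled?
GET /services/:serviceId/terms/:termsType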
| Parameter | Type | Description |
| --------- | ------ | ---------------------- |
| serviceId | string | The ID of the service. |
- | termsType | string | The terms type. |
+ | termsType | URL-encoded string | The terms type. |
Both for GET /services/:serviceId/terms and for GET /services/:serviceId/terms/:termsType, I second @Kissaki: I fail to see the added value for consumers, and see how the maintenance, testing and complexity would increase.
Unless we have a very clear way to do “fuzzy search”, I would also not be shocked if the API left it to the consumers to implement search, and only enabled two things:
- /services, for enumerating IDs and names.
- /service/:serviceId, for getting the full data of a given service.
I thus offer an alternative proposition below.
<collection host>[/path]/api/:version
GET /services
Enumerate all services.
GET /service/:serviceId
Retrieve the declaration of a specific service through its ID.
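To make the two routes concrete, here is a minimal Python sketch of how a consumer might build their URLs. The HTTPS scheme, the host `collection.example`, the version `v1`, the path `/demo` and the function names are all assumptions made for illustration; only the route shapes come from the proposition above.

```python
from urllib.parse import quote


def base_url(host: str, version: str, path: str = "") -> str:
    """Build <collection host>[/path]/api/:version (HTTPS is assumed)."""
    trimmed = path.strip("/")
    prefix = f"/{trimmed}" if trimmed else ""
    return f"https://{host}{prefix}/api/{version}"


def services_url(host: str, version: str, path: str = "") -> str:
    """URL that enumerates all services."""
    return f"{base_url(host, version, path)}/services"


def service_url(host: str, version: str, service_id: str, path: str = "") -> str:
    """URL of a single service declaration; the ID is percent-encoded."""
    return f"{base_url(host, version, path)}/service/{quote(service_id, safe='')}"


print(services_url("collection.example", "v1"))
print(service_url("collection.example", "v1", "service1", path="/demo"))
```

Keeping the optional path in a single helper means a collection can be mounted anywhere on a host without touching the route-specific code.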
Thank you @MattiSG for your relevant feedback 🙂.
Base URL
I believe we should allow mounting the route under any arbitrary path, since several collections might be made available on a single host.
- <collection host>/api/:version
+ <collection host>[/optional/path]/api/:version
👍
GET /services
- I understand we would like “fuzzy search”, but this makes it potentially very complicated as we'd need to specify how the “fuzziness” works. How do you view this fuzziness factor? 🙂
I was thinking of dealing with case and accents. For example, to make it easy to find out whether services like YouTube or GitHub exist in the collection even if the name in the request is not quite right: GET /services?name=Youtube or GET /services?name=github.
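One possible reading of "dealing with case and accents" is sketched below in Python: names are compared after Unicode decomposition, removal of combining marks and case folding. The function names and the sample service list are hypothetical, and this is only one way the matching could be specified.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Return a case- and accent-insensitive comparison key."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def find_services(query: str, names: list) -> list:
    """Return the declared names that match the query after normalization."""
    wanted = normalize_name(query)
    return [name for name in names if normalize_name(name) == wanted]


declared = ["YouTube", "GitHub", "Décathlon"]  # hypothetical collection
print(find_services("Youtube", declared))
print(find_services("github", declared))
print(find_services("decathlon", declared))
```

This stays well short of true fuzzy search: typos or missing characters would still not match, which keeps the behaviour easy to specify.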
- Currently, the name constraints are very few. How would non-ASCII characters and URL-meaningful characters be handled?
I was thinking of processing them by encoding them. For example: https://example.com/path/with/é → https://example.com/path/with/%C3%A9
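As an illustration of that encoding, the Python sketch below percent-encodes a single path segment with the standard library's `urllib.parse.quote`. The helper name `encode_segment` is hypothetical, and passing `safe=""` so that `/` is also encoded is an assumption about how URL-meaningful characters would be treated.

```python
from urllib.parse import quote, unquote


def encode_segment(value: str) -> str:
    """Percent-encode one path segment, including "/" and "?"."""
    return quote(value, safe="")


print(encode_segment("é"))               # %C3%A9
print(encode_segment("Privacy Policy"))  # Privacy%20Policy
print(encode_segment("a/b?c"))           # a%2Fb%3Fc

# Decoding restores the original value.
print(unquote(encode_segment("Privacy Policy")))
```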
- I prefer type over terms_type as a query term, since in this context it is probably clear enough.
👍
- The result could end up being pretty big. What about adding pagination or, in a simpler way, simply returning a list of IDs and names, leaving it to the consumer to query the /services/:serviceId endpoint for additional information? 🙂
I'm in favor of simply returning a list of IDs and names.
GET /services/:serviceId
- Currently, the serviceId constraints are based on filesystem constraints, not on URI constraints. How would non-ASCII characters and URL-meaningful characters be handled?
By encoding them.
GET /services/:serviceId/terms/:termsType
| Parameter | Type | Description |
| --------- | ------ | ---------------------- |
| serviceId | string | The ID of the service. |
- | termsType | string | The terms type. |
+ | termsType | URL-encoded string | The terms type. |
Both for GET /services/:serviceId/terms and for GET /services/:serviceId/terms/:termsType, I second @Kissaki: I fail to see the added value for consumers, and see how the maintenance, testing and complexity would increase.
The idea was to return only the information for a specific terms type. I agree with both of you when only the minimal attributes of the terms are returned, but I think it can be valuable if the full terms object is returned.
In the first stage, I think that the simple proposal B is very good. I'm just a little concerned that external applications might consider a service not declared because of a small case error in the service name.
But I really like the idea of keeping things simple, so I vote for this proposal and we'll see with our collaboration with ToS;DR if we run into problems 🙂
Proposal B sounds good to me
The deadline has expired, thank you all for your participation. Proposal B received the most approvals, so it is the one that will be implemented. 🙂
Context and Problem Statement
Open Terms Archive (OTA) is a decentralised system that tracks collections of services and documents across multiple servers. Each collection has its own public repository where service and document declarations are stored. The decentralisation of OTA presents a challenge when it comes to easily identifying which services and documents are currently being tracked.
This can complicate collaborative efforts with external applications, such as Terms of Service; Didn't Read (ToS;DR), whose web application will be adapted to obtain data from public OTA datasets instead of the ToS;DR server database. When users of the application attempt to add a new document, the system must be able to inform them whether the document already exists in an OTA collection or not and in which one.
To address this problem, we propose the creation of an API that allows easy access to the metadata of each OTA collection and thus facilitate collaboration with external applications.
This RFC outlines the details of the proposed collection metadata API.
Proposed solution: Collection metadata API
Base URL
<collection host>/api/:version
Endpoints
GET /services
Retrieve all services, with optional query parameters
GET /services/:serviceId
Retrieve a specific service by ID
GET /services/:serviceId/terms
Retrieve all terms included in the specified service
GET /services/:serviceId/terms/:termsType
Retrieve a specific terms within a specific service by its type
Note
The proposed API will initially be exposed on each OTA instance, and a publicly available description file will list the federated collections and access points. A federated API that consolidates access to multiple OTA instances will be developed at a later stage.
The proposed API will deliberately expose only the minimum attributes of services and terms for now, but if necessary we could add more information to the response. Let us know if you have a specific need.
Here is an example with the location of each source document that constitutes a terms: