The power of polymath is not interacting with one host but rather reaching out to multiple. I imagine people will take great pride in curating an interesting set of different polymath endpoints in their collection. Sometimes they'll want to ask their client to reach out to just one endpoint in their collection using its shortname, sometimes they'll want all of a group of endpoints ("web_dev" vs "philosophy") and sometimes they'll want to try them all.
Currently directory.SECRET.json is a simple flat file mapping shortnames to endpoints. We should allow it to nest groups so people can easily select all endpoints in a given group for a given query and organize it in ways they want.
But I also imagine that people will want to share their curated lists and let other people use and build on them, too. That implies not just nesting of endpoints but also the ability to transclude remote endpoint collections from others into your own.
That is, if a polymath endpoint is an RSS feed, we need an equivalent of OPML (but ideally not quite as janky).
In this issue I'm going to sketch out a rough structure for what the format of directory JSON could be, just to doodle on what it could look like. I'll use TypeScript types just to sketch out the schema and how it nests.
The idea is that a given polymath user would maintain their own directory.SECRET.json in a private place (or administered by a service on their behalf). But they could also host a world-accessible version of their directory for others to use, too.
The root type of each directory JSON blob is DirectoryRoot.
//An ItemName must be [a-zA-Z0-9-_]*
type ItemName = string;
//A DottedItemName is like ItemName(.ItemName)*
type DottedItemName = string;
type URL = string;
//Timestamp is in whatever reasonable format is idiomatic for JSON
type Timestamp = string
type Directory = {
title?: string,
description?: string,
items: {
[name : ItemName]: Endpoint | Directory | RemoteDirectory | ExpandedRemoteDirectory
}
}
type DirectoryRoot = Directory & {
version: number,
//Which sub-listings are enabled by default when the user doesn't specify which endpoints or groups to use. If a path is unset here, it absorbs the default from the sub-directory it was transcluded from.
enabled?: {
[path : DottedItemName]: boolean
}
}
type AccessToken = string;
type Endpoint = {
title?: string,
description?: string,
endpoint: URL,
//The URL to use if the client is asked to fetch dev endpoints (these should be filtered out of remote fetches)
dev_endpoint?: URL,
//The token to pass when fetching from this endpoint
access_token?: AccessToken
}
type RemoteDirectory = {
href: URL
}
type LastFetched = {
last_fetched: Timestamp
}
type ExpandedRemoteDirectory = ExpandedRemoteDirectorySuccess | ExpandedRemoteDirectoryFailure;
type ExpandedRemoteDirectorySuccess = RemoteDirectory & LastFetched & Directory;
type ExpandedRemoteDirectoryFailure = RemoteDirectory & LastFetched & {
error: string
}
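To make the nesting concrete, here is a rough example of what a private directory could look like under these types. It is only a sketch: the types are restated compactly (without the expanded remote forms) so the block stands alone, and every name, URL, and token in `example` is made up. The `Item` alias and the three type guards are hypothetical helpers, not part of the proposed format.

```typescript
// Compact restatement of the sketched types so this block is self-contained.
type Endpoint = {
  title?: string;
  description?: string;
  endpoint: string;
  dev_endpoint?: string;
  access_token?: string;
};
type RemoteDirectory = { href: string };
type Directory = { title?: string; description?: string; items: { [name: string]: Item } };
type Item = Endpoint | Directory | RemoteDirectory; // hypothetical alias
type DirectoryRoot = Directory & { version: number; enabled?: { [path: string]: boolean } };

// Hypothetical private directory: all names, URLs, and tokens are invented.
const example: DirectoryRoot = {
  version: 1,
  title: "My endpoints",
  // Dotted paths into `items`; only web_dev is on by default.
  enabled: { "web_dev": true, "philosophy.friend": false },
  items: {
    web_dev: {
      title: "Web development",
      items: {
        blog: { endpoint: "https://example.com/polymath", access_token: "made-up-token" },
      },
    },
    philosophy: {
      items: {
        // A remote collection curated by someone else, not yet expanded.
        friend: { href: "https://example.org/directory.json" },
      },
    },
  },
};

// Type guards mirroring the "has an href but no items" test used during expansion.
function isDirectory(item: Item): item is Directory {
  return "items" in item;
}
function isRemoteDirectory(item: Item): item is RemoteDirectory {
  return "href" in item && !("items" in item);
}
function isEndpoint(item: Item): item is Endpoint {
  return "endpoint" in item;
}
```

Serialized with `JSON.stringify(example, null, 2)`, this would be the contents of a directory.SECRET.json.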
When a directory.json is loaded, do the following steps.
Recurse through items. Any time a RemoteDirectory is discovered (has an href but no items), do the following steps:
1) Verify the fetch has not already happened in this expansion; if it has, do not fetch it (to avoid cycles)
2) Fetch the URL denoted by href
3) If the fetch errors, replace the RemoteDirectory in the output with an ExpandedRemoteDirectoryFailure (keep the href, set error to the error message and last_fetched to the current time), then move on.
4) If the fetch succeeds, validate that the version matches, and if not upgrade the content to the current version.
5) In the fetched json, remove the version top level key.
6) Go through all of the fetched content and remove any access_token or dev_endpoint keys that are found (they should not have been included)
7) If the fetched json has a dict of enabled, merge it into the enabled dict at the root of the whole file. For each key, construct a full dotted name by concatenating the path pieces from the root down to this sub-directory and then appending the key from the sub-directory's enabled. If the root already sets that name to true or false, leave it. If it is not set, set it to whatever value was explicitly set in the fetched sub-directory.
8) Add a last_fetched timestamp set to the current time.
9) Set the RemoteDirectory to this newly processed json blob.
10) Recurse down into any sub-items that are a RemoteDirectory and expand them, too. Don't expand past the configured number of plies (2 by default).
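The steps above can be sketched roughly as follows. This is a doodle under several assumptions, not a reference implementation: `fetchJson` is a hypothetical injected loader that is synchronous and throws on failure (a real client would await a network fetch), the `Json` type is deliberately loose, timestamps are assumed to be ISO 8601 strings, and step 4 (version validation and upgrade) is left out entirely.

```typescript
type Json = { [key: string]: unknown };
// Hypothetical loader: returns parsed JSON for a URL, or throws on failure.
type Fetcher = (url: string) => unknown;

const isObject = (v: unknown): v is Json =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Step 6: remove private keys from every item below `dir`.
function stripPrivate(dir: Json): void {
  const items = dir["items"];
  if (!isObject(items)) return;
  for (const child of Object.values(items)) {
    if (!isObject(child)) continue;
    delete child["access_token"];
    delete child["dev_endpoint"];
    stripPrivate(child);
  }
}

// Step 7: copy the sub-directory's defaults into the root unless already set there.
function mergeEnabled(root: Json, fetched: Json, path: string[]): void {
  const incoming = fetched["enabled"];
  if (!isObject(incoming)) return;
  const existing = root["enabled"];
  const rootEnabled: Json = isObject(existing) ? existing : {};
  root["enabled"] = rootEnabled;
  for (const [key, value] of Object.entries(incoming)) {
    const full = [...path, key].join(".");
    if (typeof value === "boolean" && typeof rootEnabled[full] !== "boolean") {
      rootEnabled[full] = value;
    }
  }
}

// Expands remote directories in place. `ply` is how many remote levels may still be fetched.
function expand(dir: Json, root: Json, fetchJson: Fetcher, path: string[] = [],
                seen: Set<string> = new Set(), ply = 2): void {
  const items = dir["items"];
  if (!isObject(items)) return;
  for (const [name, child] of Object.entries(items)) {
    if (!isObject(child)) continue;
    const childPath = [...path, name];
    const href = child["href"];
    let remaining = ply;
    if (typeof href === "string" && !("items" in child)) {
      if (ply <= 0 || seen.has(href)) continue; // ply limit, and step 1 (cycles)
      seen.add(href);
      const last_fetched = new Date().toISOString();
      let fetched: Json;
      try {
        const raw = fetchJson(href); // step 2
        if (!isObject(raw)) throw new Error("not a JSON object");
        fetched = raw;
      } catch (e) {
        items[name] = { href, last_fetched, error: String(e) }; // step 3
        continue;
      }
      delete fetched["version"]; // step 5 (step 4 is omitted in this sketch)
      stripPrivate(fetched); // step 6
      mergeEnabled(root, fetched, childPath); // step 7
      items[name] = { ...fetched, href, last_fetched }; // steps 8 and 9
      remaining = ply - 1;
    }
    const current = items[name];
    if (isObject(current)) expand(current, root, fetchJson, childPath, seen, remaining); // step 10
  }
}
```

Calling `expand(root, root, fetchJson)` mutates `root` in place. Nested local directories do not consume a ply; only crossing into a fetched remote directory does.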
When a directory.json is prepared to be hosted in a publicly accessible location, it should have any access_token and dev_endpoint removed.
When a given directory.json is in use and the user hasn't explicitly described which endpoints to use (e.g. with a list of dotted names, potentially including * wildcards), default to using the paths that are set to true in the root enabled dict.
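Here is a rough sketch of how that selection could work. The matching rules are my assumptions, since nothing above pins them down: a `*` matches exactly one path segment, and a pattern that is shorter than an endpoint's path selects everything beneath it (so `web_dev` selects every endpoint in that group). How a false entry nested under a true one should behave is not handled here. `listEndpoints`, `matches`, and `selectEndpoints` are hypothetical names.

```typescript
type Json = { [key: string]: unknown };

const isObject = (v: unknown): v is Json =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Collect every endpoint URL in the tree together with its path from the root.
function listEndpoints(dir: Json, prefix: string[] = []): { path: string[]; url: string }[] {
  const out: { path: string[]; url: string }[] = [];
  const items = dir["items"];
  if (!isObject(items)) return out;
  for (const [name, child] of Object.entries(items)) {
    if (!isObject(child)) continue;
    const path = [...prefix, name];
    const endpoint = child["endpoint"];
    if (typeof endpoint === "string") out.push({ path, url: endpoint });
    out.push(...listEndpoints(child, path));
  }
  return out;
}

// Assumption: each pattern segment must equal the path segment or be "*";
// a pattern shorter than the path selects everything below it.
function matches(pattern: string, path: string[]): boolean {
  const parts = pattern.split(".");
  return parts.length <= path.length && parts.every((p, i) => p === "*" || p === path[i]);
}

// Use the explicit patterns if given, otherwise the root paths enabled as true.
function selectEndpoints(root: Json, patterns?: string[]): string[] {
  let active = patterns ?? [];
  if (active.length === 0) {
    const raw = root["enabled"];
    const enabled: Json = isObject(raw) ? raw : {};
    active = Object.keys(enabled).filter((key) => enabled[key] === true);
  }
  return listEndpoints(root)
    .filter((entry) => active.some((pattern) => matches(pattern, entry.path)))
    .map((entry) => entry.url);
}
```

With this shape, `selectEndpoints(root)` falls back to the defaults, while something like `selectEndpoints(root, ["philosophy.*"])` picks one group explicitly.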
Note that although it's possible to have arbitrary nesting of RemoteDirectories, in practice that will get slow to load because it requires serial fetches. That's why the expansion algorithm only goes down 2 ply by default. If you're hosting a directory.json that includes remote sub-directories, it's a best practice to fetch the current state of the expansion at deploy time, transclude it with a timestamp, and store that fully expanded version to serve up. That way other clients won't have to refetch, but can note if the last_fetched is old and decide to refetch sub-items if they want.
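A client deciding whether to refetch a pre-expanded sub-directory could do something like the sketch below. It assumes last_fetched is an ISO 8601 string (the Timestamp format isn't settled above) and that the client picks its own maximum age; `isStale` and `staleHrefs` are hypothetical helpers.

```typescript
type Json = { [key: string]: unknown };

const isObject = (v: unknown): v is Json =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// True if the timestamp is missing, unparseable, or older than maxAgeMs.
function isStale(lastFetched: unknown, maxAgeMs: number, now: number = Date.now()): boolean {
  if (typeof lastFetched !== "string") return true;
  const fetchedAt = Date.parse(lastFetched);
  return Number.isNaN(fetchedAt) || now - fetchedAt > maxAgeMs;
}

// Walk an expanded directory and list the hrefs a client may want to refetch.
function staleHrefs(dir: Json, maxAgeMs: number, now: number = Date.now()): string[] {
  const out: string[] = [];
  const items = dir["items"];
  if (!isObject(items)) return out;
  for (const child of Object.values(items)) {
    if (!isObject(child)) continue;
    const href = child["href"];
    if (typeof href === "string" && isStale(child["last_fetched"], maxAgeMs, now)) {
      out.push(href);
    }
    out.push(...staleHrefs(child, maxAgeMs, now));
  }
  return out;
}
```

An unexpanded RemoteDirectory has no last_fetched, so it is reported as stale too, which seems like the right default.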
Things to think through:
1) Should endpoints have favicons or something similar?
2) Do we want a concept of tags for more grouping? (How would you merge tag namespaces across different directories that are transcluded into each other?)
3) Do we want some notion of owner for each endpoint and directory? How would they be described? An email address? But how would things in the ecosystem know that the email was actually tied to that endpoint? What if people didn't want to share their email address?
4) Do we need a way for there to be private sub-directories transcluded? Some kind of access_token that is passed when the href is GET'd?
5) Should RemoteDirectory be able to use relative paths, or local filesystem paths? (How should that be limited for security?)