I'm developing a TREE API (for Triply) and notice performance issues when fetching tree:Nodes for ranges which contain a large number of tree:members. I'm wondering what the best practice is for handling this.
I'm currently using a granularity level on a time predicate (prov:generatedAtTime) to determine which entities to return for each tree:Node. For instance, the URL containing 2020-01-01T00:00 would return data for all entities which have a value for prov:generatedAtTime between 2020-01-01T00:00 and 2020-01-01T01:00 if the granularity is set to one hour. This makes the size of each tree:Node data-dependent. Simply making the granularity smaller would not resolve the issue, since it's possible for any number of entities to have the exact same value for prov:generatedAtTime.
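To illustrate why a smaller granularity does not bound the node size, here is a minimal Python sketch of the bucketing described above. The function names (`bucket_start`, `node_sizes`) and the sample timestamps are made up for illustration, and the sketch assumes naive UTC datetimes; it is not how the actual API is implemented.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: naive UTC timestamps


def bucket_start(ts: datetime, granularity: timedelta) -> datetime:
    """Floor a timestamp to the start of its granularity window."""
    windows = (ts - EPOCH) // granularity  # timedelta // timedelta gives an int
    return EPOCH + windows * granularity


def node_sizes(timestamps, granularity: timedelta) -> Counter:
    """Count how many entities would land in each tree:Node."""
    return Counter(bucket_start(ts, granularity) for ts in timestamps)


# Hypothetical data: three entities share the exact same prov:generatedAtTime.
timestamps = [datetime(2020, 1, 1, 0, 15)] * 3 + [datetime(2020, 1, 1, 0, 45)]

per_hour = node_sizes(timestamps, timedelta(hours=1))
per_second = node_sizes(timestamps, timedelta(seconds=1))
```

With one-hour granularity all four entities fall into the 00:00 node. With one-second granularity the three entities sharing a timestamp still fall into a single node, so its size stays data-dependent.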
Some possible solutions
Use traditional pagination args in tree:Node URLs
Let's say my:api/2020-01-01T00:00 returns 200 entities, and has tree:relation [a tree:GreaterThanOrEqualToRelation; tree:node <my:api/2020-01-01T10:00>]. I could change this so that my:api/2020-01-01T00:00 returns only the first 100 entities (for example) and links to <my:api/2020-01-01T00:00?page=2>, which would return the next 100 entities. <my:api/2020-01-01T00:00?page=2> would then link to <my:api/2020-01-01T10:00>, as there are no further pages within the time range.
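The linking logic for this option could look roughly like the sketch below. `paginate_node` and its parameters are hypothetical names, the `?page=N` scheme is the one from the example above, and the sketch only computes which URL a page should link to; it leaves open which relation type that link should use.

```python
def paginate_node(base_url, members, page, page_size, next_node_url):
    """Return the members for one page of a node and the URL it should link to.

    Page 1 is served at base_url itself; later pages at base_url?page=N.
    """
    start = (page - 1) * page_size
    page_members = members[start:start + page_size]
    has_more = start + page_size < len(members)
    # Link to the next page inside the same time range while one exists,
    # otherwise fall through to the next time-range node.
    next_url = f"{base_url}?page={page + 1}" if has_more else next_node_url
    return page_members, next_url


# Hypothetical data: 200 entities in the 2020-01-01T00:00 range.
entities = [f"r:{i}" for i in range(1, 201)]
base = "my:api/2020-01-01T00:00"
following = "my:api/2020-01-01T10:00"

first_members, first_next = paginate_node(base, entities, 1, 100, following)
second_members, second_next = paginate_node(base, entities, 2, 100, following)
```

Here the first page links to `my:api/2020-01-01T00:00?page=2`, and the second page links to `my:api/2020-01-01T10:00`.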
Use tree:import to separate the navigation and entity data
Instead of returning the entity data together with the navigation data, the entity data could be made available under a different API path, which I would reference with a tree:import statement. For example: <my:api/2020-01-01T00:00> tree:import <my:api/2020-01-01T00:00/entities>. With this solution, navigation over the nodes would not be slowed down by fetching the entity data, but the same performance issues would still occur whenever a client follows these tree:import links.
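As a rough sketch of what such a navigation-only response could contain, the Python below builds the Turtle for one node. `navigation_only_node` is a hypothetical helper, the `/entities` suffix is the path from the example above, and the relation is copied as written earlier in this post rather than being a complete relation description.

```python
TREE_PREFIX = "@prefix tree: <https://w3id.org/tree#> ."


def navigation_only_node(node_url: str, next_node_url: str) -> str:
    """Serialise a tree:Node with navigation data and one import link, no members."""
    entities_url = f"{node_url}/entities"  # assumption: path from the example above
    return "\n".join([
        TREE_PREFIX,
        "",
        f"<{node_url}> a tree:Node ;",
        f"    tree:import <{entities_url}> ;",
        "    tree:relation [",
        "        a tree:GreaterThanOrEqualToRelation ;",
        f"        tree:node <{next_node_url}>",
        "    ] .",
    ])


doc = navigation_only_node("my:api/2020-01-01T00:00", "my:api/2020-01-01T10:00")
print(doc)
```

The size of this response no longer depends on the number of entities in the time range; the cost moves to whatever serves the `/entities` URL.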
Use a tree:import for each tree:member.
A variation of the above approach is for the tree:Node paths to still return IRIs for all entities which belong to that tree:Node, but to put additional data about the entities behind different imports. For example, a certain node would return <my:collection> a tree:Collection; tree:member <r:1>, <r:2>. <r:1> tree:import <my:api/describe/r:1>. <r:2> tree:import <my:api/describe/r:2>. This would still cause issues for tree:Nodes containing very big sets of entities, but in my particular case it's the expanded descriptions of the resources that cause issues.
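A minimal sketch of generating that per-member output, assuming a hypothetical helper `members_with_imports` and the `my:api/describe/` path from the example above:

```python
def members_with_imports(collection_url, member_iris, describe_base):
    """List member IRIs on the collection and point each one at its own description."""
    lines = ["@prefix tree: <https://w3id.org/tree#> .", ""]
    if member_iris:
        member_list = ", ".join(f"<{iri}>" for iri in member_iris)
        lines.append(f"<{collection_url}> a tree:Collection ; tree:member {member_list} .")
    else:
        lines.append(f"<{collection_url}> a tree:Collection .")
    for iri in member_iris:
        # One import per member; the expanded description lives behind this URL.
        lines.append(f"<{iri}> tree:import <{describe_base}{iri}> .")
    return "\n".join(lines)


out = members_with_imports("my:collection", ["r:1", "r:2"], "my:api/describe/")
print(out)
```

The node response still grows by two short statements per member, so this only helps when the expanded descriptions, rather than the member count, dominate the cost.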