LinkRest Pointers - Formalization

atomashpolskiy commented 8 years ago

@andrus, please review and comment.

What is a pointer

A pointer is a means of specifying an arbitrary node (or a list of nodes) - a target - in the graph of data objects that exist in an LR application. The target is specified in relation to some other node that acts as the pointer's root. Thus, a pointer can be seen as a path between two nodes in the object graph, where each intermediate step is an instance access or a relationship traversal, and the last step is access to an entity instance, a collection of instances, or an attribute. Formally, each such step is called a path segment.

A pointer always has an entity for which it can be resolved. This entity is called the pointer's base type. Usually, the pointer's path makes sense only relative to a specific entity, but it is not in any way restricted to it. Thus, the path itself is not a pointer, as it can be applied to different base types, at least in theory.

Each path segment also has an implicit context - an instance of some entity - which is determined at runtime by evaluating the previous segment. For the first segment in the path, the context is the pointer's base type.

A pointer always has a target type, which is determined by the pointer's last segment. Resolving the pointer yields an object, a list of objects or a value of this type (in case the target denoted by the pointer's path exists). Pointer's target type is particularly useful when a value yielded by some pointer is used to update the target of some other pointer.

A pointer's path may optionally start with or contain an ID expression. This can be seen as a means of narrowing the context from a collection of an entity's instances to a particular instance, or of switching it from one instance to another. Aside from the practical usefulness of this technique, the pointer's implementation may perform certain optimizations based on this information, so it's reasonable to always begin the pointer's path with an ID expression.

To sum up, a pointer is a combination of some path and some entity for which this path can be resolved. Resolving a pointer yields either an instance of some (possibly the same) entity (or a list of instances) or a value, if the pointer's target is an attribute. If the pointer's last segment can't be resolved, the pointer does not yield a result (i.e. returns null or an empty collection). If the pointer's path consists of more than one segment, and some of the intermediate segments can't be resolved, then resolution of the pointer will fail with an exception on the first such segment.

Subject to additional discussion: Resource permissions (if any exist) are not taken into account when resolving the pointer's path. Thus, it is possible to access literally each and every entity instance and all of their attributes via pointers.

Syntax

<pointer> ::= "." | <segment-list>
<segment-list> ::= <segment> ["." <segment-list>]
<segment> ::= <id-expr> | <attr-expr> | <rel-expr>
<attr-expr> ::= <property-name>
<rel-expr> ::= <property-name> [":" <id-expr>]

Expressions

The following characters are reserved and must be escaped in property names and ID expressions:

segment separator "."
ID separator ":"

Escaping is done by doubling the reserved character:

"." -> ".."
":" -> "::"

1. ID expression String representation of the object's ID.

Subject to additional discussion: Compound IDs are not supported yet.

2. Attribute expression Property name, where property is not a relationship.

3. Relationship expression Property name, where property is a relationship. Optionally followed by the ID separator character and an ID expression.
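
The grammar and the escaping rule above can be illustrated with a small splitter. This is only a sketch, not the LinkRest implementation: the class `PointerPathSketch`, its `Segment` type and the sample names `e2s` and `name` are hypothetical. It assumes a greedy left-to-right reading, where a doubled reserved character is always taken as an escape. Under that reading a name ending in "." followed by a separator cannot be told apart from a separator followed by a name starting with "."; the text above does not say how such input should be handled. Telling an ID expression from an attribute or relationship name needs entity metadata and is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of LinkRest.
final class PointerPathSketch {

    /** One raw segment: a name (property name or bare ID) and an optional ID after ':'. */
    static final class Segment {
        final String name;
        final String id; // null when the segment has no ":<id-expr>" part

        Segment(String name, String id) {
            this.name = name;
            this.id = id;
        }
    }

    /** Escapes reserved characters by doubling them, as described above. */
    static String escape(String raw) {
        return raw.replace(".", "..").replace(":", "::");
    }

    /** Splits a path into segments; the path "." yields an empty list. */
    static List<Segment> parse(String path) {
        List<Segment> segments = new ArrayList<>();
        if (path.equals(".")) {
            return segments; // entity collection pointer
        }
        StringBuilder name = new StringBuilder();
        StringBuilder id = null; // becomes non-null after an unescaped ':'
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            boolean reserved = c == '.' || c == ':';
            boolean doubled = reserved && i + 1 < path.length() && path.charAt(i + 1) == c;
            StringBuilder current = id != null ? id : name;
            if (doubled) {
                current.append(c); // escaped reserved character
                i++;
            } else if (c == '.') {
                segments.add(toSegment(name, id)); // segment separator
                name = new StringBuilder();
                id = null;
            } else if (c == ':') {
                if (id != null) {
                    throw new IllegalArgumentException("Second ID separator at index " + i);
                }
                id = new StringBuilder(); // ID separator
            } else {
                current.append(c);
            }
        }
        segments.add(toSegment(name, id));
        return segments;
    }

    private static Segment toSegment(StringBuilder name, StringBuilder id) {
        if (name.length() == 0) {
            throw new IllegalArgumentException("Empty segment");
        }
        return new Segment(name.toString(), id == null ? null : id.toString());
    }
}
```

For example, `PointerPathSketch.parse("e2s:5.name")` produces two segments: a relationship-style segment `e2s` with ID `5`, followed by `name`.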

Pointer types

Pointer's base type (entity type) is specified upon creating a pointer.

1. Entity collection Path is equal to ".".

2. Entity instance Path's last segment is an ID expression.

3. Attribute Path's last segment is an attribute expression.

4. To-one relationship Path's last segment is a relationship expression, and the relationship is to-one. Can be implicit (ID not specified) or explicit (ID specified). This does not affect the functional behavior of the pointer.

5. To-many relationship (collection) Path's last segment is a relationship expression, the relationship is to-many, and ID is not specified.

6. To-many relationship (instance) Path's last segment is a relationship expression, the relationship is to-many, and ID is specified.
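
The six cases above amount to a lookup on the kind of the last segment. A minimal sketch, assuming the last segment has already been classified against entity metadata (all names here are hypothetical):

```java
// Hypothetical classification helper, not part of LinkRest.
final class PointerTypeSketch {

    /** Kind of the path's last segment, as determined from entity metadata. */
    enum SegmentKind { ID, ATTRIBUTE, TO_ONE, TO_MANY }

    enum PointerType {
        ENTITY_COLLECTION, ENTITY_INSTANCE, ATTRIBUTE,
        TO_ONE, TO_MANY_COLLECTION, TO_MANY_INSTANCE
    }

    /**
     * @param lastSegment kind of the last segment, or null when the path is "."
     * @param idSpecified whether a relationship segment carries an explicit ID
     */
    static PointerType classify(SegmentKind lastSegment, boolean idSpecified) {
        if (lastSegment == null) {
            return PointerType.ENTITY_COLLECTION;
        }
        switch (lastSegment) {
            case ID:
                return PointerType.ENTITY_INSTANCE;
            case ATTRIBUTE:
                return PointerType.ATTRIBUTE;
            case TO_ONE:
                // implicit and explicit to-one pointers behave the same
                return PointerType.TO_ONE;
            case TO_MANY:
                return idSpecified ? PointerType.TO_MANY_INSTANCE : PointerType.TO_MANY_COLLECTION;
            default:
                throw new AssertionError(lastSegment);
        }
    }
}
```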

Pointer operations

1. put Parameters: value Performs one of the following:

Update the object's properties in case of an entity instance pointer (merge new property values into an existing object and remove those that are not specified in the new object).
Update property value, if the property is an attribute.
Add new object in an entity collection or a to-many relationship collection of objects.
Replace the object that the pointer points to in a to-one relationship or a to-many relationship collection of objects with a new object. The old object (if it exists) is removed from the relationship. If the object or value denoted by the pointer's path does not exist, then a new object or value is placed at this path. In case of an entity collection or a to-many relationship collection of objects, the collection will hold a single object after performing this operation.

Subject to additional discussion:

If the object or value denoted by the pointer's path is an entity collection, and the new value is an empty collection, then all objects in the entity collection are deleted.

If the object or value denoted by the pointer's path is a to-many relationship collection of objects, and the new value is an empty collection, then all objects in the to-many relationship are removed from the relationship.

If the pointer points to an existing object or value, then this object or value is replaced.

2. remove Performs one of the following:

Delete an object in case of an entity instance pointer.
Remove property value, if the property is an attribute.
Remove an object from a to-one relationship or a to-many relationship with a specified ID. The object itself is not deleted.

Subject to additional discussion: when the target is an entity collection or a to-many relationship collection of objects, there are two possibilities: delete/unrelate all objects respectively or fail with an exception. Probably the latter is preferable for now (?), but the former is needed in order to comply with RFC6902.

If the object or value denoted by the pointer's path does not exist or it is an empty collection, then this operation does not have an effect.

3. resolve Resolves object, collection or value denoted by the pointer's path. If the object, collection or value does not exist, then this operation does not yield a result (returns null). If the path can't be resolved (i.e. some of the intermediary segments do not yield a result), then this operation fails with an exception.
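
The difference between "no result" and "failure" in resolve can be modeled on plain nested maps. This is a toy sketch only: `ResolveSketch` is hypothetical, nested `Map`s stand in for the object graph, and ID expressions are ignored.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the resolve rules, not part of LinkRest.
final class ResolveSketch {

    /**
     * Walks the segments starting from root. A missing last segment yields null;
     * a missing intermediate segment fails with an exception.
     */
    static Object resolve(Map<String, Object> root, List<String> segments) {
        Object context = root;
        for (int i = 0; i < segments.size(); i++) {
            String segment = segments.get(i);
            if (!(context instanceof Map)) {
                throw new IllegalStateException("Can't traverse '" + segment + "' from a plain value");
            }
            Object next = ((Map<?, ?>) context).get(segment);
            boolean last = i == segments.size() - 1;
            if (next == null && !last) {
                throw new IllegalStateException("Segment '" + segment + "' did not yield a result");
            }
            context = next;
        }
        return context;
    }
}
```

With a root of `Map.of("author", Map.of("name", "Ann"))`, the path `author.name` resolves to `"Ann"`, `author.email` resolves to null, and `publisher.name` fails on the first segment.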

Addendum A. Compliance with RFC6902: JavaScript Object Notation (JSON) Patch

All operations from the RFC can be modeled using pointer operations:

1. add Identical to pointer's put.

2. remove Can be modeled by preceding pointer's remove with resolve and null check.

3. replace Can be modeled by preceding pointer's put with resolve and null check. Also see the note for the pointer's remove operation.

4. move (from, to) Can be modeled by "from" pointer's resolve followed by a null check on the result and remove and "to" pointer's remove and put.

5. copy (from, to) Same as for move, but without calling "from" pointer's remove.

6. test Can be modeled by pointer's resolve and equality check. Order of elements in a collection does not matter.
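
As an illustration of the move mapping, here is a sketch over a hypothetical three-method `Pointer` interface. The interface, the `keyOf` toy implementation and the map-backed store are assumptions made for the example and do not reflect the actual LinkRest API.

```java
import java.util.Map;

// Hypothetical sketch of RFC 6902 "move" expressed through pointer operations.
final class MoveSketch {

    /** Minimal pointer contract mirroring put, remove and resolve. */
    interface Pointer {
        Object resolve();
        void remove();
        void put(Object value);
    }

    /** Toy pointer over a single key of an in-memory map. */
    static Pointer keyOf(Map<String, Object> store, String key) {
        return new Pointer() {
            public Object resolve() { return store.get(key); }
            public void remove() { store.remove(key); }
            public void put(Object value) { store.put(key, value); }
        };
    }

    /** resolve "from", null check, remove "from", then remove and put on "to". */
    static void move(Pointer from, Pointer to) {
        Object value = from.resolve();
        if (value == null) {
            throw new IllegalStateException("'from' target does not exist");
        }
        from.remove();
        to.remove();
        to.put(value);
    }
}
```

Dropping the `from.remove()` call gives the copy operation.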

atomashpolskiy commented 8 years ago

Made a few small updates to the text. I personally think that the most interesting and complicated part here is working with collections (entity "collection" and to-many relationships). So comments on this particular matter will be very much appreciated!

atomashpolskiy commented 8 years ago

Future reminder:

Pointers may be extended to target some property of a collection of entities, i.e. a collection of values. Some cool batch operations can then be performed, like:

update all Messages for a particular User to be marked as read
update passwords for all Users in the application to be marked as expired
move a collection of pending orders of some Customer to his collection of fulfilled orders

And so on.

andrus commented 8 years ago

I may have missed some finer points, but here are my comments so far:

I like the possibilities you describe for the collection operations

Resource permissions (if any exist) are not taken into account when resolving the pointer's path.

Good point. We will need to address this eventually.

in order to comply with RFC6902.

I'd say at this point we no longer care about compliance with RFC6902.

Remove an object from a to-one relationship or a to-many relationship with a specified ID. The object itself is not deleted.

How do we handle a case when we want to both delete and unrelate an object resolved for a given pointer?

If the object or value denoted by the pointer's path is an entity collection, and the new value is an empty collection, then all objects in the entity collection are deleted. If the object or value denoted by the pointer's path is a to-many relationship collection of objects, and the new value is an empty collection, then all objects in the to-many relationship are removed from the relationship.

+1

atomashpolskiy commented 8 years ago

I'd say at this point we no longer care about compliance with RFC6902.

Yeah, I understand. Still think that we should try not to diverge too much if possible.

How do we handle a case when we want to both delete and unrelate an object resolved for a given pointer?

Currently there's no way to accomplish that. Would you prefer to have two different operations?

atomashpolskiy commented 8 years ago

In fact in order to implement unrelate+delete we just need to resolve the object instead of calling pointer's remove and then delete the object. But this can fail due to foreign keys, so the safest way would be to call remove first. Still this won't help when there are multiple parents. For that we need some analogue of LR unrelate.

atomashpolskiy commented 8 years ago

@andrus, I've started to work on update/delete pointer operations (https://github.com/atomashpolskiy/link-rest/tree/pointer-operations). I'd be grateful if you helped me with the following problems:

I'm getting an exception in com.nhl.link.rest.runtime.parser.pointer.PointerUpdateTest#testUpdate_EntityCollectionPointer. Can't figure out if I'm doing something wrong; I create an object with ObjectContext.newObject(), and then the same context won't save this object and instead returns an error: ERROR 23505: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'SQL160103153618000' defined on 'E4'. Looks like it doesn't replace the temporary ID before flushing the object to the database, even though the ID is marked as generated in the Cayenne data map.
I noticed that it's not possible to add an object from a different Cayenne context into a to-many relationship. However, with to-one relationships this works perfectly fine. Just curious if there's some catch that I should know about?
Do I understand it correctly that Cayenne's SelectById already knows how to compare values of different types? (e.g. string and integer ID values seem to work interchangeably)

Also we need to determine what the 'entity collection' pointer (specified as "." in short form) will look like in JSON. Empty object maybe?

andrus commented 8 years ago

It appears the exception is caused by mixing explicit and auto-generated IDs in Derby. Replacing explicit IDs with auto generation fixes it:

diff --git a/src/test/java/com/nhl/link/rest/runtime/parser/pointer/PointerUpdateTest.java b/src/test/java/com/nhl/link/rest/runtime/parser/pointer/PointerUpdateTest.java
index 2e92854..96f392b 100644
--- a/src/test/java/com/nhl/link/rest/runtime/parser/pointer/PointerUpdateTest.java
+++ b/src/test/java/com/nhl/link/rest/runtime/parser/pointer/PointerUpdateTest.java
@@ -61,9 +61,9 @@ public class PointerUpdateTest extends JerseyTestOnDerby {
     public void testUpdate_EntityCollectionPointer() throws Exception {

         SQLTemplate insertE4_1 = new SQLTemplate(E4.class,
-                               "INSERT INTO utest.e4 (id, c_varchar, c_int) values (1, 'xxx', 5)");
+                               "INSERT INTO utest.e4 (c_varchar, c_int) values ('xxx', 5)");
         SQLTemplate insertE4_2 = new SQLTemplate(E4.class,
-                               "INSERT INTO utest.e4 (id, c_varchar, c_int) values (2, 'yyy', 7)");
+                               "INSERT INTO utest.e4 (c_varchar, c_int) values ('yyy', 7)");
                runtime.newContext().performGenericQuery(insertE4_1);
         runtime.newContext().performGenericQuery(insertE4_2);

atomashpolskiy commented 8 years ago

Thanks a lot!

andrus commented 8 years ago

I noticed that it's not possible to add an object from a different Cayenne context into a to-many relationship. However, with to-one relationships this works perfectly fine. Just curious if there's some catch that I should know about?

IIRC trying to link up objects from 2 different contexts causes an exception. If one of them is not registered in any context, it will get registered though, and no exception will occur. From what I can tell by looking at the code, this behaves the same with to-one and to-many.

In the new code I noticed we are working with unregistered objects quite a bit. It is not bad by itself, but is somewhat against the flow. Perhaps we can analyze the use cases and see if we can ensure that DataObject instances passed to pointers are coming from a context (I realize that ObjectContext instance is hidden inside pointer context... hmm... something to think about)

Do I understand it correctly that Cayenne's SelectById already knows how to compare values of different types? (e.g. string and integer ID values seem to work interchangeably)

This is probably a feature of the underlying DB (and is likely DB-specific). Cayenne just passes the ID value through to the JDBC call.

Also we need to determine what the 'entity collection' pointer (specified as "." in short form) will look like in JSON. Empty object maybe?

In what context? Could you give an example?
