Identity and uniqueness

alex-dixon commented 7 years ago

We have been creating excessive schema entries for attributes and marking them as :db.unique/identity just to make them one-to-one. Part of this stems from my misunderstanding of :db.unique/identity. :)

There's a larger issue: sometimes we want attributes to serve as identifiers. For example, we will probably never care about making the mouse an entity. We end up treating :mouse/x and :mouse/y as the identifier and reading only their values (never the eid). I'm not sure how many cases like this there will be.

Here's my attempt to define things clearly:

If there can be only one occurrence of an attribute for an entity ("a todo has one title"), that is a one-to-one relationship.
If there can be multiple occurrences of the same attribute for an entity ("a todo list has many todos"), that is a one-to-many relationship.
If, comparing across entities, an attribute's value uniquely identifies an entity and excludes all other entities from having the same attribute/value pair (:username "foobar"), that relationship is unique identity: because the pair is unique, it can be used to identify the entity. This means that if :todo/title is unique identity and I look for a :todo/title with the value "Hey", I can safely expect at most one match.
If, comparing across entities, an attribute's value must be unique but asserting a duplicate value for a different entity is rejected rather than upserted, that attribute is a unique value attribute. The only difference between a unique value and a unique identity attribute is this insertion behavior.
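To make the four definitions concrete, here is a toy model in plain Clojure (no precept or Datomic APIs). It is a minimal sketch under stated assumptions: facts are [eid attr value] vectors kept in a vector, and the schema map, its :cardinality/:unique keys, and the functions owner-of and insert-fact are hypothetical names used only for illustration. Upsert is approximated by re-pointing the new entity's facts to the entity that already owns the value.

```clojure
;; Toy model, not precept's or Datomic's implementation.
;; Facts are [eid attr value] vectors; schema keys are hypothetical.
(def schema
  {:todo/title     {:cardinality :one}
   :todo-list/todo {:cardinality :many}
   :user/username  {:unique :identity}
   :user/email     {:unique :value}})

(defn owner-of
  "Returns the eid that already holds value v for attribute a, or nil."
  [facts a v]
  (some (fn [[fe fa fv]] (when (and (= fa a) (= fv v)) fe)) facts))

(defn insert-fact
  "Inserts [e a v] into facts following the attribute's schema entry.
  Attributes missing from the schema default to one-to-one."
  [schema facts [e a v :as fact]]
  (let [{:keys [cardinality unique]} (get schema a)
        owner (owner-of facts a v)]
    (cond
      ;; unique value: another entity already has this value, so reject
      (and owner (not= owner e) (= unique :value))
      (throw (ex-info "duplicate value for unique-value attribute"
                      {:fact fact :owner owner}))

      ;; unique identity ("upsert"): e resolves to the existing owner, and
      ;; every fact stored so far under e is re-inserted under the owner
      (and owner (not= owner e) (= unique :identity))
      (reduce (fn [acc [_ fa fv]] (insert-fact schema acc [owner fa fv]))
              (filterv (fn [[fe _ _]] (not= fe e)) facts)
              (filter (fn [[fe _ _]] (= fe e)) facts))

      ;; one-to-many: keep every distinct value
      (= cardinality :many)
      (if (some #{fact} facts) facts (conj facts fact))

      ;; one-to-one (the default): the new value replaces the old one
      :else
      (conj (filterv (fn [[fe fa _]] (not (and (= fe e) (= fa a)))) facts)
            fact))))
```

For example, inserting [1 :user/username "foobar"] and later [2 :user/username "foobar"] folds entity 2's facts into entity 1, while doing the same with :user/email throws.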

From Datomic's documentation:

:db.unique/value - only one entity can have a given value for this attribute. Attempts to assert a duplicate value for the same attribute for a different entity id will fail.
:db.unique/identity - only one entity can have a given value for this attribute and "upsert" is enabled; attempts to insert a duplicate value for a temporary entity id will cause all attributes associated with that temporary id to be merged with the entity already in the database.
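For reference, this is roughly what attributes with the two uniqueness options look like as schema data in recent Datomic versions. The attribute names and docstrings are made up for illustration; older Datomic releases also required :db/id and :db.install/_attribute entries.

```clojure
;; Attribute definitions as transaction data (illustrative names only).
[{:db/ident       :user/username
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity   ; duplicate from a tempid upserts
  :db/doc         "Login name; identifies a user."}
 {:db/ident       :user/email
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/value      ; duplicate for another entity fails
  :db/doc         "Email address; must be unique, never upserts."}]
```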

What we appear to want, and lack, is a way to designate a unique attribute: an attribute that always resolves to a single entity. I don't think Datomic's schema is the right way to go about it. Instead, I propose we define a function that takes a vector of attributes that should be maintained as "unique attributes". The function could use this vector to generate the rules needed to find any two facts with the same such attribute and remove the older one. Performance concerns abound here, so let me know if you think it's worth trying.
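A data-level sketch of the proposed behavior follows. It does not generate rules as proposed; instead, assuming facts are [eid attr value] vectors stored in insertion order, the hypothetical function maintain-unique-attributes keeps only the newest fact for each listed attribute, whatever its eid.

```clojure
(defn maintain-unique-attributes
  "Hypothetical helper. attrs: attribute keywords to treat as unique.
  facts: a vector of [eid attr value] in insertion order.
  Keeps only the newest fact for each attribute in attrs, regardless of
  eid; facts with other attributes pass through unchanged."
  [attrs facts]
  (let [unique? (set attrs)
        ;; attribute -> index of its last (newest) occurrence
        newest  (reduce-kv (fn [m i [_ a _]] (if (unique? a) (assoc m a i) m))
                           {} facts)]
    (into []
          (keep-indexed (fn [i [_ a _ :as fact]]
                          (when (or (not (unique? a)) (= i (newest a)))
                            fact)))
          facts)))
```

This version rescans every fact on each call, so its cost grows linearly with the size of the session, which is the kind of performance concern raised above.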

alex-dixon commented 7 years ago

Following up on registering unique attributes via a list: that would be a pain. We could instead allow unique attributes to be identified on the fly, if we accept the trade-off of having no central place where unique attributes are defined. We can still easily persist to Datomic this way if unique-attribute facts are UI-only (:mouse/move, :mouse/down). They would not need to be part of the schema for us to garbage-collect them and maintain truth, and we could easily filter them out before persisting, since they are not in the schema.
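The filtering step could look like the following minimal sketch, assuming facts are [eid attr value] vectors and the schema is a map keyed by attribute; persistable-facts is a hypothetical name, not part of precept.

```clojure
(defn persistable-facts
  "Hypothetical helper: keeps only facts whose attribute is declared in
  schema (a map keyed by attribute). UI-only facts such as :mouse/x,
  which have no schema entry, are dropped before persisting."
  [schema facts]
  (filterv (fn [[_ a _]] (contains? schema a)) facts))
```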

Proposing we try the following:

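```clojure
;; insert a mouse coordinate under the shared placeholder eid :unique
(insert! [:unique :mouse/x 42])

;; rule condition: match :mouse/x for any eid, binding its value to ?v
[[_ :mouse/x ?v]]

;;...rule to remove older facts with eid :unique (or whatever we decide)...
```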

If we would rather write unique-attribute facts as [:mouse/x 42], we could prepend :unique to any vector we receive that has length 2 and a keyword in first position. Feedback welcome.
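The normalization step is small; here is a minimal sketch, where normalize-fact is a hypothetical name and facts are assumed to be plain vectors.

```clojure
(defn normalize-fact
  "Hypothetical helper: prepends the placeholder eid :unique to a
  2-element fact whose first element is a keyword, so [:mouse/x 42]
  becomes [:unique :mouse/x 42]. Any other fact is returned unchanged."
  [fact]
  (if (and (vector? fact) (= 2 (count fact)) (keyword? (first fact)))
    (into [:unique] fact)
    fact))
```

Because every normalized fact shares the eid :unique, ordinary one-to-one maintenance (one value per eid and attribute) is enough to keep a single current value per attribute.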

alex-dixon commented 7 years ago

After discussion, we are marking the identification of facts by attribute alone as an enhancement that is not currently on the roadmap for release. Here is what we plan to support instead, and our reasoning:

Transient facts survive only one session; :mouse/x is an example. As we currently understand the problem, we cannot yet adequately manage and maintain facts of this type.
Actions are currently implemented as transient facts because they are removed at the end of every session.
Nearly everything we'd like to identify by attribute is both transient and the result of an action. Therefore, we find it reasonable to associate any transient fact inserted as the result of an action with the action itself by giving the fact the same eid as the action.
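The eid-sharing idea can be sketched with plain Clojure data. This is a minimal sketch, not the store-action macro itself: facts are [eid attr value] vectors, and both action->facts and the :action/name attribute that marks the action are hypothetical names used only for illustration.

```clojure
(defn action->facts
  "Hypothetical helper: expands an action into facts that all share the
  action's eid. The action itself is recorded as
  [action-id :action/name action-name] (an assumed attribute), and each
  payload entry becomes [action-id attr value]."
  [action-id action-name payload]
  (into [[action-id :action/name action-name]]
        (map (fn [[attr value]] [action-id attr value]))
        payload))
```

For example, (action->facts 7 :mouse-moved {:mouse/x 42 :mouse/y 7}) yields three facts, all with eid 7, so anything keyed on the action's eid can find the whole group.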

This leaves us with the following todos:

Our store-action macro currently generates a unique id for each fact it inserts into the session. Change it so that each fact receives the action's id instead. An important implication is that store-action becomes usable only for automatic insertion of transient facts.
Transient, attribute-only entities joined to an action should be removed at the end of the session. We currently remove any facts of type :action. Change the rule to accumulate all facts whose eid is the eid of any action, and retract them.
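The revised end-of-session cleanup can be sketched at the data level rather than as a rule. This assumes facts are [eid attr value] vectors and that an action is marked by a hypothetical [eid :action/name name] fact; end-session is likewise a hypothetical name.

```clojure
(defn end-session
  "Hypothetical helper: removes every fact whose eid belongs to an
  action, i.e. the action facts themselves plus any transient facts
  stored under the action's eid. An action is assumed to be marked by
  an :action/name fact."
  [facts]
  (let [action-eids (into #{}
                          (comp (filter (fn [[_ a _]] (= a :action/name)))
                                (map first))
                          facts)]
    (filterv (fn [[e _ _]] (not (contains? action-eids e))) facts)))
```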

However, some questions remain about facts that are in some sense "global" within UI scope and should survive more than one session, such as :ui/visibility-filter in the todomvc example. Although this fact is inserted via an action, assigning it a unique id will ensure it persists. The problem is that subsequent insertions of this fact will not take part in our schema maintenance: each insertion is assigned a new unique id, so we end up with two facts. Using the same eid every time keeps a single value, but cases like this may call for some kind of "singleton" designation in the client-side schema that allows facts to be identified by attribute alone.

The current implementation maintains one-to-one relationships by default, so if by convention we insert [:global :ui/visibility-filter :all] and then [:global :ui/visibility-filter :active], only the latter is kept and no further implementation is necessary.
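The difference between a fresh eid per insertion and the conventional :global eid can be shown with a toy one-to-one insert. This is a minimal sketch assuming facts are [eid attr value] vectors; insert-one-to-one is a hypothetical stand-in for the real schema maintenance.

```clojure
(defn insert-one-to-one
  "Hypothetical stand-in for default one-to-one maintenance: a new
  [e a v] replaces any existing fact with the same eid and attribute."
  [facts [e a _ :as fact]]
  (conj (filterv (fn [[fe fa _]] (not (and (= fe e) (= fa a)))) facts)
        fact))

;; Fresh eids per insertion: both values survive (two facts).
;; (reduce insert-one-to-one [] [[1 :ui/visibility-filter :all]
;;                               [2 :ui/visibility-filter :active]])

;; Shared eid :global: only the newest value survives.
;; (reduce insert-one-to-one [] [[:global :ui/visibility-filter :all]
;;                               [:global :ui/visibility-filter :active]])
```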

alex-dixon commented 7 years ago

Update: Transient facts (the things formerly called "actions") should be inserted with the eid :transient. Such facts are removed at the end of every session.

This approach may not be well suited to cases in which we want a fact with a proper eid to be removed at the end of every session. No such case has come up yet. In the worst case, consumers would need to implement a rule that does the same thing as the following:

```clojure
[?e :thing/transient? true]                 ; an entity flagged as transient
[?entity <- (acc/all) :from [?e :all]]      ; accumulate every fact for ?e
=>
(retract! ?entity)                          ; retract them
```
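What the :transient eid convention and the rule above accomplish can be mirrored at the data level. This is a minimal sketch, not a precept rule: facts are assumed to be [eid attr value] vectors, and retract-transient is a hypothetical name.

```clojure
(defn retract-transient
  "Hypothetical data-level equivalent: drops facts inserted with the eid
  :transient, plus every fact of any entity flagged with
  [?e :thing/transient? true]."
  [facts]
  (let [doomed (into #{:transient}
                     (comp (filter (fn [[_ a v]] (and (= a :thing/transient?)
                                                      (true? v))))
                           (map first))
                     facts)]
    (filterv (fn [[e _ _]] (not (contains? doomed e))) facts)))
```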

CoNarrative/precept, issue #32