datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1
169 stars 84 forks source link

shared primary key #325

Open dimitri-yatsenko opened 7 years ago

dimitri-yatsenko commented 7 years ago

It is often desired to have common identification of entities stored in different tables. For example, in the stimulus schema https://github.com/cajal/pipeline/tree/master/matlab/schemas/%2Bstimulus, the different types of stimuli are stored in different tables. The stimulus.Condition table contains attributes common to all conditions and the specialization tables contain the special attributes of each stimulus type. For each entry in stimulus.Condition, only one of the special tables should have a matching entry. It is desirable to express this in the database design.

The syntax could be as follows:

@schema
class ConditionKey(dj.SharedKey):
    definition  = """  # shared primary key for stimulus condition
    -> Condition
    """

@schema
class Monet(dj.Manual):
    definition = """  # Monet stimulus
    ConditionKey
    ---
    movie :  longblob
    """"

@schema
class Grating(dj.Manual):
    definition = """  # Grating stimulus
    ConditionKey
    ---
    direction :   decimal(4,1)
    """"

An alternative syntax (brainstormed with @eywalker) is:

@schema
class Condition(dj.SharedKey):
    definition  = """  # shared primary key for stimulus condition
    -> Scan
    condition_idx  : int  
    """

    class Monet(dj.Manual):
        definition = """  # Monet stimulus
        movie :  longblob
        """"

    class Grating(dj.Computed):
        definition = """  # Grating stimulus
        direction :   decimal(4,1)
        """"

In this syntax, the shared key class defines the primary key and is not allowed to have non-primary attributes. The nested classes only define the non-primary key attributes.

This solution also solves #324 (foreign keys to unions). We can now declare a foreign key to Condition, which is effectively, a foreign key into [Monet, Grating]

Parts can share a primary key too:

@schema
class ConditionGroup(dj.Manual):
    definition = """
     -> Scan
     ----
     timestamp : timestamp  
     """

    class Condition(dj.SharedKey):
        definition = """
        -> ConditionGroup
        cond_idx  : int
        """

        class Monet(dj.Part):
            definition = """  # Monet stimulus
            movie :  longblob
            """"

        class Grating(dj.Part):
            definition = """  # Grating stimulus
            direction :   decimal(4,1)
            """"
dimitri-yatsenko commented 6 years ago

Perhaps another syntax is to add a new foreign key property (see #301) called shared.

@schema
class Monet(dj.Manual):
    definition = """
    # Monet stimulus
    -> [shared] Condition
    --- 
    movie :  longblob
    """"

@schema
class Grating(dj.Computed):
    definition = """  
    # Grating stimulus
    -> [shared] Condition
    --- 
    direction :   decimal(4,1)
    """"

If two or more tables make a shared foreign key reference to the same table, only one can refer to the same entry at a time.

Incidentally, shared implies unique (See #301).

Perhaps multiple mutually exclusive pools could be distinguished by a hyphenated extension: [shared-1] or [shared-visual]. However, for now, I would just implement a single shared pool and allow extensions if it ever becomes necessary.

dimitri-yatsenko commented 6 years ago

To implement this, for every table that is used as a shared primary key, an invisible table ~share_table_name is created with the same primary key as the original table and referencing it with a foreign key. For examples, for the declarations above, the table ~condition_shared is created with a reference to condition. Then insertion into any of the tables with the shared foreign key will first attempt to insert into the invisible table first. The foreign key will be made to the hidden table. The ERD will also display the hidden node with a special unnamed node.