datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1

Foreign key to a union #324

Open dimitri-yatsenko opened 7 years ago

dimitri-yatsenko commented 7 years ago

In many applications, a need arises to converge two pipelines into one. A desired feature would be to have a foreign key into the union of two tables. For example, imagine that two schemas s1 and s2 have the table Scan. Both s1.Scan and s2.Scan have the same primary key structure but the tables contain non-overlapping sets of primary key values.
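As a rough illustration of that precondition, here is a minimal plain-Python sketch (not DataJoint code) that models the primary keys of s1.Scan and s2.Scan as sets of tuples and forms their union only if the sets do not overlap. The key values and the helper name `union_of_disjoint_keys` are hypothetical and chosen purely for illustration.

```python
# Hypothetical primary key values for s1.Scan and s2.Scan; not real data.
s1_scan_keys = {(1, 1), (1, 2), (2, 1)}
s2_scan_keys = {(3, 1), (3, 2)}


def union_of_disjoint_keys(*key_sets):
    """Return the union of the key sets, refusing overlapping keys."""
    combined = set()
    for keys in key_sets:
        overlap = combined & keys
        if overlap:
            raise ValueError(f"overlapping primary keys: {sorted(overlap)}")
        combined |= keys
    return combined


# The set of keys that a foreign key into the union would be allowed to reference.
scan_keys = union_of_disjoint_keys(s1_scan_keys, s2_scan_keys)
print(len(scan_keys))
```

Raising on overlap is one possible design choice in this sketch; the issue itself only states that the two tables are assumed to hold non-overlapping key values.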

Suppose we need to compute some statistic of the scan in the table ScanStat for the data in both s1.Scan and s2.Scan.

We want to be able to declare

@schema
class ScanStat(dj.Computed):
    definition = """  # Scan statistic
    -> s1.Scan + s2.Scan
    ----
    scan_stat  : double
    """

dimitri-yatsenko commented 6 years ago

The way this will be implemented is as follows.

All referenced tables must have the same primary key, which can be achieved by renaming during the reference.

Whenever a union foreign key is used in the primary key, an auxiliary table is created for each referenced table. In queries, the resulting tables are presented as a union. This may have a multiplicative effect. For example, the table declared as

-> reso.Scan + meso.Scan
-> reso.Method + meso.Method

will result in the creation of four auxiliary tables, one for each combination of the referenced tables.
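To make the multiplicative effect concrete, here is a minimal plain-Python sketch (not DataJoint code) that enumerates the combinations of referenced tables, assuming one auxiliary table is needed per combination. The function name `auxiliary_combinations` is made up for this illustration.

```python
from itertools import product


def auxiliary_combinations(union_keys):
    """Enumerate combinations of referenced tables across union foreign keys.

    union_keys: a list of union foreign keys, each given as a list of table names.
    Assumption: one auxiliary table is created per combination.
    """
    return list(product(*union_keys))


combos = auxiliary_combinations([
    ["reso.Scan", "meso.Scan"],
    ["reso.Method", "meso.Method"],
])
for combo in combos:
    print(", ".join(combo))
print(len(combos))  # 4
```

Each additional union foreign key multiplies the count by the number of tables in that union, so the number of auxiliary tables can grow quickly.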

Therefore, perhaps we should allow for the following conventions:

-> reso.Scan * reso.Method

will mean the same thing as

-> reso.Scan 
-> reso.Method

Then

-> reso.Method * reso.Scan + meso.Method * meso.Scan

will result in the creation of only two auxiliary tables.
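The proposed convention could be read as follows: each `+`-separated term corresponds to one auxiliary table, and each `*`-separated factor is one referenced table within that term. Below is a minimal sketch of that reading in plain Python. The helper `parse_union_reference` is hypothetical and is not part of DataJoint's declaration parser.

```python
def parse_union_reference(line):
    """Split a proposed '-> A * B + C * D' line into groups of table names.

    Each '+'-separated term is assumed to map to one auxiliary table;
    each '*'-separated factor is one referenced table within that term.
    """
    line = line.strip()
    if not line.startswith("->"):
        raise ValueError("expected a foreign key line starting with '->'")
    expression = line[2:]
    return [
        [factor.strip() for factor in term.split("*")]
        for term in expression.split("+")
    ]


groups = parse_union_reference(
    "-> reso.Method * reso.Scan + meso.Method * meso.Scan"
)
print(groups)       # [['reso.Method', 'reso.Scan'], ['meso.Method', 'meso.Scan']]
print(len(groups))  # 2
```

Grouping with `*` before taking the union pairs reso tables only with reso tables and meso tables only with meso tables, which is why two auxiliary tables suffice instead of four.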

The main table is then declared with the same primary key and serves as the shared primary key (see #325) for all the auxiliary tables. Insertion into and deletion from the main table should follow the same rules as for tables with a shared primary key in #325, except that the auxiliary tables are not visible.

stephenholtz commented 4 years ago

Adding this note to bump the issue. After writing a new analysis pipeline, I think a foreign key to a union would significantly reduce complexity, particularly for exploratory analysis.