datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1

Long make calls lock table metadata #1170

Open ethho opened 3 weeks ago

ethho commented 3 weeks ago

Bug Report

Description

A client locks table metadata for the entire duration of a make function call. When other clients attempt to drop or declare child tables, those calls block until the first client's make finishes. This approach scales poorly with the number of clients and the number of child tables.

Reproducibility


Proposed Solution

As an alternative to writing a single Computed.make function, allow the user to write three functions:

  1. make_fetch for reading inputs
  2. make_compute, which is not run in a transaction, and is passed the return value of make_fetch
  3. make_insert, which inserts computed values using the same transaction semantics as make.

In pseudocode, the three functions would be called in the following routine:

if hasattr(table, "make"):
    # Existing behavior: make runs entirely inside one transaction.
    return make()
else:
    # The three-part alternative requires all three functions.
    assert hasattr(table, "make_fetch")
    assert hasattr(table, "make_compute")
    assert hasattr(table, "make_insert")
    input = make_fetch()
    conn.disconnect()  # I assume this disconnect step is to ensure that make_compute cannot insert?
    result = make_compute(input)  # long-running; no transaction is open here
    tx = conn.start_transaction()
    input2 = make_fetch()  # re-fetch inside the transaction
    if hash(serialize(input2)) == hash(serialize(input)):
        # Inputs are unchanged, so the computed result is still valid.
        result = make_insert(result)
        tx.commit()
        return result
    else:
        print("ERROR: inputs have changed")
        tx.abort()
        return None
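
For anyone who wants to try the control flow, here is a minimal runnable sketch of the routine above. It makes several assumptions: FakeConnection, FakeTransaction, Doubler, and run_three_part_make are hypothetical names invented for illustration and are not part of DataJoint; pickle plus hashlib stands in for the real blob serialization; and the disconnect step is left out. It only shows that a change to the inputs during make_compute leads to an abort rather than an insert.

```python
import hashlib
import pickle


def fingerprint(data):
    """Digest of the fetched inputs. pickle is only a stand-in serializer."""
    return hashlib.sha256(pickle.dumps(data)).hexdigest()


class FakeTransaction:
    """Hypothetical transaction handle that records what happened to it."""

    def __init__(self, log):
        self.log = log
        self.log.append("start")

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")


class FakeConnection:
    """Hypothetical in-memory stand-in for a database connection."""

    def __init__(self):
        self.log = []

    def start_transaction(self):
        return FakeTransaction(self.log)


def run_three_part_make(table, conn, key):
    """Fetch, compute outside a transaction, then verify and insert inside one."""
    inputs = table.make_fetch(key)
    before = fingerprint(inputs)
    result = table.make_compute(key, inputs)  # no transaction is open here
    tx = conn.start_transaction()
    try:
        if fingerprint(table.make_fetch(key)) != before:
            tx.abort()  # inputs changed while computing; discard the result
            return False
        table.make_insert(key, result)
        tx.commit()
        return True
    except Exception:
        tx.abort()
        raise


class Doubler:
    """Toy table: doubles every value stored under a key in a source dict."""

    def __init__(self, source):
        self.source = source
        self.rows = {}

    def make_fetch(self, key):
        return self.source[key]

    def make_compute(self, key, inputs):
        return [2 * x for x in inputs]

    def make_insert(self, key, result):
        self.rows[key] = result
```

A hashlib digest is used here instead of the built-in hash() because the built-in is salted per process for bytes and strings, so a digest is the safer choice if the two fingerprints could ever be computed in different processes.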

Additional Research and Context

Related Issues


cc: @dimitri-yatsenko @ttngu207 @CBroz1 @samuelbray32 @peabody124

dimitri-yatsenko commented 3 weeks ago

This will be inside populate and will follow all the conventions of populate.

Yes, it looks correct. If we want to be fancy, we can prohibit insert calls in make_fetch, both insert and fetch calls in make_compute, and fetch calls in make_insert.
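
One way the "fancy" restriction could be sketched is a wrapper that tracks the active phase and rejects disallowed operations. Everything here is hypothetical: GuardedTable, PhaseError, and in_phase are made-up names, the table is just an in-memory list, and nothing below reflects how DataJoint would actually implement it.

```python
from contextlib import contextmanager


class PhaseError(RuntimeError):
    """Raised when an operation is not allowed in the current phase."""


class GuardedTable:
    """Hypothetical table wrapper that tracks which make phase is active."""

    # Operations permitted in each phase, following the rule described above.
    ALLOWED = {
        "make_fetch": {"fetch"},
        "make_compute": set(),
        "make_insert": {"insert"},
    }

    def __init__(self):
        self.phase = None  # None means no make phase is active
        self.rows = []

    @contextmanager
    def in_phase(self, name):
        """Mark a phase as active for the duration of a with-block."""
        previous, self.phase = self.phase, name
        try:
            yield self
        finally:
            self.phase = previous

    def _check(self, operation):
        if self.phase is not None and operation not in self.ALLOWED[self.phase]:
            raise PhaseError(f"{operation} is not allowed inside {self.phase}")

    def fetch(self):
        self._check("fetch")
        return list(self.rows)

    def insert(self, row):
        self._check("insert")
        self.rows.append(row)
```

In this sketch the caller would wrap each of the three functions in the matching in_phase block, so a stray insert inside make_compute fails immediately instead of silently running outside the transaction.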

dimitri-yatsenko commented 3 weeks ago

@ethho, our blob serialization converts most data types into binary strings. You can hash the serialized data to compare input with input2.

horsto commented 2 weeks ago

I am following this. I see #1171. Could this issue be updated once this is implemented or in a testable state? Thanks for taking care of this!

dimitri-yatsenko commented 1 week ago

This is a high priority for multiple labs.