crate / sqlalchemy-cratedb

SQLAlchemy dialect for CrateDB.
https://cratedb.com/docs/sqlalchemy-cratedb/
Apache License 2.0

SQLAlchemy: Polyfill for transparently synchronizing data with `REFRESH TABLE` #83

Open amotl opened 1 year ago

amotl commented 1 year ago

About

Because CrateDB does not immediately flush data to disk, a read issued right after a write may not see the new data yet, and applications relying on immediate visibility will fail. This becomes apparent as soon as you run the test suites of typical SQLAlchemy applications.

Recently, we started working on unlocking MLflow and LangChain, and needed to patch SQLAlchemy, adding a bit of compensation to satisfy their test cases.

Proposal

Provide corresponding functionality through a dialect parameter like `crate_refresh_after_dml` or `crate_synchronize_all`, or find a different solution to the same problem.

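import sqlalchemy as sa
from sqlalchemy.event import listen


# Sketch of the `do_refresh` listener, which the snippet references but does
# not show. Assumes each mapped class defines `__tablename__`.
def do_refresh(mapper, connection, target):
    """Refresh the table backing the instance that was just written."""
    connection.execute(sa.text(f"REFRESH TABLE {target.__tablename__}"))


def polyfill_refresh_after_dml(base_model):
    """
    Run `REFRESH TABLE <tablename>` after each INSERT, UPDATE, and DELETE operation.

    CrateDB is eventually consistent, i.e. write operations are not flushed to
    disk immediately, so readers may see stale data. Traditional OLTP-like
    applications expect to read their own writes, so this behavior breaks them.

    This SQLAlchemy extension makes sure that data is synchronized after each
    operation manipulating data.

    TODO: Submit patch to `crate-python`, to be enabled by a
          dialect parameter `crate_dml_refresh` or such.
    """
    # Register the listener for every class mapped on the declarative base.
    for mapper in base_model.registry.mappers:
        listen(mapper.class_, "after_insert", do_refresh)
        listen(mapper.class_, "after_update", do_refresh)
        listen(mapper.class_, "after_delete", do_refresh)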
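
The statement such a listener emits can be exercised without a running database. The following is a minimal sketch, not the actual patch: `quote_ident`, `refresh_statement`, `RecordingConnection`, `Item`, and `do_refresh_quoted` are hypothetical names introduced here for illustration. It assumes the listener receives the `(mapper, connection, target)` arguments of SQLAlchemy mapper events, and that the mapped class exposes `__tablename__`.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Quote an SQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def refresh_statement(table: str, schema: Optional[str] = None) -> str:
    """Compose a `REFRESH TABLE` statement, optionally schema-qualified."""
    parts = [schema, table] if schema else [table]
    return "REFRESH TABLE " + ".".join(quote_ident(part) for part in parts)


class RecordingConnection:
    """Hypothetical stand-in for a connection: records statements, runs nothing."""

    def __init__(self) -> None:
        self.statements: List[str] = []

    def execute(self, statement: str) -> None:
        self.statements.append(statement)


class Item:
    """Hypothetical stand-in for a mapped class; only `__tablename__` is used."""

    __tablename__ = "items"


def do_refresh_quoted(mapper, connection, target) -> None:
    """Listener sketch: refresh the table that backs the written instance."""
    connection.execute(refresh_statement(target.__tablename__))


# Simulate what would happen after one INSERT of an `Item` instance.
connection = RecordingConnection()
do_refresh_quoted(None, connection, Item())
print(connection.statements)  # ['REFRESH TABLE "items"']
```

Quoting the identifier keeps the statement valid for table names that contain uppercase letters or special characters. With a real SQLAlchemy connection, the string would need to be wrapped in `sqlalchemy.text()` before being passed to `execute()`.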

amotl commented 3 months ago

There is a patch now, which includes the corresponding improvement.