Miksus / rocketry

Modern scheduling library for Python
https://rocketry.readthedocs.io
MIT License

SQL repository raises error on async task #171

Open everthought opened 1 year ago

everthought commented 1 year ago

I was trying to set up a SQL repo for logging task status, but after setting it up as described in the documentation (https://rocketry.readthedocs.io/en/stable/cookbook/robust_applications.html#log-to-database), task execution failed. I think there must be a bug in the model or something similar.

When I used the MemoryRepo, it worked as expected. After some research I found the settings documentation: https://rocketry.readthedocs.io/en/stable/cookbook/settings.html

With the setup described there, the SQL repo worked as expected.

The code:

repo = SQLRepo(model=RunRecord, table="log", engine=db_engine, id_field="id")

app_scheduler = Rocketry(
    # logger_repo=MemoryRepo(),
    logger_repo=repo,
    execution="async",
    config={
        "task_execution": "async",
    },
)

everthought commented 1 year ago

After a lot of tinkering I found the issue. The documentation is misleading on this point.

I found out that there is an open issue about the missing method app.session.set_repo().

For use with the suggested table:

engine.execute("""CREATE TABLE log (
    id INTEGER PRIMARY KEY,
    created FLOAT,
    task_name TEXT,
    run_id TEXT,
    action TEXT
)""")

the model MinimalRunRecord should be used.
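As a quick way to check that a record type and the table agree, here is a minimal sketch using only the standard library. `LogRow` and `missing_columns` are hypothetical names introduced for illustration; `LogRow` is a stand-in that mirrors the four non-key columns of the table above and is not rocketry's own model class.

```python
import sqlite3
from dataclasses import dataclass, fields

# Same DDL as the table suggested above
DDL = """CREATE TABLE log (
    id INTEGER PRIMARY KEY,
    created FLOAT,
    task_name TEXT,
    run_id TEXT,
    action TEXT
)"""


@dataclass
class LogRow:
    # Hypothetical stand-in for the log record, not rocketry's MinimalRunRecord
    created: float
    task_name: str
    run_id: str
    action: str


def missing_columns(conn, table, record_type):
    """Return the record fields that have no matching column in the table."""
    # PRAGMA table_info yields one row per column; index 1 is the column name
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [f.name for f in fields(record_type) if f.name not in columns]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
print(missing_columns(conn, "log", LogRow))
```

An empty list means every field of the record has a column to land in; a model with additional fields would show up here before the scheduler ever tries to write a log row.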

To set up the scheduler with the SQL repo I used the following example:


app_scheduler = Rocketry(
    execution="async",
    # logger_repo=MemoryRepo(),
    logger_repo=SQLRepo(engine=db_engine, table="scheduler_log", model=MinimalRunRecord, id_field="id"),
    config={
        "task_execution": "async",
        "silence_task_prerun": True,
        "silence_task_logging": True,
        "silence_cond_check": True,
        "force_status_from_logs": True,
    },
)

Miksus commented 1 year ago

Could you link the part of the docs you are looking at? I'm actually just finalizing quite a major update on the docs (rewrote the tutorials, added new parts to cookbook and handbook) and I could check that.

I'm leaning toward this being the most standard way of setting the repo:

@app.setup()
def set_repo(logger=TaskLogger()):
    repo = SQLRepo(engine=create_engine("sqlite:///app.db"), table="tasks", model=MinimalRecord, id_field="created")
    logger.set_repo(repo)

You can also use logger_repo just fine if you prefer that, but this approach has the benefit that you can set the configs programmatically, like:

@app.setup()
def set_config(config=Config(), env=EnvArg("ENV", default="dev")):
    if env == "prod":
        config.silence_task_prerun = True
        config.silence_task_logging = True
        config.silence_cond_check = True
    else:
        config.silence_task_prerun = False
        config.silence_task_logging = False
        config.silence_cond_check = False
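
For anyone who prefers the config dict passed to the constructor, the same environment switch can be sketched with the standard library alone. `silence_flags` is a hypothetical helper; the keys are the ones used in the config dict earlier in this thread, and reading `ENV` from `os.environ` is an assumption about how the environment name is supplied.

```python
import os


def silence_flags(env):
    """Build the silence settings for the given environment name."""
    # Errors are silenced only in production, mirroring the branches above
    silent = env == "prod"
    return {
        "silence_task_prerun": silent,
        "silence_task_logging": silent,
        "silence_cond_check": silent,
    }


# Fall back to "dev" when ENV is not set
config = silence_flags(os.environ.get("ENV", "dev"))
print(config)
```

The resulting dict could then be handed over as config=config, at the cost of fixing the values at construction time instead of in a setup hook.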

I will probably add set_repo as a convenience method on the session as well, but at the moment it is found on the task logger, which can be acquired through the argument system (see the previous example).

The SQLRepo is quite problematic in terms of implementation, as it relies on reflecting SQLAlchemy's ORM models, which are sometimes quite a mess. I'm planning to use SQL expressions at some point so that at least the primary key requirement would go away.
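To illustrate why plain SQL statements would not need a primary key, here is a rough sketch using the standard library's sqlite3 instead of SQLAlchemy. It is not how SQLRepo works today; the `tasks` table layout and the helpers `log_action` and `last_action` are hypothetical.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Note: no PRIMARY KEY column, rows are only ever appended and filtered
conn.execute("CREATE TABLE tasks (created FLOAT, task_name TEXT, action TEXT)")


def log_action(conn, task_name, action, created=None):
    """Append one log row using a parameterized INSERT statement."""
    timestamp = time.time() if created is None else created
    conn.execute(
        "INSERT INTO tasks (created, task_name, action) VALUES (?, ?, ?)",
        (timestamp, task_name, action),
    )


def last_action(conn, task_name):
    """Return the most recent action logged for a task, or None if there is none."""
    row = conn.execute(
        "SELECT action FROM tasks WHERE task_name = ? "
        "ORDER BY created DESC LIMIT 1",
        (task_name,),
    ).fetchone()
    return row[0] if row else None


log_action(conn, "do_things", "run", created=1.0)
log_action(conn, "do_things", "success", created=2.0)
print(last_action(conn, "do_things"))
```

Inserts and filtered reads like these never need to identify a single row, which is the part of the ORM mapping that forces an id field.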