fgmacedo / python-statemachine

Python Finite State Machines made easy.
MIT License
854 stars 84 forks source link

Using async wrappers on callbacks causes errors with sync-only callbacks #446

Closed robinharms closed 2 months ago

robinharms commented 3 months ago

Description

The new async system causes exceptions with things that guard against operations within an async wrapper, for instance most DB-operations within Django.

What I Did

I updated the Django example I sent you before to catch the new problem. https://github.com/robinharms/pysm-django

It works as expected when run with release 2.2.0.

The only work-around I can think of is to avoid using coroutines unless async is specified on the callback. What do you think?

All the best, Robin

fgmacedo commented 3 months ago

Hi @robinharms , how are you?

Thanks for your detailed report. I've found in the Django docs the root cause: https://docs.djangoproject.com/en/5.1/topics/async/#async-safety

The approach I took was to go full async in the library with a sync/async adapter on the API and on the handlers. I did not expect this kind of guards.

I'm trying to figure out a workaround for the problem.

Best!

fgmacedo commented 2 months ago

Hi @robinharms , how are you?

Can you please check if the fix in #452 ( branch macedo/compat-django-async-guards) works for your usecase?

I've added a sync_to_async wrap around sync callbacks if asgiref is available, as pointed out in Django's async safety docs.

robinharms commented 2 months ago

Hi, I'm on a conference right now and can't check. (Hopefully tomorrow!) But, from my somewhat shallow understanding, using a wrapper won't work with transactions and similar. They might cause dataloss in that case since everything will be run in a new thread.

But I'll try to get a proper look tomorrow! All the best

robinharms commented 2 months ago

I've had time to test this now with database access, and as I expected it only makes the error more opaque. Starting a new thread with the database locked within another transaction won't solve it.

If you install the example I linked and run your code you'll see it.

I don't see any other way around it than differentiating between sync and async, or use the same convention as Django where there's an alternate async version of everything with an a. (Like aget / get https://docs.djangoproject.com/en/5.0/ref/models/querysets/#get ) That way developers could choose async or sync execution of listeners/callbacks.

All the best, Robin

fgmacedo commented 2 months ago

Yes, thanks for the example. I've ported the code to include in the automated tests on the library itself, so now I can check against any regression: https://github.com/fgmacedo/python-statemachine/tree/develop/tests/django_project related to Django support without the need of external checks.

The problem now is this: https://github.com/django/channels/issues/1110

There's no support for transactions inside async Django ORM.

So the code works for read-only callbacks using the Django ORM.

And also, I saw that you have used SQLite for the toy example. I'm asking because the threading problem is related to a limitation of the SQLite engine running on memory, not to the Django ORM. So to get to the same point on my tests, you need to change your settings to also tell Django to use a file for the test database when using SQLite.

Like this:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
        "TEST": {
            "NAME": "testdb.sqlite3",
        },
    }
}

Assuming that the code only works if there's no read-after-write because of the limitation of transactions on the Django ORM async support, I've changed the test to use a transactional_db fixture, that does not involve a transaction and does not encounter this.

While it's not ideal, it's a step forward. I'm aware of the problem and am actively considering solutions. I really like the idea of having a unified interface, but it's proving more challenging than I initially thought. This might explain why many libraries have different APIs. My goal is to provide transparency for developers so they don't need to learn two different APIs to work with the library.

Best! And thanks again for your time reviewing this.

fgmacedo commented 2 months ago

New idea: https://github.com/fgmacedo/python-statemachine/pull/456

Just a few test cases away to increase coverage, but got 100% sync behavior again if all registered callbacks are sync.

robinharms commented 2 months ago

Nice! It's a much better idea than splitting the callbacks.

Oh and your question about running SQLite in prod: The only cases I know of are sites that use it for almost static content, never for things that might involve simultaneous writes or any kind of user interactions. But I guess the usecase for workflows in those cases exist too. (Like an intranet perhaps?)

All the best, Robin