django / channels

Developer-friendly asynchrony for Django
https://channels.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Sending to group does not work if called in catch block or subsequently after exception processing #1961

Open MichalKyjovsky opened 1 year ago

MichalKyjovsky commented 1 year ago

Sending a message to a group through the Redis channel layer silently fails when the call is made from a Celery shared task, either in the except block or in any code that runs after an exception has occurred.

OS: macOS 13.0.1 (22A400)
Browser: Google Chrome (Version 108.0.5359.98 (Official Build) (arm64))
Server: Django Daphne ASGI Server

pip freeze

amqp==5.1.1
asgiref==3.5.2
async-timeout==4.0.2
attrs==22.1.0
autobahn==22.7.1
Automat==22.10.0
autopep8==2.0.0
azure-core==1.26.1
azure-identity==1.12.0
azure-storage-blob==12.14.1
billiard==3.6.4.0
celery==5.2.7
certifi==2022.9.14
cffi==1.15.1
channels==4.0.0
channels-redis==4.0.0
charset-normalizer==2.1.1
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
colorama==0.4.5
constantly==15.1.0
cryptography==38.0.1
daphne==4.0.0
Deprecated==1.2.13
Django==4.0.6
django-cors-headers==3.13.0
django-filter==22.1
django-grip==3.2.0
django-rest-passwordreset==1.3.0
django-storages==1.13.1
django-storages-azure==1.6.8
djangorestframework==3.13.1
djangorestframework-simplejwt==5.2.0
et-xmlfile==1.1.0
fillpdf==0.7.2
gripcontrol==4.1.0
gunicorn==20.1.0
hyperlink==21.0.0
idna==3.4
incremental==22.10.0
install==1.3.5
isodate==0.6.1
kombu==5.2.4
lxml==4.9.1
Markdown==3.4.1
MarkupSafe==2.1.1
msal==1.18.0
msal-extensions==1.0.0
msgpack==1.0.4
msrest==0.7.1
ntlm-auth==1.5.0
numpy==1.23.5
oauthlib==3.2.2
Office365-REST-Python-Client==2.3.14
openpyxl==3.0.10
packaging==21.3
pandas==1.5.2
pdf2image==1.16.0
pdfrw2==0.5.0
Pillow==9.3.0
portalocker==2.6.0
prompt-toolkit==3.0.31
psycopg2-binary==2.9.5
pubcontrol==3.3.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.10.0
pycparser==2.21
PyJWT==2.4.0
PyMuPDF==1.21.0
pyOpenSSL==22.1.0
pyparsing==3.0.9
PyPDF2==2.11.2
python-dateutil==2.8.2
pytz==2022.1
redis==4.3.4
requests==2.28.1
requests-ntlm==1.1.0
requests-oauthlib==1.3.1
requests-toolbelt==0.9.1
service-identity==21.1.0
SharePlum==0.5.1
six==1.16.0
sqlparse==0.4.2
tomli==2.0.1
tqdm==4.64.1
Twisted==22.10.0
txaio==22.2.1
typing_extensions==4.4.0
tzdata==2022.1
urllib3==1.26.12
vine==5.0.0
wcwidth==0.2.5
Werkzeug==2.2.2
whitenoise==6.2.0
wrapt==1.14.1
zope.interface==5.5.2

I have the following celery shared task:


from asgiref.sync import async_to_sync
from celery import shared_task
from channels.layers import get_channel_layer

# NOTE: `upload_to_sharepoint`, `status`, and `error` are not defined in this excerpt.
@shared_task(bind=True)
def upload_to_sharepoint_task(self, user_id: str, document_type: str, upload_file: str, filename: str) -> None:
    """Uploads a file to SharePoint."""
    task_id = self.request.id

    try:
        upload_to_sharepoint(user_id, document_type, upload_file, filename)

        print(f"Task {task_id} completed successfully.")

        # Success path: notify the group named after the task id.
        async_to_sync(get_channel_layer().group_send)(task_id, {
            "type": "send_task_status",
            "message": {'task_id': task_id, 'status': status, "error": error}
        })
    except Exception as e:
        print(f"Task {task_id} failed to execute. Posting message to the group.")
        # Failure path: this is the send that never reaches the client.
        async_to_sync(get_channel_layer().group_send)(task_id, {
            "type": "send_task_status",
            "message": {'task_id': task_id, 'status': status, "error": error}
        })
        print("Message send to the group.")
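The excerpt above uses `status` and `error` without showing where they come from. As a rough sketch only, the payload for both branches could be built by a small helper so that each branch sends well-defined values. The helper name `build_status_event` and the status strings `"SUCCESS"` and `"FAILURE"` are assumptions made for illustration; they are not taken from the reporter's project.

```python
from typing import Optional


def build_status_event(task_id: str, exc: Optional[BaseException] = None) -> dict:
    """Build the group message for a finished task (hypothetical helper)."""
    # Assumed status strings; the original issue does not show the real values.
    status = "FAILURE" if exc is not None else "SUCCESS"
    return {
        # "type" must match the handler name used in the issue's snippet.
        "type": "send_task_status",
        "message": {
            "task_id": task_id,
            "status": status,
            "error": str(exc) if exc is not None else None,
        },
    }
```

With such a helper, the except block would pass the caught exception, for example `build_status_event(task_id, e)`, and the success branch would call `build_status_event(task_id)`.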

Expected Behaviour: When the exception is thrown, an error message is sent via channels to the subscribed group, and the client is notified.

Actual Behaviour: When an exception is thrown, both print statements in the except block run, but no message reaches the group, so the client gets no notification. When no exception is thrown, everything works as expected.

Logs: None are provided, as the system acts as if everything is all right.


Channels Configuration: Daphne, plus the build server of the React client. The issue occurred in the development environment.

settings.py:

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("127.0.0.1", 6379)],
        },
    },
}

asgi.py:

import os

from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from django.core.asgi import get_asgi_application
from channels.security.websocket import AllowedHostsOriginValidator

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'octo_manager.settings')

# Initialize Django ASGI application early to ensure the AppRegistry
# is populated before importing code that may import ORM models.
django_asgi_app = get_asgi_application()

import api.routing

application = ProtocolTypeRouter({
    "http": django_asgi_app,
    "websocket": AllowedHostsOriginValidator(
            AuthMiddlewareStack(URLRouter(api.routing.websocket_urlpatterns))
        ),
})

carltongibson commented 1 year ago

Can you provide a minimal running reproduction, please? I suspect there's a task (consumer?) being cancelled, but it's not possible to say without code.

Note: minimal — please take the time to reduce it.

Thanks.

MichalKyjovsky commented 1 year ago

@carltongibson Is it necessary? It's not really about the SharePoint library that I am using; rather, if I trigger any exception and want to send a message about it, nothing happens. Furthermore, the WebSocket from the client (frontend) remains open.

To reproduce the error, trigger an exception inside a Celery task and send a message to the group from the except block or from anywhere after the try-except statement.

Anyway, if you insist on a running example, which sources do you need? This is a regular Django application, so I don't see what benefit a minimal running sample would bring.

carltongibson commented 1 year ago

@MichalKyjovsky A runnable reproduction is necessary, yes. You're saying there's a bug. Without a reproduction, that's hard to verify.

I'm afraid I don't have the time it would take to reconstruct a reproduction of your problem from the information you've given. You're seeing it: it's much easier for you to create a minimal sample project.

It's not really about the SharePoint library that I am using...

Exactly. So a minimal example excludes all that is not relevant. That way it's possible to see what the error is in Channels, if there is one, or in your project, which you may discover whilst working on the reproduction.