carrotquest / django-clickhouse

This project's goal is to integrate the Yandex ClickHouse database into a Django project.
MIT License
103 stars 26 forks

There is no data synchronization when creating records in PostgreSQL. #47

Closed ValentinDevPy closed 1 year ago

ValentinDevPy commented 1 year ago

Hello, I am using your library, and when I create records in a table, no data is synchronized. The task queue works correctly, but the tasks themselves do not receive arguments. At the same time, queries to ClickHouse through the ORM work fine. My docker-compose:

version: 3

volumes:
  smartdialogs_pa_local_postgres_data: {}
  smartdialogs_pa_local_postgres_data_backups: {}
  clickhouse-data: {}
  pgadmin-data: {}

services:
  django: &django
    build:
      context: .
      dockerfile: ./compose/local/django/Dockerfile
    image: smartdialogs_pa_local_django
    depends_on:
      - postgres
      - redis
      - mailhog
      - clickhouse
    volumes:
      - .:/app:z
    env_file:
      - ./.envs/.local/.django
      - ./.envs/.local/.postgres
    ports:
      - "8000:8000"
    command: /start
    restart: always

  postgres:
    build:
      context: .
      dockerfile: ./compose/production/postgres/Dockerfile
    image: smartdialogs_pa_production_postgres
    container_name: smartdialogs_pa_local_postgres
    volumes:
      - smartdialogs_pa_local_postgres_data:/var/lib/postgresql/data:Z
      - smartdialogs_pa_local_postgres_data_backups:/backups:z
    ports:
      - "5432:5432"
    env_file:
      - ./.envs/.local/.postgres

  clickhouse:
    image: clickhouse/clickhouse-server:22.9.2.7
    container_name: "clickhouse"
    ports:
      - "8123:8123"
    environment:
      - CLICKHOUSE_DB=testing
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_PASSWORD=default
    volumes:
      - clickhouse-data:/var/lib/clickhouse:z

  redis:
    image: redis:6
    container_name: smartdialogs_pa_local_redis
    ports:
      - "6379:6379"

  celeryworker:
    <<: *django
    image: smartdialogs_pa_local_celeryworker
    container_name: smartdialogs_pa_local_celeryworker
    depends_on:
      - redis
      - postgres
      - mailhog
    ports: []
    command: /start-celeryworker

  celerybeat:
    <<: *django
    image: smartdialogs_pa_local_celerybeat
    container_name: smartdialogs_pa_local_celerybeat
    depends_on:
      - redis
      - postgres
      - mailhog
    ports: []
    command: /start-celerybeat

My settings.py:


# DATABASES
# ------------------------------------------------------------------------------
DATABASES = {'default': env.db('DATABASE_URL')}
DATABASES['default']['ATOMIC_REQUESTS'] = True
DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'

# APPS
# ------------------------------------------------------------------------------
DJANGO_APPS = [
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # "django.contrib.humanize", # Handy template tags
    'django.contrib.admin',
    'django.forms',
]
THIRD_PARTY_APPS = [
    'crispy_forms',
    'crispy_bootstrap5',
    'allauth',
    'allauth.account',
    'allauth.socialaccount',
    'django_celery_beat',
    'django_filters',
    'rest_framework',
    'rest_framework.authtoken',
    'rest_framework_simplejwt',
    'djoser',
    'corsheaders',
    'drf_spectacular',
    'django_clickhouse',
]

LOCAL_APPS = [
    'src.users.apps.UsersConfig',
    'src.core.apps.CoreConfig',
    'src.dashboards.apps.DashboardsConfig',
    # Your stuff: custom apps go here
]
INSTALLED_APPS = DJANGO_APPS + THIRD_PARTY_APPS + LOCAL_APPS

# MIGRATIONS
# ------------------------------------------------------------------------------
MIGRATION_MODULES = {'sites': 'src.contrib.sites.migrations'}

# Celery
# ------------------------------------------------------------------------------
if USE_TZ:
    CELERY_TIMEZONE = TIME_ZONE
CELERY_BROKER_URL = env('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = CELERY_BROKER_URL
CELERY_RESULT_EXTENDED = True
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
# TODO: set to whatever value is adequate in your circumstances
CELERY_TASK_TIME_LIMIT = 5 * 60
# TODO: set to whatever value is adequate in your circumstances
CELERY_TASK_SOFT_TIME_LIMIT = 360
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'

CLICKHOUSE_DATABASES = {
    'default': {
        'db_name': 'testing',
        'username': 'default',
        'password': 'default',
        'db_url': 'http://clickhouse:8123'
    }
}
CLICKHOUSE_REDIS_CONFIG = {
    'host': "redis",
    'port': 6379,
    'db': 8,
    'socket_timeout': 10
}

CLICKHOUSE_CELERY_QUEUE = 'celery'

CELERY_BEAT_SCHEDULE = {
    'clickhouse_auto_sync': {
        'task': 'django_clickhouse.tasks.clickhouse_auto_sync',
        'schedule': timedelta(seconds=60),  # Every 60 seconds
        'options': {'expires': 10, 'queue': CLICKHOUSE_CELERY_QUEUE}
    }
}

My Django-model:

class Test(ClickHouseSyncModel):
    first_name = models.CharField(max_length=50)
    visits = models.IntegerField(default=0)
    birthday = models.DateField()

My Clickhouse-model:

class ClickHouseTest(ClickHouseModel):
    django_model = Test
    sync_batch_size = 1000
    sync_enabled = True

    id = fields.UInt32Field()
    first_name = fields.StringField()
    birthday = fields.DateField()
    visits = fields.UInt32Field(default=0)

    engine = MergeTree('birthday', ('birthday',))

Logs:

smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,259: INFO/SpawnProcess-4] Task django_clickhouse.tasks.clickhouse_auto_sync[adb5f82d-faa1-427f-a2ea-231955b0ddca] received
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,283: DEBUG/ForkPoolWorker-2] django-clickhouse: need_sync returned True for class ClickHouseTest as no last sync found (now: 2023-02-09T18:47:37.282875, last: 2023-02-09T18:46:37.330088, delay: 5)
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,297: DEBUG/ForkPoolWorker-2] django-clickhouse: need_sync returned False for class ClickHouseMultiModel as sync is disabled
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,297: INFO/SpawnProcess-4] Task django_clickhouse.tasks.sync_clickhouse_model[5e61991e-9829-46bc-aa4a-e6fc1d8fea28] received
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,300: INFO/ForkPoolWorker-2] Task django_clickhouse.tasks.clickhouse_auto_sync[adb5f82d-faa1-427f-a2ea-231955b0ddca] succeeded in 0.03706895799996346s: None
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,311: DEBUG/ForkPoolWorker-1] django-clickhouse: acquiring lock "clickhouse_sync:lock:ClickHouseTest" with pid 21
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,313: DEBUG/ForkPoolWorker-1] django-clickhouse: acquired lock "clickhouse_sync:lock:ClickHouseTest" with pid 21
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,315: DEBUG/ForkPoolWorker-1] django-clickhouse: got 0 operations from storage (key: ClickHouseTest)
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,315: DEBUG/ForkPoolWorker-1] django-clickhouse: got 0 objects to import from database (key: ClickHouseTest)
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,317: DEBUG/ForkPoolWorker-1] django-clickhouse: releasing lock "clickhouse_sync:lock:ClickHouseTest" with pid 21
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,318: INFO/ForkPoolWorker-1] django-clickhouse: removed 0 operations from storage (key: ClickHouseTest)
smartdialogs_pa_local_celeryworker | [2023-02-09 18:47:37,322: INFO/ForkPoolWorker-1] Task django_clickhouse.tasks.sync_clickhouse_model[5e61991e-9829-46bc-aa4a-e6fc1d8fea28] succeeded in 0.022684499999968466s: None

M1ha-Shvn commented 1 year ago

Hi, Valentin. First of all, you have not provided the code you use to create records. It's really hard to guess what you are doing without it. Can you show it? Secondly, I'd like to know which versions of Python, Django and django-clickhouse you are using.

P.S. According to your logs, the sync process is working well, but no operations have been registered for sync for your ClickHouseTest model. I guess there was some trouble when the Test model records were created.
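One way to double-check the "got 0 operations from storage" line is to look at what is actually stored in the Redis database configured in CLICKHOUSE_REDIS_CONFIG. The sketch below is only a rough aid: `list_sync_keys` is a hypothetical helper, and the `clickhouse_sync:` prefix is taken from the lock key visible in the logs. That pending operations are stored under the same prefix is an assumption, not something confirmed in this thread.

```python
def list_sync_keys(client, pattern="clickhouse_sync:*"):
    """Return the sorted key names matching `pattern`.

    `client` is any object exposing redis-py's `scan_iter(match=...)`.
    This helper is hypothetical and not part of django-clickhouse.
    """
    keys = []
    for key in client.scan_iter(match=pattern):
        # redis-py returns bytes unless decode_responses=True was set.
        keys.append(key.decode() if isinstance(key, bytes) else key)
    return sorted(keys)


# Usage with the settings shown in this thread (requires the `redis` package):
#   import redis
#   client = redis.Redis(host="redis", port=6379, db=8, socket_timeout=10)
#   print(list_sync_keys(client))
```

If nothing related to ClickHouseTest shows up right after a `Test.objects.create(...)` call, that would be consistent with the operation never being registered on the Django side.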

ValentinDevPy commented 1 year ago

Hello, I create entries both manually through the ORM and through a ModelViewSet; the code is below:

Serializer:

    class Meta:
        fields = '__all__'
        model = Test

ViewSet:

class TestViewSet(viewsets.ModelViewSet):
    permission_classes = [AllowAny]
    queryset = Test.objects.all()
    serializer_class = TestSerializer

Manually creating:

Test.objects.create(first_name='123',visits=1,birthday='2022-12-12')

Python version is 3.10.8, Django==4.1.3, django-clickhouse==1.2.1, DRF==3.14.0

M1ha-Shvn commented 1 year ago

I modified the GitHub Actions tests so that they check the latest software versions: https://github.com/carrotquest/django-clickhouse/pull/48

M1ha-Shvn commented 1 year ago

Looks like there is incompatibility with django 4.0+. I'll have a look

ValentinDevPy commented 1 year ago

Looks like there is incompatibility with django 4.0+. I'll have a look

I also tried pinning Django v3 and downgrading the respective libraries; it did not help.

M1ha-Shvn commented 1 year ago

The tests failing on Django 4.0+ are related to the django-pg-returning library. I'll create a separate compatibility issue there.

M1ha-Shvn commented 1 year ago

Looks like there is incompatibility with django 4.0+. I'll have a look

I also tried to specify Django v3 and downgraded the respective libraries, it did not help.

That is strange. It looks like operation registration is not triggered when you call the create method. Have you overridden the create method in the Manager? Can you show me the output of:

from django_clickhouse.models import *

print('Model is instance of ClickHouseSyncModel', isinstance(Test, ClickHouseSyncModel), Test.__mro__)
print('Manager is instance of ClickHouseSyncManager', isinstance(Test.objects, ClickHouseSyncManager), Test.objects.__mro__)
print('QuerySet is instance of ClickHouseSyncQuerySetMixin', isinstance(Test.objects, ClickHouseSyncQuerySetMixin))

ValentinDevPy commented 1 year ago

No, I didn't redefine it, I didn't see it in the documentation. Console output below:

In [1]: from django_clickhouse.models import *

In [2]: from src.core.models import Test

In [3]: print('Model is instance of ClickHouseSyncModel', isinstance(Test, ClickHouseSyncModel), Test.__mro__)
Model is instance of ClickHouseSyncModel False (<class 'src.core.models.Test'>, <class 'django_clickhouse.models.ClickHouseSyncModel'>, <class 'django.db.models.base.Model'>, <class 'object'>)

In [4]: print('Manager is instance of ClickHouseSyncManager', isinstance(Test.objects, ClickHouseSyncManager), Test.objects.__mro__)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 print('Manager is instance of ClickHouseSyncManager', isinstance(Test.objects, ClickHouseSyncManager), Test.objects.__mro__)

AttributeError: 'ClickHouseSyncManager' object has no attribute '__mro__'

In [5]: print('QuerySet is instance of ClickHouseSyncQuerySetMixin', isinstance(Test.objects, ClickHouseSyncQuerySetMixin))
QuerySet is instance of ClickHouseSyncQuerySetMixin False

M1ha-Shvn commented 1 year ago

What is the platform you are working on? Do django signals work properly there? Especially post_save and post_delete

ValentinDevPy commented 1 year ago

Model is instance of ClickHouseSyncModel False (<class 'src.core.models.Test'>, <class 'django_clickhouse.models.ClickHouseSyncModel'>, <class 'django.db.models.base.Model'>, <class 'object'>)

I'm working on a Mac with an M1 chip, but I see the same behavior on a test server running Ubuntu on x86. Yes, Django signals work well on my platform.

M1ha-Shvn commented 1 year ago

The create method registers the operation here. I don't see any reason why it would not be called if signals work fine. Are you able to debug whether this handler is called correctly? P.S. I use this library in my production environment, based on Django 3.2 on Ubuntu, and have no problems like that. The tests also pass on Django 3.2, so I'm not sure how I can reproduce it myself.

ValentinDevPy commented 1 year ago

The commands from my earlier comment were executed on Django 3.

M1ha-Shvn commented 1 year ago

I wrote them without testing. The first one is my fault: it should be isinstance(Test(), ClickHouseSyncModel). But I can see from the MRO that it will return True. The second one is strange to me, but the error says that the manager is a 'ClickHouseSyncManager' object, which is the thing I wanted to check, so it's OK. The third is False because objects is a manager, not a QuerySet. Also my fault.
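The behaviour of these checks comes down to ordinary Python semantics, which a small Django-free sketch can illustrate. `SyncModel`, `SyncManager` and `Record` are hypothetical stand-ins for the real classes, used only to show why the class-level isinstance check is False and why a manager instance has no `__mro__`.

```python
class SyncModel:
    """Hypothetical stand-in for a base model class."""


class SyncManager:
    """Hypothetical stand-in for a manager class."""


class Record(SyncModel):
    # Class-level manager instance, similar in spirit to `Model.objects`.
    objects = SyncManager()


# isinstance() on the class itself tests the class object, not its bases.
class_is_instance = isinstance(Record, SyncModel)      # False
class_is_subclass = issubclass(Record, SyncModel)      # True
object_is_instance = isinstance(Record(), SyncModel)   # True

# __mro__ exists on classes only; go through type() for an instance.
manager_has_mro = hasattr(Record.objects, "__mro__")   # False
manager_mro = type(Record.objects).__mro__

print(class_is_instance, class_is_subclass, object_is_instance)
# prints: False True True
print(manager_has_mro, [cls.__name__ for cls in manager_mro])
# prints: False ['SyncManager', 'object']
```

Applied to the thread, the equivalent checks would presumably be `issubclass(Test, ClickHouseSyncModel)` and `type(Test.objects).__mro__`. For the third check, the object to test would be a queryset such as `Test.objects.all()` rather than the manager itself.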

ValentinDevPy commented 1 year ago

The create method registers the operation here. I don't see any reason why it would not be called if signals work fine. Are you able to debug whether this handler is called correctly? P.S. I use this library in my production environment, based on Django 3.2 on Ubuntu, and have no problems like that. The tests also pass on Django 3.2, so I'm not sure how I can reproduce it myself.

Could you suggest how I can check this?

M1ha-Shvn commented 1 year ago

  1. You can run the tests on your machine and in your environment. Here is the guide.
  2. If that shows nothing, the only way I see is debugging your app. You can use an IDE like PyCharm or VSCode, connect to your project or a Python console in debug mode, place a breakpoint here and check whether post_save has been emitted. If not, you can debug the whole create procedure to understand why it has not been called.
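If attaching a debugger is inconvenient, a lighter alternative to the breakpoint approach could be a temporary receiver that records whether post_save fires at all. This is only a sketch: `make_signal_probe` is a hypothetical helper, not part of Django or django-clickhouse, and it does not show whether the library's own handler runs, only whether the signal is emitted for the model.

```python
import logging

logger = logging.getLogger("signal_probe")


def make_signal_probe(label):
    """Build a receiver that records every call it gets.

    Returns (receiver, calls). Hypothetical helper for debugging only.
    """
    calls = []

    def receiver(sender, **kwargs):
        # Django passes `instance` and `created` to post_save receivers.
        entry = {
            "label": label,
            "sender": getattr(sender, "__name__", repr(sender)),
            "created": kwargs.get("created"),
        }
        calls.append(entry)
        logger.warning("signal received: %s", entry)

    return receiver, calls


# Usage inside `python manage.py shell` (assumes the Test model from this thread):
#   from django.db.models.signals import post_save
#   from src.core.models import Test
#   probe, calls = make_signal_probe("post_save")
#   post_save.connect(probe, sender=Test, weak=False)
#   Test.objects.create(first_name="probe", visits=1, birthday="2022-12-12")
#   print(calls)  # an empty list would mean post_save never reached the probe
```

The receiver accepts `**kwargs` because Django requires signal receivers to do so, and `weak=False` keeps the connection alive even if the local reference to the probe goes away.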