WeblateOrg / weblate

Web based localization tool with tight version control integration.
https://weblate.org/
GNU General Public License v3.0
4.58k stars 1.02k forks source link

Managing strings is very slow in a large component #6207

Closed burner1024 closed 2 years ago

burner1024 commented 3 years ago

Describe the issue Using new "Manage strings" permission, I delete strings from components. Small ones work reasonably well, large ones take unreasonably long time.

uwsgi stderr | [2021-06-19 16:48:23,012: INFO/2462] fallout/fixt/es: processing po/spanish.po, content changed
...
uwsgi stderr | [2021-06-19 16:51:57,101: INFO/2462] fallout/fixt/es: updating completed

Adding is even worse... uwsgi is crunching something for 15 minutes already and no result. My POs are a few MBs.

I already tried Nothing comes to mind, really. I searched for a large component with "manage strings" enabled on Hosted to compare and see if it's the same, but didn't find any.

To Reproduce the issue Delete/add a string from/to a large PO component.

Expected behavior String deleted/added quickly.

Server configuration and status Using Docker 4.7-1 with pretty much default config.

$ docker-compose exec --user weblate weblate weblate list_versions
 * Weblate: 4.7
 * Django: 3.2.4
 * siphashc: 2.1
 * translate-toolkit: 3.3.6
 * lxml: 4.6.3
 * Pillow: 8.2.0
 * bleach: 3.3.0
 * python-dateutil: 2.8.1
 * social-auth-core: 4.1.0
 * social-auth-app-django: 4.0.0
 * django-crispy-forms: 1.12.0
 * oauthlib: 3.1.1
 * django-compressor: 2.4.1
 * djangorestframework: 3.12.4
 * django-filter: 2.4.0
 * django-appconf: 1.0.4
 * user-agents: 2.2.0
 * filelock: 3.0.12
 * setuptools: 40.8.0
 * jellyfish: 0.8.2
 * openpyxl: 3.0.7
 * celery: 5.1.0
 * kombu: 5.1.0
 * translation-finder: 2.9
 * weblate-language-data: 2021.5
 * html2text: 2020.1.16
 * pycairo: 1.16.2
 * pygobject: 3.30.4
 * diff-match-patch: 20200713
 * requests: 2.25.1
 * django-redis: 5.0.0
 * hiredis: 2.0.0
 * sentry_sdk: 1.1.0
 * Cython: 0.29.23
 * misaka: 2.1.1
 * GitPython: 3.1.14
 * borgbackup: 1.1.16
 * pyparsing: 2.4.7
 * pyahocorasick: 1.4.2
 * python-redis-lock: 3.7.0
 * Python: 3.7.3
 * Git: 2.20.1
 * psycopg2: 2.9
 * psycopg2-binary: 2.8.6
 * phply: 1.2.5
 * chardet: 4.0.0
 * ruamel.yaml: 0.17.9
 * tesserocr: 2.5.1
 * akismet: 1.1
 * boto3: 1.17.96
 * zeep: 4.0.0
 * aeidon: 1.9
 * iniparse: 0.5
 * mysqlclient: 2.0.3
 * Mercurial: 5.8
 * git-svn: 2.20.1
 * git-review: 2.1.0
 * Redis server: 6.2.4
 * PostgreSQL server: 13.3
 * Database backends: django.db.backends.postgresql
 * Cache backends: default:RedisCache, avatar:FileBasedCache
 * Email setup: django.core.mail.backends.smtp.EmailBackend: 127.0.0.1
 * OS encoding: filesystem=utf-8, default=utf-8
 * Celery: redis://cache:6379/1, redis://cache:6379/1, regular
 * Platform: Linux 5.8.0-55-generic (x86_64)

Weblate deploy checks

SystemCheckError: System check identified some issues:

CRITICALS:
?: (weblate.E003) Cannot send e-mail ([Errno 111] Connection refused), please check EMAIL_* settings.
    HINT: https://docs.weblate.org/en/weblate-4.7/admin/install.html#out-mail
?: (weblate.E027) The path /app/data/secret is owned by a different user, check your DATA_DIR settings.
    HINT: https://docs.weblate.org/en/weblate-4.7/admin/install.html#file-permissions

WARNINGS:
?: (security.W004) You have not set a value for the SECURE_HSTS_SECONDS setting. If your entire site is served only over SSL, you may want to consider setting a value and enabling HTTP Strict Transport Security. Be sure to read the documentation first; enabling HSTS carelessly can cause serious, irreversible problems.
?: (security.W008) Your SECURE_SSL_REDIRECT setting is not set to True. Unless your site should be available over both SSL and non-SSL connections, you may want to either set this setting True or configure a load balancer or reverse-proxy server to redirect all connections to HTTPS.
?: (security.W012) SESSION_COOKIE_SECURE is not set to True. Using a secure-only session cookie makes it more difficult for network traffic sniffers to hijack user sessions.
?: (security.W018) You should not have DEBUG set to True in deployment.

INFOS:
?: (weblate.I021) Error collection is not set up, it is highly recommended for production use
    HINT: https://docs.weblate.org/en/weblate-4.7/admin/install.html#collecting-errors
?: (weblate.I028) Backups are not configured, it is highly recommended for production use
    HINT: https://docs.weblate.org/en/weblate-4.7/admin/backup.html

System check identified 8 issues (1 silenced)
nijel commented 3 years ago

We have already some improvements ready for 4.7.1 which should improve the situation. In our tests it reduced the processing time to 60-70%, but it depends on many aspects.

burner1024 commented 3 years ago

I tried bleeding just now:

(small component)
weblate_1   | uwsgi stderr | [2021-06-21 13:30:36,719: INFO/387] fallout/fo2tweaks/es: processing data/text/po/spanish.po, content changed
weblate_1   | uwsgi stderr | [2021-06-21 13:30:37,304: INFO/387] fallout/fo2tweaks/es: updating completed

(large component)
uwsgi stderr | [2021-06-21 13:34:46,663: INFO/374] fallout/fixt/es: processing po/spanish.po, content changed
...
uwsgi stderr | [2021-06-21 13:40:43,086: INFO/374] fallout/fixt/es: updating completed

Browser is blocked for all the processing time, so... 5 or 3 or even 1 minute is effectively the same - not usable.

It seems to spend most of the time in SQL, I sampled some queries:

SELECT "trans_variant"."id", "trans_variant"."component_id", "trans_variant"."variant_regex", "trans_variant"."key" FROM "trans_variant" INNER JOIN "trans_variant_defining_units" ON ("trans_variant"."id" = "trans_variant_defining_units"."variant_id") WHERE "trans_variant_defining_units"."unit_id" = 1432484
UPDATE "trans_unit" SET "translation_id" = 64, "id_hash" = 1475286454103327925, "location" = 'data/text/english/dialog/palguard.msg:103, data/text/english/dialog/pcargrd.msg:103', "context" = '', "note" = '', "flags" = '', "source" = 'What a nice day. I hope we don''t have any attacks today.', "previous_source" = '', "target" = 'Hace un día precioso. Espero que hoy no nos ataquen.', "state" = 20, "original_state" = 20, "details" = '{}', "position" = 14051, "num_words" = 12, "priority" = 100, "pending" = false, "timestamp" = '2017-08-28T21:30:59+00:00'::timestamptz, "extra_flags" = '', "explanation" = '', "variant_id" = NULL, "source_unit_id" = 1427001 WHERE "trans_unit"."id" = 469669
UPDATE "trans_unit" SET "translation_id" = 64, "id_hash" =  -5770396855701994789, "location" = 'data/text/english/game/combatai.msg:7013', "context" = '', "note" = '', "flags" = '', "source" = 'Let''s flatline you!', "previous_source" = '', "target" = '¡Vamos a ponerte a raya!', "state" = 20, "original_state" = 20, "details" = '{}', "position" = 21114, "num_words" = 3, "priority" = 100, "pending" = false, "timestamp" = '2017-08-28T21:32:51+00:00'::timestamptz, "extra_flags" = '', "explanation" = '', "variant_id" = NULL, "source_unit_id" = 1433689 WHERE "trans_unit"."id" = 476732
SELECT "trans_unit"."id", "trans_unit"."translation_id", "trans_unit"."id_hash", "trans_unit"."location", "trans_unit"."context", "trans_unit"."note", "trans_unit"."flags", "trans_unit"."source", "trans_unit"."previous_source", "trans_unit"."target", "trans_unit"."state", "trans_unit"."original_state", "trans_unit"."details", "trans_unit"."position", "trans_unit"."num_words", "trans_unit"."priority", "trans_unit"."pending", "trans_unit"."timestamp", "trans_unit"."extra_flags", "trans_unit"."explanation", "trans_unit"."variant_id", "trans_unit"."source_unit_id" FROM "trans_unit" WHERE "trans_unit"."id" = 1435741 LIMIT 21
UPDATE "trans_unit" SET "location" = 'dialog/dcjoey.msg:241', "note" = '', "flags" = '', "position" = 5429 WHERE "trans_unit"."id" = 1367982
SELECT "trans_variant"."id", "trans_variant"."component_id", "trans_variant"."variant_regex", "trans_variant"."key" FROM "trans_variant" INNER JOIN "trans_variant_defining_units" ON ("trans_variant"."id" = "trans_variant_defining_units"."variant_id") WHERE "trans_variant_defining_units"."unit_id" = 1374356

Could it be that after a string is added or deleted, Weblate goes on to recalculate unit ids, and whatever is based on it, and does that one by one?

nijel commented 3 years ago

Bleeding probably still doesn't have the changes, tomorrow one should have them

burner1024 commented 3 years ago

Built one myself, tried it:

uwsgi stderr | [2021-06-21 15:05:03,964: INFO/404] fallout/fixt/es: processing po/spanish.po, content changed
uwsgi stderr | [2021-06-21 15:08:28,525: INFO/404] fallout/fixt/es: updating completed
nijel commented 3 years ago

I doubt it's actually bleeding. If you just build https://github.com/WeblateOrg/docker/ you will get what is published as edge. To actually get bleeding version, you should adjust the Docker file by https://github.com/WeblateOrg/docker/blob/main/.github/bin/bleeding

burner1024 commented 3 years ago

yes, I did use bleeding script

burner1024 commented 3 years ago

Added dogslow:

root@a4e772dbdd31:/# cat /tmp/dog/slow_request_m0remabc.log 
Undead request intercepted at: 21-06-2021 15:51:38 UTC

POST http://localhost:8080/unit/237033/delete/
Hostname:   a4e772dbdd31
Thread ID:  140409198516096
Process ID: 391
Started:    21-06-2021 15:51:13 UTC

  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/wsgi.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 130, in get_response
    response = self._middleware_chain(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/home/weblate/.local/lib/python3.7/site-packages/dogslow/__init__.py", line 313, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/middleware.py", line 87, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/middleware.py", line 72, in __call__
    return self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/accounts/middleware.py", line 74, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/accounts/middleware.py", line 156, in __call__
    return self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/api/middleware.py", line 26, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/middleware.py", line 200, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/django/views/decorators/http.py", line 40, in inner
    return func(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/views/edit.py", line 938, in delete_unit
    unit.translation.delete_unit(request, unit)
  File "/usr/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/translation.py", line 1323, in delete_unit
    self.handle_store_change(request, user, previous_revision)
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/translation.py", line 1179, in handle_store_change
    self.check_sync(request=request, change=change)
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/translation.py", line 382, in check_sync
    for unit in self.unit_set.prefetch_bulk().select_for_update()
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 280, in __iter__
    self._fetch_all()
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 1326, in _fetch_all
    self._prefetch_related_objects()
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 813, in _prefetch_related_objects
    prefetch_related_objects(self._result_cache, *self._prefetch_related_lookups)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 1734, in prefetch_related_objects
    level,
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/query.py", line 1876, in prefetch_one_level
    rel_attr_val = rel_obj_attr(rel_obj)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/fields/related.py", line 650, in get_foreign_related_value
    return self.get_instance_value_for_fields(instance, self.foreign_related_fields)
  File "/usr/local/lib/python3.7/dist-packages/django/db/models/fields/related.py", line 656, in get_instance_value_for_fields
    for field in fields:

This report does not contain the local stack variables.
To enable this (very verbose) information, add this to your Django settings:
  DOGSLOW_STACK_VARS = True
nijel commented 3 years ago

This looks pretty much as expected - Weblate rescans the file after removing the string and this select is the most expensive part of the update.

burner1024 commented 3 years ago

So... are you saying that this performance is expected? Or you can't reproduce it?

I enabled query logging for postgres, apparently deleting a string results in ~50000 SELECT + 50000 UPDATE queries.

$ #before
$ docker logs wl_database_1 2>&1 | grep "LOG:  statement" | awk '{print $7}' | sort | uniq -c
      5 
    863 BEGIN
    614 CLOSE
    862 COMMIT
    614 DECLARE
     13 DELETE
   1226 FETCH
    861 INSERT
   2306 SELECT
      3 UPDATE
$ #after
$ docker logs wl_database_1 2>&1 | grep "LOG:  statement" | awk '{print $7}' | sort | uniq -c
      5 
    863 BEGIN
    614 CLOSE
    862 COMMIT
    614 DECLARE
     33 DELETE
   1226 FETCH
    863 INSERT
      1 ROLLBACK
  51667 SELECT
      1 SHOW
  49324 UPDATE
nijel commented 3 years ago

All strings present in the file after the deleted one will be updated, so the number of updates is expected. It's this query:

UPDATE "trans_unit" SET "location" = 'dialog/dcjoey.msg:241', "note" = '', "flags" = '', "position" = 5429 WHERE "trans_unit"."id" = 1367982

There should not be that many of select queries though. Maybe it's the variants causing this as this query should be prefetched at once and not executed for every unit:

SELECT "trans_variant"."id", "trans_variant"."component_id", "trans_variant"."variant_regex", "trans_variant"."key" FROM "trans_variant" INNER JOIN "trans_variant_defining_units" ON ("trans_variant"."id" = "trans_variant_defining_units"."variant_id") WHERE "trans_variant_defining_units"."unit_id" = 1374356
burner1024 commented 3 years ago

So, it's like I thought. But, adding a variant does not result in "processing po, content changed", so maybe deleting also needn't?

I did some more tests:

So, reads and writes are likely doubled somewhere. But even if that is fixed, working with large components is impossible. I guess deletion should be offloaded? In fact, in PO terms strings probably should not be deleted from the file, simply marked as "obsolete". Still, 4 minutes is a lot even for a background process. For an instance with, say, 100 translations, it's not hard to imagine that for example, 10 strings per hour get deleted. If each one takes a few minutes, they are likely to block each other and/or compound db load exponentially (not to mention disrupting translator's workflow).

Although in fact, I'm more interested in adding than deleting. The problem is that while adding to small POs also works fine, with large ones I have even less luck: didn't have the patience to get at least one variant added. It's just loading forever. Here's a relevant dogslow (at 60 seconds):

Undead request intercepted at: 22-06-2021 10:40:45 UTC

POST http://localhost:8080/new-unit/fallout/fixt/ru/
Hostname:   234c659c9c07
Thread ID:  140054938601344
Process ID: 404
Started:    22-06-2021 10:39:45 UTC

  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/wsgi.py", line 133, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 130, in get_response
    response = self._middleware_chain(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/dogslow/__init__.py", line 313, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/middleware.py", line 87, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/middleware.py", line 72, in __call__
    return self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/accounts/middleware.py", line 74, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/utils/deprecation.py", line 117, in __call__
    response = response or self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/accounts/middleware.py", line 156, in __call__
    return self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/api/middleware.py", line 26, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/weblate/middleware.py", line 200, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.7/dist-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/django/views/decorators/http.py", line 40, in inner
    return func(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/views/edit.py", line 919, in new_unit
    if not form.is_valid():
  File "/usr/local/lib/python3.7/dist-packages/django/forms/forms.py", line 175, in is_valid
    return self.is_bound and not self.errors
  File "/usr/local/lib/python3.7/dist-packages/django/forms/forms.py", line 170, in errors
    self.full_clean()
  File "/usr/local/lib/python3.7/dist-packages/django/forms/forms.py", line 373, in full_clean
    self._clean_form()
  File "/usr/local/lib/python3.7/dist-packages/django/forms/forms.py", line 400, in _clean_form
    cleaned_data = self.clean()
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/forms.py", line 1995, in clean
    self.translation.validate_new_unit_data(**data)
  File "/usr/local/lib/python3.7/dist-packages/weblate/trans/models/translation.py", line 1406, in validate_new_unit_data
    current.source != store.content_units[pos].source
  File "/usr/local/lib/python3.7/dist-packages/weblate/formats/base.py", line 347, in content_units
    return [unit for unit in self.all_units if unit.has_content()]
  File "/usr/local/lib/python3.7/dist-packages/weblate/formats/base.py", line 347, in <listcomp>
    return [unit for unit in self.all_units if unit.has_content()]
  File "/usr/local/lib/python3.7/dist-packages/weblate/formats/ttkit.py", line 134, in has_content
    and not self.mainunit.isobsolete()
  File "/usr/local/lib/python3.7/dist-packages/translate/storage/pypo.py", line 568, in isblank
    return False
  File "/usr/local/lib/python3.7/dist-packages/translate/storage/pypo.py", line 458, in _msgidlen
    return len(unquotefrompo(self.msgid))
  File "/usr/local/lib/python3.7/dist-packages/translate/storage/pypo.py", line 652, in hasplural
    def hasplural(self):

This report does not contain the local stack variables.
To enable this (very verbose) information, add this to your Django settings:
  DOGSLOW_STACK_VARS = True

I see only one SQL query in PG activity, but it's in idle state for 10 minutes already:

 weblate | 2021-06-22 10:39:47.499332+00 | SELECT "addons_addon"."id", "addons_addon"."component_id", "addons_addon"."name", "addons_addon"."configuration", "addons_addon"."state", "addons_addon"."project_scope", "addons_addon"."repo_scope" FROM "addons_addon" INNER JOIN "trans_component" ON ("addons_addon"."component_id" = "trans_component"."id") WHERE (("addons_addon"."component_id" = 16 AND NOT "addons_addon"."project_scope") OR ("trans_component"."project_id" = 13 AND "addons_addon"."project_scope") OR ("trans_component"."linked_component_id" = 16 AND "addons_addon"."repo_scope") OR ("addons_addon"."component_id" IS NULL AND "addons_addon"."repo_scope"))
nijel commented 3 years ago

So, reads and writes are likely doubled somewhere.

There is always the source string which needs to be updated as well...

large component (25k strings)

* deleting last one: ~25 seconds, ~120 SELECT, 6 UPDATE

* deleting first one: ~4 minutes, 50k SELECT, 50k UPDATE

I've made updating of the units positions in a single query in 4976d8b2638acdd0a4893a49d5cf476112e9065b, that should make the removal same speed regardless the actual location of the string in the file.

Although in fact, I'm more interested in adding than deleting.

That should be way faster with 277150d903.

burner1024 commented 3 years ago

I built https://github.com/WeblateOrg/weblate/commit/277150d903b5cdc66a053e6282eee9ab30d7fbd0

nijel commented 3 years ago

Is the deleting slow when executed after adding a variant or even without that (for example after another deleting or after importing the translation)? Looking at the code, adding a variant might mess up this...

nijel commented 3 years ago

Also make sure there are no pending changes prior doing removal, otherwise you are measuring commit + delete.

burner1024 commented 3 years ago

I tried deleting first, it was before adding (cold start). With https://github.com/WeblateOrg/weblate/commit/9a19b7dd112a34794485248c75c14c1e03382784

nijel commented 3 years ago

deleting: 1 minute, 25k UPDATE now

I can't reproduce this. Can you please share your component configuration (especially file format, new base and monolingual base)? Is this newly imported translation or something migrated from an older Weblate version?

Adding variant should perform better with bc1875819c8181022f462e05fc8d8b8e7b32bd53.

burner1024 commented 3 years ago

Still 25k on deleteion with https://github.com/WeblateOrg/weblate/commit/bc1875819c8181022f462e05fc8d8b8e7b32bd53 . Weblate upgraded from 3.1.1. PO format, bilingual, no base.

Adding triggers only one set of UPDATEs, but still all of them 1 by 1, feels pretty slow:

nginx stdout | 172.18.0.1 - - [23/Jun/2021:10:22:14 +0000] "POST /new-unit/fallout/fixt/ru/ HTTP/1.1" 302 5 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
uwsgi stderr | [pid: 718|app: 0|req: 34/77] 172.18.0.1 () {64 vars in 1314 bytes} [Wed Jun 23 10:22:02 2021] POST /new-unit/fallout/fixt/ru/ => generated 0 bytes in 12107 msecs (HTTP/1.1 302) 8 headers in 405 bytes (2 switches on core 0)
uwsgi stderr | [pid: 560|app: 0|req: 35/78] 172.18.0.1 () {58 vars in 1214 bytes} [Wed Jun 23 10:22:14 2021] GET /translate/fallout/fixt/ru/?checksum=ed7c05a381666b3a => generated 72982 bytes in 1965 msecs (HTTP/1.1 200) 7 headers in 337 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [23/Jun/2021:10:22:16 +0000] "GET /translate/fallout/fixt/ru/?checksum=ed7c05a381666b3a HTTP/1.1" 200 17470 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
nijel commented 3 years ago

Do you always start from the fresh container? In the past, the position of source strings was not properly tracked, so the first deletion would fill these in and that could explain the number of queries we see. In case you do the removal repeatedly on same database, it should not trigger that much of updates on consequent runs...

burner1024 commented 3 years ago

The same database, only weblate container is re-created. (PO here if you want to test)

nijel commented 3 years ago

Do you have configured "Template for new translations"?

burner1024 commented 3 years ago

Yes.

component config

nijel commented 3 years ago

6474d1b4affb79f4a79a568d7de64db5824d41ce should address the 50k update queries.

burner1024 commented 3 years ago

It's 25k now, and unfortunately still there for me.

More importantly, adding appears to perform worse and worse with each new variant added:

nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:39:26 +0000] "POST /new-unit/fallout/fixt/ru/ HTTP/1.1" 302 5 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
uwsgi stderr | [pid: 408|app: 0|req: 11/35] 172.18.0.1 () {64 vars in 1314 bytes} [Thu Jun 24 13:39:18 2021] POST /new-unit/fallout/fixt/ru/ => generated 0 bytes in 7927 msecs (HTTP/1.1 302) 8 headers in 405 bytes (2 switches on core 0)
uwsgi stderr | [pid: 407|app: 0|req: 5/36] 172.18.0.1 () {58 vars in 1214 bytes} [Thu Jun 24 13:39:26 2021] GET /translate/fallout/fixt/ru/?checksum=8c6367f7b9de0bdc => generated 76585 bytes in 1503 msecs (HTTP/1.1 200) 7 headers in 337 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:39:27 +0000] "GET /translate/fallout/fixt/ru/?checksum=8c6367f7b9de0bdc HTTP/1.1" 200 17345 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:42:03 +0000] "POST /new-unit/fallout/fixt/ru/ HTTP/1.1" 302 5 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
uwsgi stderr | [pid: 408|app: 0|req: 12/43] 172.18.0.1 () {64 vars in 1314 bytes} [Thu Jun 24 13:41:55 2021] POST /new-unit/fallout/fixt/ru/ => generated 0 bytes in 8071 msecs (HTTP/1.1 302) 8 headers in 405 bytes (2 switches on core 0)
uwsgi stderr | [pid: 406|app: 0|req: 3/44] 172.18.0.1 () {58 vars in 1214 bytes} [Thu Jun 24 13:42:03 2021] GET /translate/fallout/fixt/ru/?checksum=86aa3b2ac3297c80 => generated 72428 bytes in 1001 msecs (HTTP/1.1 200) 7 headers in 337 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:42:04 +0000] "GET /translate/fallout/fixt/ru/?checksum=86aa3b2ac3297c80 HTTP/1.1" 200 16620 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

uwsgi stderr | [pid: 407|app: 0|req: 6/46] 172.18.0.1 () {64 vars in 1314 bytes} [Thu Jun 24 13:42:25 2021] POST /new-unit/fallout/fixt/ru/ => generated 0 bytes in 8462 msecs (HTTP/1.1 302) 8 headers in 405 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:42:33 +0000] "POST /new-unit/fallout/fixt/ru/ HTTP/1.1" 302 5 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
uwsgi stderr | [pid: 555|app: 0|req: 24/47] 172.18.0.1 () {58 vars in 1214 bytes} [Thu Jun 24 13:42:33 2021] GET /translate/fallout/fixt/ru/?checksum=e3e0dc99c1356b16 => generated 72423 bytes in 1202 msecs (HTTP/1.1 200) 7 headers in 337 bytes (2 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:42:34 +0000] "GET /translate/fallout/fixt/ru/?checksum=e3e0dc99c1356b16 HTTP/1.1" 200 16595 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:43:12 +0000] "POST /new-unit/fallout/fixt/ru/ HTTP/1.1" 302 5 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
uwsgi stderr | [pid: 407|app: 0|req: 7/49] 172.18.0.1 () {64 vars in 1314 bytes} [Thu Jun 24 13:43:04 2021] POST /new-unit/fallout/fixt/ru/ => generated 0 bytes in 8816 msecs (HTTP/1.1 302) 8 headers in 405 bytes (1 switches on core 0)
uwsgi stderr | [pid: 406|app: 0|req: 4/50] 172.18.0.1 () {58 vars in 1214 bytes} [Thu Jun 24 13:43:12 2021] GET /translate/fallout/fixt/ru/?checksum=22468efe09bc1284 => generated 72487 bytes in 970 msecs (HTTP/1.1 200) 7 headers in 337 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:43:13 +0000] "GET /translate/fallout/fixt/ru/?checksum=22468efe09bc1284 HTTP/1.1" 200 16617 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

uwsgi stderr | [pid: 408|app: 0|req: 13/52] 172.18.0.1 () {64 vars in 1314 bytes} [Thu Jun 24 13:43:33 2021] POST /new-unit/fallout/fixt/ru/ => generated 0 bytes in 9335 msecs (HTTP/1.1 302) 8 headers in 405 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:43:42 +0000] "POST /new-unit/fallout/fixt/ru/ HTTP/1.1" 302 5 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
uwsgi stderr | [pid: 406|app: 0|req: 5/53] 172.18.0.1 () {58 vars in 1214 bytes} [Thu Jun 24 13:43:42 2021] GET /translate/fallout/fixt/ru/?checksum=c06009a25ac88c87 => generated 72480 bytes in 1077 msecs (HTTP/1.1 200) 7 headers in 337 bytes (1 switches on core 0)
nginx stdout | 172.18.0.1 - - [24/Jun/2021:13:43:44 +0000] "GET /translate/fallout/fixt/ru/?checksum=c06009a25ac88c87 HTTP/1.1" 200 16614 "http://localhost:8080/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
nijel commented 3 years ago

I did several tests with PO file you've referenced, and I cannot get the 25k queries on removal - it always stays within a sane range around 50.

burner1024 commented 3 years ago

Regardless of the number of queries, what times are you getting?

cdd78b029632fa726ed0de754e05827d60ee0611 results:

If it takes so long, translators just won't use it. And considering that adding seems to perform worse with each variant... for now there's only 30, but there will be thousands. In order to be usable, both actions would need to take the same time as saving a translation - under 1 sec, max 2.

Edit: and I take back what I said earlier about not worrying about deleting much. Both adding and deleting are important.

nijel commented 2 years ago

The variants performance should be addressed by a5536ec5e572d34690dec56e64b4010e74652ba2, 4b84ca00fce6c2537548ee1f806e960177e40f59, and ba53c0bcbd5aeb64b3cc4ae33073d9d8c2ff07f0.

github-actions[bot] commented 2 years ago

Thank you for your report; the issue you have reported has just been fixed.