I think I've found the root cause of RED-1698. Both LMS and Studio use the same STATICFILES_STORAGE = 'openedx.core.storage.ProductionStorage', which reads the same settings.CACHES['staticfiles']['KEY_PREFIX'] = 'prod-tahoe-edxapp-hawthorn-0_general'. That is probably causing cache collisions between the two apps.
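Why doesn't it happen all the time?
I don't know. My best guess is that it's a combination of cache invalidation and cache collision: the wrong URL only shows up when one app reads an entry that the other app (or an older deployment) wrote under the same key.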
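A simplified model of the suspected collision, for illustration only. It uses a plain dict in place of Memcached, and the key layout, the helper names (cache_key, cached_hashed_name, lms_compute, cms_compute) and the hash suffixes lms111/cms222 are all hypothetical; this is not the actual Django or ProductionStorage code.

```python
import hashlib

# Stands in for the Memcached instance that both apps talk to.
shared_cache = {}


def cache_key(prefix, name):
    """Build a cache key from the configured prefix and the file name (assumed layout)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "{}:staticfiles:{}".format(prefix, digest)


def cached_hashed_name(prefix, name, compute):
    """Return the cached hashed name, computing it only on a cache miss."""
    key = cache_key(prefix, name)
    if key not in shared_cache:
        shared_cache[key] = compute(name)
    return shared_cache[key]


# Hypothetical per-app hashers: each app's build produces a different hash.
def lms_compute(name):
    return name.replace(".js", ".lms111.js")


def cms_compute(name):
    return name.replace(".js", ".cms222.js")


name = "js/i18n/ja-jp/djangojs.js"
shared = "prod-tahoe-edxapp-hawthorn-0_general"

# LMS populates the entry first, then CMS reads it under the same prefix.
cached_hashed_name(shared, name, lms_compute)
collided = cached_hashed_name(shared, name, cms_compute)

# With its own prefix, CMS misses the cache and computes its own value.
separate = cached_hashed_name("cms_" + shared, name, cms_compute)
```

In this model, collided carries the LMS hash even though CMS asked for it, while separate carries the CMS hash. Whether the bug appears depends on which app wrote the entry first, which would fit the intermittent behaviour.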
What's the solution?
This PR should make two changes:
Use a different cache prefix for lms and cms (needs dynamic settings change at runtime)
Use a prefix for each deployment to effectively invalidate the old entries since Memcached doesn't support prefix-based deletion: {{ ansible_hostname|default('staticfiles') }}_{{ 65000 | random }}_general
Using random should also prevent lms/cms from reusing the same cache entry even on the same server.
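A rough Python equivalent of the prefix template above, as a sketch only. The function name build_key_prefix and the optional service component are hypothetical additions to show how the lms/cms split could be made explicit; the random range assumes the Ansible random filter draws from 0 up to, but not including, 65000.

```python
import random


def build_key_prefix(hostname=None, service=None, rng=random):
    """Build a per-deployment cache prefix: <host>[_<service>]_<random>_general."""
    # Fall back to 'staticfiles' when no hostname is known, like the template default.
    parts = [hostname or "staticfiles"]
    if service:
        # Hypothetical extra component so lms and cms can never share a prefix.
        parts.append(service)
    # A new random component per deployment leaves the old entries unreachable.
    parts.append(str(rng.randrange(65000)))
    parts.append("general")
    return "_".join(parts)


lms_prefix = build_key_prefix("server_name", "lms")
cms_prefix = build_key_prefix("server_name", "cms")
default_prefix = build_key_prefix()
```

The random component alone makes a shared prefix unlikely but not impossible, since two independent draws from 65000 values can match. Adding the service name removes that residual chance.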
Please prioritize quickly but review carefully
A customer is launching their site on Monday, February 15th, 2021, and this bug is blocking them. We cannot let this wait too long and should ideally deploy on Monday. However, I don't want to skip proper review and create a bigger problem.
Example on a production server shell
# $ ssh server_name
# $ /edx/bin/edxapp-shell-cms
from django.conf import settings
settings.CACHES['staticfiles']['KEY_PREFIX'] = 'server_name_general'
from django.contrib.staticfiles.storage import staticfiles_storage
staticfiles_storage.url('js/i18n/ja-jp/djangojs.js')
result = '/static/studio/js/i18n/ja-jp/djangojs.47fbc849b48c.js' # 404 not found
# then try again on a _new_ shell
# $ ssh server_name
# $ /edx/bin/edxapp-shell-cms
from django.conf import settings
settings.CACHES['staticfiles']['KEY_PREFIX'] = 'RANDOM_PREFIX_server_name_general'
from django.contrib.staticfiles.storage import staticfiles_storage
staticfiles_storage.url('js/i18n/ja-jp/djangojs.js')
result_2 = '/static/studio/js/i18n/ja-jp/djangojs.baffe5deb7f4.js' # 200 found!
See https://appsembler.atlassian.net/browse/RED-1698?focusedCommentId=31747