Open varunouat opened 4 years ago
There is a lot I don't know from this report, but in general: use separate Redis instances for sessions and cache, tune or disable locking based on your needs, etc. Make sure to read the full README as it has a lot of info on tuning.
Are these two screenshots related to the same user? If so, it may be that the clearing tags operation is blocking the other process, which is waiting for a lock on the session (sleeping). The file backend is very fast unless you have slow disks or an insane number of tags for each cache record.
Hi
We already have a separate Redis instance for sessions. For cache we are using LiteSpeed and Cloudflare. I cannot confirm whether the screenshots are for one user or not. We have the following Redis config:
'session' => [
    'save' => 'redis',
    'save_path' => '/home/onceuponatrunk/tmp/session',
    'redis' => [
        'host' => '127.0.0.1',
        'port' => '6379',
        'password' => '',
        'timeout' => '2.5',
        'persistent_identifier' => '',
        'database' => '2',
        'compression_threshold' => '2048',
        'compression_library' => 'gzip',
        'log_level' => '3',
        'max_concurrency' => '6',
        'break_after_frontend' => '5',
        'break_after_adminhtml' => '30',
        'first_lifetime' => '600',
        'bot_first_lifetime' => '60',
        'bot_lifetime' => '7200',
        'disable_locking' => '0',
        'min_lifetime' => '60',
        'max_lifetime' => '2592000',
        'sentinel_master' => '',
        'sentinel_servers' => '',
        'sentinel_connect_retries' => '5',
        'sentinel_verify_master' => '0'
    ]
],
Please help me debug this, as we are losing a lot of revenue.
If it is urgent, I would recommend disabling locking entirely. I think that will alleviate the pain for your customers. Then you can take your time and diagnose the real issue. Disabling locking will only have minor side effects; for example, if a user adds two items to the cart very quickly, only one page might show the success message.
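For reference, a minimal sketch of that change, assuming the session settings live in `app/etc/env.php` (the thread does not name the file) and using the same keys as the config posted above. Only `disable_locking` changes; every other key stays as it is.

```php
// Fragment of the 'session' section, not a complete file.
// Assumption: this sits in app/etc/env.php alongside the other settings.
'session' => [
    'save' => 'redis',
    'redis' => [
        // ... host, port, database, etc. unchanged ...
        'disable_locking' => '1', // was '0'; '1' turns session locking off
    ],
],
```

Setting the value back to `'0'` restores locking once the real cause has been found.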
Those yellow 500s in the screenshot are half-second sleeps, which is how the locking mechanism works. If another process has the lock, it sleeps for a half-second and then tries again. There is some reason that the user's session is locked, so you need to find what that is, but here are some common ones:
Actually, it's pretty much got to be one of those two. If you have any PHP fatal errors, fix those. I don't know if those are captured by New Relic, because by the nature of a fatal error the process is probably dying before the error is reported to an external tool, so you need to check the PHP error log files to see if there are any errors.
The other thing to check for is whether you have any long-running processes that are long-running for some reason other than the session locking. If you do, then any user whose session is locked like this will experience every page load being slow because of the session lock wait timeout.
If disabling locking is too scary you could at least reduce the timeout so that the max time waited is a lot less than 5 seconds.
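A sketch of what that could look like, assuming `break_after_frontend` (set to `'5'` in the config posted above) is the setting that controls the frontend wait. The value `'2'` is only an illustrative choice, not a recommendation from the README.

```php
// Fragment of the 'session' section, not a complete file.
// Assumption: break_after_frontend is the number of seconds a frontend
// request waits for the session lock before giving up on waiting.
'session' => [
    'save' => 'redis',
    'redis' => [
        // ... other keys unchanged ...
        'break_after_frontend' => '2', // was '5'; illustrative value only
        'disable_locking' => '0',      // locking stays enabled
    ],
],
```

With half-second sleeps between attempts, a lower limit here means fewer retries before the request proceeds, so the worst-case delay per page load shrinks accordingly.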
Hi Colin
I hope you are doing well.
We are still facing the same issue. It looks like sessions are getting locked, but I cannot find any reason for it, as there are no fatal errors in the application. This problem persists for each and every request.
Have you tried disabling locking?
No, I haven't disabled locking, as it may result in mixing up data for customers. We have separated the Redis instances for sessions and cache, but there are still a few requests that take a long time.
I see in your stack trace you are using Elasticsearch, so I'm guessing this is a rapid-fire type of request for auto-complete? I'd definitely disable locking for this controller. Ideally, though, I think your controller code should avoid using the session at all, as search results are not user-specific (unless they are).
No, this is a general category listing page. Yes, we are using Elasticsearch to minimize the search_tmp entries. But we have also implemented a separate instance for saving sessions and cache.
Hi
I hope you are in great health. If you have any free time, could we discuss this issue further? I am having serious trouble finding the cause. Could we meet on Skype or AnyDesk so that you can check the code and configuration and we can resolve it?
I'm sorry, I do not have time to troubleshoot the issue for you. I can give you some hints, though. You need to determine for sure whether the delay is caused by locking between multiple requests on the same session id. Try to find other requests from the same few seconds in which the delay occurs, and see what is causing the delay in the request that holds the lock.
Again, as a quick fix if this is causing major problems just disable locking during peak hours so that your customers are not impacted by slow responses. The side effects of having no locking are minor compared to 5 and 10 second delays.
We are experiencing the same issue: the read is taking too long. We have tried disabling locking entirely and performance is much better. Could you explain how this locking mechanism is supposed to work? When it is disabled, what are the risks?
Best regards, Pierre
@henry1303 It was already explained above.
This explains a lot: https://github.com/magento/magento2/issues/34758
Hi
I am using AWS with CentOS 7 and the LiteSpeed web server. We are running Magento 2.3.3 with Redis for sessions. We are facing a problem where the session read handler is taking too much time on our product detail and listing pages. We have attached the New Relic report for this. Can you please suggest the changes required to reduce the time?