Pierre-Lannoy / wp-decalog

Capture and log events, metrics and traces on your site. Make WordPress observable - finally!
https://perfops.one/
GNU General Public License v3.0
64 stars 10 forks source link

Improve Resilience for missing logger config even more (Follow up to #24 ) #30

Closed JanThiel closed 1 year ago

JanThiel commented 2 years ago

Is your feature request related to a problem? Please describe. Hey @Pierre-Lannoy :-) As a follow up to https://github.com/Pierre-Lannoy/wp-decalog/issues/24 we took some time to evaluate the solution implemented there. In general it does what it should and prevents any exceptions from happening. But the current implementation leads to resetting the configured loggers whenever the failsafe triggers. It happens surprisingly often whenever an attached object cache might need some time to flush. So the value does not return correctly for <1s. But that's enough to trigger the failsafe.

Describe the solution you'd like We propose a simple solution. By removing the option_save calls all the failsafe and default logic will happen only in memory. Unless someone actually changes the logger config manually and thus stores the new values intentionally. This will prevent unwanted resets of the actually configured loggers.

Describe alternatives you've considered We considered persistent default loggers configured within the wp-config.php as constants. This would require PHP 7+ and knowledge of the internal data structure of a logger config. As one would have to duplicate it into the config. This default logger config could then be merged into the loggers object if missing. As you do already with the shared memory logger. This solution would require more code then the current approach ;-) But it would also be a possible solution for this.

Additional context 🌻

Pierre-Lannoy commented 2 years ago

Hello @JanThiel ! Sorry for this very late answer… I am currently unpacking all issues and PR on all my plugins to prepare to WP 6.1 release.

I've tested your PR (thanks, thanks, thanks 🙏🏼 ) on my test environments and so far so good. Nevertheless, I had not the initial issue you have. Could you confirm it fixes this type of issue? Are you currently using this fix?

Pierre-Lannoy commented 2 years ago

Hello @JanThiel ! I'm about to release the new version and would like to validate your PR before. Could you confirm you're using this fix and it fixes the issue? Thanks.

JanThiel commented 2 years ago

Hi @Pierre-Lannoy I've been on vacation and still cleaning up the "mess" vacations tend to leave within your inbox ;-). So sorry for the late reply!

We are running the branch live and haven't had any issues within the last 3 weeks. But personally I would as well considering punting this to a later release of your plugin. Although the code mainly adds safety checks before accessing assumed existing array keys, the removal of the update_option is kind of a major change. It shouldn't have any negative effects, but I haven't run that many test-cases. Except of our production workload ;-).

I believe my current approach is kind of a hotfix from what I saw within the decalog code. It works but is not a complete fix.

A much better approach would be to add the changes to a global helper method that returns all the loggers and is used everywhere where loggers are fetched. That would allow a single centralized approach making sure that the returned data structure is correct in any way. Currently the way to get the loggers is kind of fractured. Some places call the check_logger method and others directly access the option.

What do you think about this?

I am as well unsure why we do have these issues in the first place. We know that the Object Cache infrastructure we use utilising redis and the "object-cache-pro" plugin tends to be fragile and creating inconsistent states. Like the ones causing the issues with partly or wrong serialized logger objects and thus causing the fatal errors. But we still don't know why this happens :-/ Quite a massive PITA... When you do not have these risks or states, the current decalog code is fine. But any wrong data in the db or cache and the checks in the code are not sufficient.

Just so you understand my thoughts.

Pierre-Lannoy commented 2 years ago

Hi Jan! OK. I understand well and I will postpone it until we have a clear idea of how to do it more "globally". Nevertheless, this means that it will be a problem for you when you update the plugin : you will have to "reproduce" your fix in your production environment. If it's ok for you, it's ok for me 😉 Just let me know…

I've been on vacation and still cleaning up the "mess" vacations tend to leave within your inbox ;-)

Haha, do not apologize. Glad you were able to take vacations…

JanThiel commented 2 years ago

Go ahead with the next release and I will refactor this PR to be more complete for the next release. You might want to cherry-pick some commits anyway for this release:

https://github.com/Pierre-Lannoy/wp-decalog/pull/31/commits/3f00ebca5b5fced80399a8a6d078aeb5f6935f35 https://github.com/Pierre-Lannoy/wp-decalog/pull/31/commits/a78bc00b79520e5a9190d4f0ee8d7ae0e2b85567

Pierre-Lannoy commented 2 years ago

So, I've done it 😉