Add centralized error logging and bug reporting

schlessera commented 5 years ago

When unexpected things happen on the backend (during sanitization, for example), the method of choice in PHP should be to throw an exception for such an exceptional event. However, a thrown exception always need to be either handled where the error condition can be resolved or caught in a central handler that logs & dispatches as needed.

For the exceptions we cannot resolve from within the plugin (for example, an assert failing to assert an assumption we made), we should have such a central handler to fall back on.

Approach

Here's my initial suggestion of to go about building something like this.

a.) We should have access to a logger. It is not important what logger that is, as long as we can replace both "output error messages" and "take the server down" with "just log what happened and go on". I suggest making use of the PSR-3 logger interface for consumption, and then providing a filterable basic implementation that just logs to a file somewhere. This allows more complex sites to hoist the logging into their bigger architecture as needed. As an example, with a simple bridge class, you could then have critical exceptions be posted on your company's Slack so developers can immediately react.

b.) The logger should ideally be injected through setter interjection. We don't want to block the constructors, and it should be as easy and transparent and possible in usage. Basic support could be as simple as adding the line use Logging; in that case, with a trait that takes care of the boilerplate.

c.) We should add exception catchers in strategic points in the code to handle/log them. At the very least, we will need one try/catch block that wraps the entire plugin, so that no exception can escape. This is needed because WordPress itself is blissfully unaware of exceptions, so they will very likely turn into a fatal error if they escape the plugin.

d.) We should integrate with the Site Health component to display problems that were identified and to extend the diagnostic information that can be copied from there. Users can then easily send diagnostic information that helps us fix bugs.

e.) We should add strategic debug information (let's not overdo it in the beginning) that is only logged when a certain flag is set. The PSR-3 logger interface has log levels that we can use for that.

Benefits

Once the infrastructure is in place, exceptions allow for a more controlled way fo dealing with error conditions and exceptional situations. Often times, after a bit of shuffling around, you end up with simpler, easier to maintain code because the "in-between" code does never need to include error-aware logic and a lot of checks can be turned into direct type failures.
Having logs that the user can send causes big reports to be more straight-forward, and avoids the usual back & forth between reporter and developer to figure what has actually happened and what the environment and context was.
Logs also open up a more indirect feedback loop, as right now we either bother the user or stay silent. With logs, there's multiple nuances in-between, so we'll be able to get more data with less intrusion.
We could provide an opt-in mechanism that sends anonymized logs to Google servers, for example to crowdsource detected sanitization issues.

Next Steps

I suggest starting with the following tasks to get the basic infrastructure ready so that all developers can get a feel for it and provide feedback:

1.) Integrate PSR-3: filterable implementation that fulfills the interface, trait that accepts the interface through a setter. 2.) Add logging to a few key components by using the above trait and sending the implementation through the setter while instantiating. 3.) Add a few meaningful log messages to these same components. 4.) Add a central, top-most exception catcher that just catches all exceptions/throwables and logs them to our logger. 5.) Use exceptions to exercise the central catcher.

schlessera commented 5 years ago

I was already suggesting to use a namespace for strategic redesigns. The logger being a central piece of architecture, I suggest starting with that one with a namespace (we're at PHP 5.4, so that's fine to require namespaces).

Draft trait:

<?php

namespace AMP;

use Psr\Log\LoggerInterface;

trait Logging {

    /**
     * Sets a logger instance on the object.
     *
     * @param LoggerInterface $logger
     */
    public function set_logger( LoggerInterface $logger ) {
        $this->logger = $logger;
    }
}

Object using the trait:

<?php

use AMP\Logging;

class My_Sanitizer extends AMP_Base_Sanitizer {

    use Logging;

    public function sanitize() {
        $this->logger->debug( 'In My_Sanitizer constructor now.' );
    }
}

Injecting the logger during instantiation:

/**
 * Sanitizer.
 *
 * @type AMP_Base_Sanitizer $sanitizer
 */
$sanitizer = new $sanitizer_class( $dom, array_merge( $args, $sanitizer_args ) );

if ( $sanitizer instanceof AMP\LoggerAware
     || method_exists( $sanitizer, 'set_logger') ) {
    $sanitizer->set_logger( $this->logger );
}

Note that we don't deal with constructor arguments here (developers are still free to use the constructor for whatever they want) and that we also don't require a full-blown injection container.

schlessera commented 4 years ago

In terms of user experience, it would be great to use a "Fingers-crossed"-logger. That type of logger collects logging information in memory only, unless an actual unrecoverable error occurs. In that case, it can dump the logging information surrounding that error. This allows us to always have lots of contextual information for bug reports while not filling up the disk space of users when not needed.

This "Fingers-crossed"-logger can be nicely combined with a bug report dialog that would show for the user, or a screen with recorded errors that the user can opt-in to send to us.

The reason why logging is so important is that the plugin does a lot of logic behind the scenes, and even makes remote requests. If there is an exotic error happening on a production site, we are in a very bad position right now to actually diagnose the root cause of that error. With logging, we can add a lot of contextual information to bug reports that make pinpointing the issue very easy.

ampproject / amp-wp