Log and analyze HTTP 500 errors

jonrkarr commented 4 years ago

It would be helpful to log information about failures of the API (e.g., 500 errors) so we can periodically review them and debug any errors. Useful information to log:

URI (endpoint and query arguments)
Timestamp
(I don't think we really need to log IP addresses since there's no authentication, cookies, etc.)
(I don't think we need to log the version of datanator_rest_api because the timestamp implies a version and _version.py isn't being updated.)

One way to do this is to replace `application = create_app().app` at the bottom of `datanator_rest_api/core.py` with the following. This will save log information to a file. However, the file will be lost each time the application is deployed. To persist the information, it would need to be saved to MongoDB or another location.

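```python
import logging
import os
import sys

# Setup logger; create the log directory if it does not already exist
log_filename = os.path.expanduser('~/.wc/log/datanator_rest_api.log')
os.makedirs(os.path.dirname(log_filename), exist_ok=True)
logger = logging.getLogger()
handler = logging.FileHandler(log_filename)
formatter = logging.Formatter(
    '%(asctime)s %(filename)s:%(lineno)d %(funcName)s %(levelname)-8s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Build the WSGI app once at import time rather than on every request
# (create_app is defined earlier in datanator_rest_api/core.py)
_app = create_app().app

def application(environ, start_response):
    """ Logged application

    Args:
        environ (:obj:`dict`): WSGI environment
        start_response (:obj:`callable`): callable accepting a status code,
            a list of headers, and an optional exception context to
            start the response.

    Returns:
        :obj:`iterable` of :obj:`bytes`: response body
    """
    try:
        return _app(environ, start_response)
    except Exception:
        # REQUEST_URI is not set by every WSGI server; fall back to PATH_INFO + QUERY_STRING
        uri = environ.get('REQUEST_URI')
        if not uri:
            query = environ.get('QUERY_STRING', '')
            uri = environ.get('PATH_INFO', '') + ('?' + query if query else '')
        # logger.exception logs the message together with the full traceback
        logger.exception("Unhandled error for URI: %s", uri)
        # WSGI requires start_response to be called before a body is returned
        start_response('500 Internal Server Error',
                       [('Content-Type', 'text/plain')], sys.exc_info())
        return [b'Internal Server Error']
```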
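To keep the log across redeployments, the records could be written to MongoDB instead of (or in addition to) a local file. The following is a minimal sketch of a custom `logging.Handler`; the class name `MongoLogHandler` and the document fields are hypothetical, and `collection` is assumed to be a pymongo `Collection` (any object with an `insert_one(dict)` method works).

```python
import logging
from datetime import datetime, timezone

# Used only to render tracebacks, independent of any formatter set on the handler
_traceback_formatter = logging.Formatter()


class MongoLogHandler(logging.Handler):
    """Hypothetical handler that stores each log record as a MongoDB document."""

    def __init__(self, collection, level=logging.ERROR):
        super().__init__(level)
        self.collection = collection  # assumed: pymongo Collection or similar

    def emit(self, record):
        try:
            self.collection.insert_one({
                'timestamp': datetime.fromtimestamp(record.created, tz=timezone.utc),
                'level': record.levelname,
                'message': record.getMessage(),
                # Present when the record was logged via logger.exception(...)
                'traceback': (_traceback_formatter.formatException(record.exc_info)
                              if record.exc_info else None),
            })
        except Exception:
            # Never let a logging failure break request handling
            self.handleError(record)
```

It could be attached alongside the file handler with something like `logger.addHandler(MongoLogHandler(MongoClient(...)['datanator']['api_errors']))`, where the database and collection names are placeholders. Defaulting the handler level to `ERROR` keeps the `DEBUG` chatter that goes to the file out of the database.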

Once you have Amazon EC2 set up, Amazon's error monitoring is probably a better option.

Heroku also has error-monitoring add-ons. I don't recommend these because they aren't compatible with Amazon EC2.

One more option is to use a third-party cloud provider such as below. Unless Amazon's error logging turns out to be insufficient, I think it's simpler to use it than to involve yet another third-party product.

lzy7071 commented 4 years ago

Currently, logs are streamed into Elasticsearch. I agree that explicitly handling 500 errors is a good idea. I will do that in the Swagger files.

jonrkarr commented 4 years ago

I see that the Heroku stderr log already captures the information we need. However, the Heroku log is hard to read. It would be helpful to either:

Use Amazon's service to make the error information easier to understand OR
Write a script to (a) filter out the errors, (b) group the errors by endpoint, and (c) sort the endpoints by the frequency of their errors. This would tell us which endpoints need to be fixed and provide us with examples.
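The script option could look roughly like the sketch below. It assumes the input is Heroku router log lines of the form `... heroku[router]: at=info method=GET path="/some/endpoint/?arg=1" ... status=500 ...` read from standard input; the regular expression, the function name `summarize_500_errors`, and the grouping key (HTTP method plus path without the query string) are assumptions, not an existing tool.

```python
import re
import sys
from collections import Counter, defaultdict

# Assumed Heroku router line format: ... method=GET path="/a/?x=1" ... status=500 ...
ROUTER_LINE = re.compile(
    r'method=(?P<method>\S+) path="(?P<path>[^"]*)".*?\bstatus=(?P<status>\d{3})')


def summarize_500_errors(lines, max_examples=3):
    """Return [(endpoint, count, example_uris)], most frequent endpoint first."""
    counts = Counter()
    examples = defaultdict(list)
    for line in lines:
        match = ROUTER_LINE.search(line)
        # (a) keep only HTTP 500 responses
        if not match or match.group('status') != '500':
            continue
        uri = match.group('path')
        # (b) group by endpoint, ignoring the query arguments
        endpoint = match.group('method') + ' ' + uri.split('?', 1)[0]
        counts[endpoint] += 1
        if len(examples[endpoint]) < max_examples:
            examples[endpoint].append(uri)  # keep a few full URIs for debugging
    # (c) sort endpoints by error frequency; ties keep first-seen order
    return [(ep, n, examples[ep]) for ep, n in counts.most_common()]


if __name__ == '__main__':
    for endpoint, count, uris in summarize_500_errors(sys.stdin):
        print(f'{count:6d}  {endpoint}  e.g. {uris[0]}')
```

Saved as, say, `summarize_500_errors.py`, it could be fed with `heroku logs -n 1500 | python summarize_500_errors.py`. The router lines only give the status and URI; the matching traceback would still have to be looked up in the application's stderr output around the same timestamp.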

jonrkarr commented 4 years ago

Yes, I think we primarily need to focus on 500 errors, as these include errors caused by bugs in the datanator_rest_api code. I think we can largely ignore other error codes.
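Frameworks such as Flask usually turn an unhandled exception into a 500 response themselves, in which case the exception may never reach an outer `try`/`except` wrapper. A sketch of a WSGI middleware that instead watches the status passed to `start_response` and logs only 500 responses is below; the names `log_500_responses`, `request_uri`, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger('datanator_rest_api.errors')  # hypothetical logger name


def request_uri(environ):
    """Reconstruct the request URI (endpoint and query arguments) from a WSGI environ."""
    uri = environ.get('REQUEST_URI')  # set by some servers (e.g., uWSGI), not all
    if uri:
        return uri
    query = environ.get('QUERY_STRING', '')
    return environ.get('PATH_INFO', '') + ('?' + query if query else '')


def log_500_responses(app):
    """Wrap a WSGI app so that every 500 response is logged with its URI."""
    def wrapped(environ, start_response):
        def logging_start_response(status, headers, exc_info=None):
            # status is a string such as '500 INTERNAL SERVER ERROR'
            if status.startswith('500'):
                logger.error('HTTP %s for URI: %s', status, request_uri(environ))
            return start_response(status, headers, exc_info)
        return app(environ, logging_start_response)
    return wrapped
```

With this approach, `application = log_500_responses(create_app().app)` would log every 500 regardless of where it was generated; the timestamp comes from the handler's formatter (`%(asctime)s`), so it is not repeated in the message.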

jonrkarr commented 4 years ago

Can Swagger log errors?

lzy7071 commented 4 years ago

No, I don't think so. Regardless of error type, logs are formatted and stored in Elasticsearch (ES), and analysis can be done by filtering for 500 errors. My intention in handling 500 errors on the Swagger side is primarily for the frontend, so that 500 errors won't be shown to users. In any case, I think using our cloud service provider's logging service (in our case, AWS's) is a better solution.
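As a sketch of what filtering for 500 errors in ES might look like, the function below builds a search request body that keeps only 500 responses from a recent time window and counts them per endpoint with a `terms` aggregation. The field names (`status`, `path`, `@timestamp`) depend on how the logs are actually indexed and are assumptions here; the path field would need to be a `keyword` field for the aggregation to work.

```python
def build_500_error_query(status_field='status', path_field='path',
                          since='now-7d', top_n=20):
    """Elasticsearch _search body: 500 responses only, counted per endpoint.

    Field names are hypothetical and must match the actual log mapping.
    """
    return {
        'size': 0,  # only the aggregation is needed, not the individual hits
        'query': {
            'bool': {
                'filter': [
                    {'term': {status_field: 500}},
                    {'range': {'@timestamp': {'gte': since}}},
                ]
            }
        },
        'aggs': {
            'by_endpoint': {
                # terms buckets are ordered by document count, most frequent first
                'terms': {'field': path_field, 'size': top_n}
            }
        },
    }
```

The body would be sent as JSON in a `POST` to the index's `_search` endpoint; the `by_endpoint` buckets in the response then list the endpoints with the most 500 errors.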

jonrkarr commented 4 years ago

The frontend already catches all 500 errors and displays a brief message to users in a modal window. In development and test modes, the frontend also logs errors to the browser console; users won't see these because the frontend is deployed in production mode. I've been using this and the Chrome developer tools to find errors. Once other people are trying Datanator, an error log will help us find (and then fix) the errors they encounter.

Yes, Amazon's error logging would be the right choice. Presumably Amazon makes it easy to sort and filter errors.

Sorry for the confusion about mentioning Google Cloud -- I was mixing up BioSimulations and Datanator. Bilal is using Google Cloud for BioSimulations.

lzy7071 commented 4 years ago

That's right. I forgot that the errors would be logged to the browser console. No problem about the cloud service provider mix-up.

lzy7071 commented 4 years ago

Closing because https://github.com/KarrLab/datanator_rest_api/issues/107 is done.

KarrLab / datanator_rest_api

Log and analyze HTTP 500 errors #115