Open ccostino opened 4 months ago
Here is some thoughts for improving logging & error handling
Certainly! Here’s a combined series of sprints that integrate both the error handling improvements and the logging enhancements, structured into manageable tasks over multiple two-week Agile sprints.
try-except
blocks, logger.error
, logger.exception
, print
statements, and other logging or error-handling mechanisms.exc_info=True
in error logs, usage of generic exceptions, and places where logging is either missing or suboptimal.print
statements and prepare to replace them with appropriate logging statements.print
Statements and Improve Error Loggingprint
Statements with Loggingprint
statements with appropriate logging calls.Steps:
print
statements in the codebase.print
statements with logger.info
, logger.debug
, or other appropriate logging levels using f-strings.Example Before:
print(f"Processing data: {data}")
Example After:
logger.info(f"Processing data: {data}")
logger.error
statements include stack traces where applicable.Steps:
logger.error
statements to include exc_info=True
for stack traces.Example Before:
logger.error("An error occurred: %s", str(e))
Example After:
logger.error(f"An error occurred: {e}", exc_info=True)
print
statements replaced with appropriate logging statements.logger.error
statements with stack traces and f-strings.Steps:
logger.exception
.Example for Flask:
@app.errorhandler(Exception)
def handle_exception(e):
logger.exception(f"Unhandled exception: {e}")
return {"error": "An unexpected error occurred"}, 500
Steps:
DEBUG
, INFO
, WARNING
, ERROR
, CRITICAL
).info
statements that should be debug
and vice versa, based on their context.Example Before:
logger.info("Starting process...")
Example After:
logger.debug("Starting process...")
Steps:
InvalidUserInputError
, DatabaseConnectionError
).Exception
handling with specific custom exceptions.Example Before:
try:
# some code
except Exception as e:
logger.error(f"An error occurred: {e}")
Example After:
try:
# some code
except InvalidUserInputError as e:
logger.error(f"Invalid user input: {e}", exc_info=True)
Steps:
Example Before:
logger.info("Sending email")
Example After:
logger.info(f"Request {request_id}: Sending email to {user_email}")
Steps:
logger.exception
for logging errors with stack traces.Example:
@app.task(bind=True)
def my_task(self):
try:
# task logic
except SomeTransientError as e:
raise self.retry(exc=e, countdown=60, max_retries=3)
except Exception as e:
logger.exception(f"Task failed: {e}")
Steps:
Example:
start_time = time.time()
# perform task
logger.info(f"Task completed in {time.time() - start_time:.2f} seconds")
Steps:
Example:
def process_data(data):
if not isinstance(data, dict):
raise InvalidUserInputError(f"Expected a dictionary, got {type(data)}")
By the end of these sprints, the project will have robust and consistent logging and error handling practices, leveraging modern Python features like f-strings, custom exceptions, and centralized error management. This will result in a more maintainable, reliable, and performant codebase.
(There could be some parts that need tweaking for specific needs for our project, as we already have integration into logging systems, etc).
Here is the breakdown of the number of logger
statements for each level of logging in the project:
error
loggers that do not include stack traces + 57 exception
loggers that do include stack traces.There also are 131 print
statements in the code which would need to be retrofitted into being logger
statements at the appropriate levels.
Further, there are 22 lines that are info
, but might better be defined as debug
logging levels:
File: app/commands.py
current_app.logger.info(f"DATA = {data}")
File: app/aws/s3.py
current_app.logger.info(f"File downloaded successfully to {local_filename}")
File: app/celery/research_mode_tasks.py
current_app.logger.info("Mocked provider callback request finished")
File: app/celery/scheduled_tasks.py
current_app.logger.info("Job(s) {} have not completed.".format(job_ids))
File: app/celery/test_key_tasks.py
current_app.logger.info("Mocked provider callback request finished")
File: app/clients/cloudwatch/aws_cloudwatch.py
current_app.logger.info(f"START TIME {beginning} END TIME {now}")
File: app/service/rest.py
current_app.logger.info(f'SERVICE: {data["id"]}; {data}')
File: app/user/rest.py
current_app.logger.info("Sending email verification for user {}".format(user_id))
current_app.logger.info("Sending notification to queue")
current_app.logger.info("Sending notification to queue")
File: notifications_utils/clients/zendesk/zendesk_client.py
current_app.logger.info(f"Zendesk create ticket {ticket_id} succeeded")
File: app/commands.py
current_app.logger.info(f"DATA = {data}")
File: app/aws/s3.py
current_app.logger.info(f"File downloaded successfully to {local_filename}")
File: app/celery/research_mode_tasks.py
current_app.logger.info("Mocked provider callback request finished")
File: app/celery/scheduled_tasks.py
current_app.logger.info("Job(s) {} have not completed.".format(job_ids))
File: app/celery/test_key_tasks.py
current_app.logger.info("Mocked provider callback request finished")
File: app/clients/cloudwatch/aws_cloudwatch.py
current_app.logger.info(f"START TIME {beginning} END TIME {now}")
File: app/service/rest.py
current_app.logger.info(f'SERVICE: {data["id"]}; {data}')
File: app/user/rest.py
current_app.logger.info("Sending email verification for user {}".format(user_id))
current_app.logger.info("Sending notification to queue")
current_app.logger.info("Sending notification to queue")
File: notifications_utils/clients/zendesk/zendesk_client.py
current_app.logger.info(f"Zendesk create ticket {ticket_id} succeeded")
These statements were identified because they include keywords or patterns that typically align with debug
level logging.
And, there are 4 lines that are info
but might need to be revised to being a higher level like warning
or error
:
File: app/errors.py
current_app.logger.info(error)
File: app/errors.py
current_app.logger.info(error)
File: app/errors.py
current_app.logger.info(error)
File: app/errors.py
current_app.logger.info(error)
The log statements occur in the app/errors.py
file and should likely be elevated to a higher logging level, such as error
.
Something to note:
Modern versions of Python (3.11 and above) have had radically improved helpers in the stack traces and error messages to dramatically improve the ability to track down an error and fix it. Logging these will greatly improve our ability to pinpoint issues.
There are several parts of the application that would greatly benefit from better logging configuration as well as having more log statements added to provide clearer insight into how the application is behaving, regardless if things are going well or not.
Additionally, there are a few things we could do to help ourselves with debugging and application development by adding some additional tools (e.g., using a more robust debugger rather than just
pdb
), utility methods, and project/test configuration.NOTE: Story issues have not been created yet, this is very much still a WIP!
Improve error handling
notification_utils
code that swallow errors or raise exceptions in ways that are counter-intuitive and counter-productive; we need to adjust these to make them more usefulImprove debugging
hilite
method as another logging formatter as well; also see if it can be combined with or leverage anything from Werkzeug if that'd make it even more useful or robustpytest
piece to this too, in terms of how to debug tests