GSA / notifications-api

The API powering Notify.gov
Other
10 stars 1 forks source link

Improve Error Handling and Debugging Posture #1065

Open ccostino opened 4 months ago

ccostino commented 4 months ago

There are several parts of the application that would greatly benefit from better logging configuration as well as having more log statements added to provide clearer insight into how the application is behaving, regardless if things are going well or not.

Additionally, there are a few things we could do to help ourselves with debugging and application development by adding some additional tools (e.g., using a more robust debugger rather than just pdb), utility methods, and project/test configuration.

NOTE: Story issues have not been created yet, this is very much still a WIP!

Improve error handling

Improve debugging

xlorepdarkhelm commented 2 months ago

Here is some thoughts for improving logging & error handling

Certainly! Here’s a combined series of sprints that integrate both the error handling improvements and the logging enhancements, structured into manageable tasks over multiple two-week Agile sprints.

Sprint 1: Audit, Documentation, and Standardization

Task 1: Comprehensive Audit

Task 2: Documentation of Best Practices

Sprint 1 Deliverables:

Sprint 2: Replace print Statements and Improve Error Logging

Task 1: Replace print Statements with Logging

Task 2: Enhance Error Logging with Stack Traces

Sprint 2 Deliverables:

Sprint 3: Centralized Error Handling and Logging Standardization

Task 1: Implement Centralized Error Handling

Task 2: Standardize Logging Levels

Sprint 3 Deliverables:

Sprint 4: Custom Exceptions and Contextual Logging

Task 1: Introduce Custom Exceptions

Task 2: Implement Contextual Logging

Sprint 4 Deliverables:

Sprint 5: Improve Background Task Error Handling and Performance Logging

Task 1: Enhance Error Handling in Background Tasks

Task 2: Implement Performance Logging

Sprint 5 Deliverables:

Sprint 6: Final Validation and Cleanup

Task 1: Validate and Improve Input/Output Handling

Task 2: Clean Up Unused Code and Final Refactoring

Sprint 6 Deliverables:

Overall Outcome:

By the end of these sprints, the project will have robust and consistent logging and error handling practices, leveraging modern Python features like f-strings, custom exceptions, and centralized error management. This will result in a more maintainable, reliable, and performant codebase.

(There could be some parts that need tweaking for specific needs for our project, as we already have integration into logging systems, etc).

xlorepdarkhelm commented 2 months ago

Here is the breakdown of the number of logger statements for each level of logging in the project:

There also are 131 print statements in the code which would need to be retrofitted into being logger statements at the appropriate levels.

xlorepdarkhelm commented 2 months ago

Further, there are 22 lines that are info, but might better be defined as debug logging levels:

  1. File: app/commands.py

    • Line 285: current_app.logger.info(f"DATA = {data}")
  2. File: app/aws/s3.py

    • Line 111: current_app.logger.info(f"File downloaded successfully to {local_filename}")
  3. File: app/celery/research_mode_tasks.py

    • Line 54: current_app.logger.info("Mocked provider callback request finished")
  4. File: app/celery/scheduled_tasks.py

    • Line 133: current_app.logger.info("Job(s) {} have not completed.".format(job_ids))
  5. File: app/celery/test_key_tasks.py

    • Line 54: current_app.logger.info("Mocked provider callback request finished")
  6. File: app/clients/cloudwatch/aws_cloudwatch.py

    • Line 58: current_app.logger.info(f"START TIME {beginning} END TIME {now}")
  7. File: app/service/rest.py

    • Line 205: current_app.logger.info(f'SERVICE: {data["id"]}; {data}')
  8. File: app/user/rest.py

    • Line 429: current_app.logger.info("Sending email verification for user {}".format(user_id))
    • Line 472: current_app.logger.info("Sending notification to queue")
    • Line 514: current_app.logger.info("Sending notification to queue")
  9. File: notifications_utils/clients/zendesk/zendesk_client.py

    • Line 37: current_app.logger.info(f"Zendesk create ticket {ticket_id} succeeded")
  10. File: app/commands.py

    • Line 285: current_app.logger.info(f"DATA = {data}")
  11. File: app/aws/s3.py

    • Line 111: current_app.logger.info(f"File downloaded successfully to {local_filename}")
  12. File: app/celery/research_mode_tasks.py

    • Line 54: current_app.logger.info("Mocked provider callback request finished")
  13. File: app/celery/scheduled_tasks.py

    • Line 133: current_app.logger.info("Job(s) {} have not completed.".format(job_ids))
  14. File: app/celery/test_key_tasks.py

    • Line 54: current_app.logger.info("Mocked provider callback request finished")
  15. File: app/clients/cloudwatch/aws_cloudwatch.py

    • Line 58: current_app.logger.info(f"START TIME {beginning} END TIME {now}")
  16. File: app/service/rest.py

    • Line 205: current_app.logger.info(f'SERVICE: {data["id"]}; {data}')
  17. File: app/user/rest.py

    • Line 429: current_app.logger.info("Sending email verification for user {}".format(user_id))
    • Line 472: current_app.logger.info("Sending notification to queue")
    • Line 514: current_app.logger.info("Sending notification to queue")
  18. File: notifications_utils/clients/zendesk/zendesk_client.py

    • Line 37: current_app.logger.info(f"Zendesk create ticket {ticket_id} succeeded")

These statements were identified because they include keywords or patterns that typically align with debug level logging.

And, there are 4 lines that are info but might need to be revised to being a higher level like warning or error:

  1. File: app/errors.py

    • Line 52: current_app.logger.info(error)
  2. File: app/errors.py

    • Line 57: current_app.logger.info(error)
  3. File: app/errors.py

    • Line 62: current_app.logger.info(error)
  4. File: app/errors.py

    • Line 69: current_app.logger.info(error)

The log statements occur in the app/errors.py file and should likely be elevated to a higher logging level, such as error.

xlorepdarkhelm commented 2 months ago

Something to note:

Modern versions of Python (3.11 and above) have had radically improved helpers in the stack traces and error messages to dramatically improve the ability to track down an error and fix it. Logging these will greatly improve our ability to pinpoint issues.