GSA / notifications-api

The API powering Notify.gov

Exception Investigation: sqlalchemy.exc:NoResultFound #1033

Closed: ccostino closed this issue 5 months ago

ccostino commented 5 months ago

This is one of the errors we've seen captured in New Relic that we'd like to dig into and understand, if not also resolve.

This happens during a one-off message send. The message will still be sent out, but this error is thrown in the background. It is currently affecting all environments and can be reproduced locally.

In staging, one-off message sends seem to take an extra amount of time before they're actually sent. It's not clear whether this issue is related.

Error message: No row was found when one was required
Path: /service//job/
Exception: sqlalchemy.exc:NoResultFound

Traceback (most recent call last):
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/eventlet/greenthread.py", line 265, in main
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/gunicorn/workers/geventlet.py", line 157, in handle
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/gunicorn/workers/base_async.py", line 55, in handle
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/gunicorn/workers/base_async.py", line 108, in handle_request
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/newrelic/api/wsgi_application.py", line 669, in _nr_wsgi_application_wrapper_
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/flask/app.py", line 1498, in __call__
File "/home/vcap/app/notifications_utils/request_helper.py", line 80, in __call__
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/newrelic/api/wsgi_application.py", line 564, in _nr_wsgi_application_wrapper_
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/flask/app.py", line 1473, in wsgi_app
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/newrelic/hooks/framework_flask.py", line 79, in _nr_wrapper_handler_
File "/home/vcap/app/app/job/rest.py", line 46, in get_job_by_service_and_job_id
File "/home/vcap/app/app/dao/jobs_dao.py", line 45, in dao_get_job_by_service_id_and_job_id
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2778, in one
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1810, in one
File "/home/vcap/deps/0/python/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 752, in _only_one_row
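
For readers less familiar with SQLAlchemy: the last frames show the error coming out of `one()`, which requires exactly one matching row and raises `NoResultFound` when there is none. The sketch below is a pure-Python model of that contract, not SQLAlchemy's own code and not code from this repository. The exception classes and the `one` / `one_or_none` helpers are stand-ins named after the real ones, and the rows are plain lists.

```python
class NoResultFound(Exception):
    """Stand-in for sqlalchemy.exc.NoResultFound (illustration only)."""


class MultipleResultsFound(Exception):
    """Stand-in for sqlalchemy.exc.MultipleResultsFound (illustration only)."""


def one(rows):
    """Model of one(): exactly one row, otherwise raise."""
    if not rows:
        raise NoResultFound("No row was found when one was required")
    if len(rows) > 1:
        raise MultipleResultsFound("Multiple rows were found when exactly one was required")
    return rows[0]


def one_or_none(rows):
    """Model of one_or_none(): a missing row is returned as None instead of raising."""
    if not rows:
        return None
    return one(rows)


# A job row that has not been written yet looks like an empty result set.
try:
    one([])
except NoResultFound as exc:
    print(f"one() raised: {exc}")

print(f"one_or_none() returned: {one_or_none([])}")
```

Under this model, a lookup that runs before the job row exists fails in `one()` exactly as in the traceback, while the `one_or_none()` style leaves the caller to decide how to handle a missing row.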

Implementation Sketch and Acceptance Criteria

Security Considerations

terrazoon commented 5 months ago
  1. In order to remove PII from the database, we had to convert one-off sends into jobs.
  2. In order for one-off sends to become jobs, we had to create CSV files for them and upload them to S3.
  3. Since they are jobs, they utilize the same start job/create job mechanism as all other jobs -- a mechanism that was not designed with one-off sends in mind.

So currently there is a timing issue where the UI has to poll and wait for the job to be created in the db before it can fetch the info it needs. Several of those calls may fail with 'no result found', but the request is retried a number of times, so ultimately it succeeds.
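
A rough sketch of the polling behavior described above, to show why the errors are logged even though the send ultimately works. This is not the actual UI or API code: `poll_for_job`, `fake_fetch_job`, the attempt count, and the delay are all hypothetical, and `NoResultFound` is a local stand-in for the SQLAlchemy exception.

```python
import time


class NoResultFound(Exception):
    """Stand-in for sqlalchemy.exc.NoResultFound (illustration only)."""


def poll_for_job(fetch_job, attempts=5, delay_seconds=0.01):
    """Call fetch_job until it returns a job or the attempts run out.

    Returns the job together with the number of failed lookups, each of
    which corresponds to one 'no result found' error in the logs.
    """
    misses = 0
    for _ in range(attempts):
        try:
            return fetch_job(), misses
        except NoResultFound:
            misses += 1
            time.sleep(delay_seconds)
    raise NoResultFound(f"job not found after {attempts} attempts")


# Simulate a job row that only becomes visible on the third lookup.
calls = {"count": 0}


def fake_fetch_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise NoResultFound("No row was found when one was required")
    return {"id": "job-1", "status": "pending"}


job, misses = poll_for_job(fake_fetch_job)
print(job["id"], misses)  # job-1 2
```

In this simulation the caller gets its job, but two lookups fail first. Each failed lookup is the kind of event that would surface as a captured exception even though nothing is wrong from the user's point of view.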

Eventually this issue should get resolved. @A-Shumway42 is reworking the timing so that 'create_job' occurs long before 'start_job'. When that work is complete, we should not get spurious 'No Result Found' errors.

ccostino commented 5 months ago

Thanks, @terrazoon! This makes sense, and if @A-Shumway42's work adjusts the timing so that the method calls happen when they need to, all the better. 🙂

Thanks for taking a look into this and providing the explanation! We're going to close this out.