department-of-veterans-affairs / notification-api

Notification API
MIT License
16 stars 9 forks source link

Upgrade Flask and address Latest Python Dependencies in the API Repo #724

Closed mjones-oddball closed 1 year ago

mjones-oddball commented 2 years ago

https://github.com/department-of-veterans-affairs/notification-api/pulls?q=is%3Aopen+is%3Apr+label%3Adependencies+label%3Apython

mjones-oddball commented 2 years ago

Hey team! Please add your planning poker estimate with ZenHub @ianperera @jakehova @jessecanderson @k-macmillan @kalbfled @maham360 @EvanParish

zlaw commented 2 years ago

Please add your planning poker estimate with ZenHub @Jake-Uhteg

Eallan919 commented 1 year ago

current flask version 1.1.4

vanotify-team/Engineering/dependency_management.md

Eallan919 commented 1 year ago

2.2.2 is the latest version

kalbfled commented 1 year ago

Upgrading major versions predictably results in ImportError and AttributeError exceptions that prevent running the app even once we get it to build. I upgraded some packages in requirements-app.txt, the top level requirements, to see if I can build and run the container, and I encountered Fido2 exceptions like this:

app_1         | Error: While importing 'application', an ImportError was raised:
app_1         | 
app_1         | Traceback (most recent call last):
app_1         |   File "/usr/local/lib/python3.8/site-packages/flask/cli.py", line 218, in locate_app
app_1         |     __import__(module_name)
app_1         |   File "/app/application.py", line 25, in <module>
app_1         |     create_app(application)
app_1         |   File "/app/app/__init__.py", line 193, in create_app
app_1         |     register_blueprint(application)
app_1         |   File "/app/app/__init__.py", line 210, in register_blueprint
app_1         |     from app.user.rest import user_blueprint
app_1         |   File "/app/app/user/rest.py", line 9, in <module>
app_1         |     from fido2.client import ClientData
app_1         | ImportError: cannot import name 'ClientData' from 'fido2.client' (/usr/local/lib/python3.8/site-packages/fido2/client.py)

FIDO 2.0 is a protocol for using hardware authentication devices (i.e. 2FA apps; RSA key fob), and the package's documentation assumes the reader is familiar with the details and intended use of the protocol. It shows code examples of how to do things rather than present the information in such a way that would make it easy to answer the question, "What has replaced ClientData in the newer version?"

This is a hurdle we need to clear to make progress on this ticket during the coming sprint. I think we will need a better understanding of how Notification API uses the protocol so we can update the code without changing the behavior. To that end, I have begun reading docs.

UPDATE: The message for this commit might provide an easy answer: Replace fido2.client.ClientData with fido2.webauthn.CollectedClientData.

cris-oddball commented 1 year ago

regarding Fido, this only appears to be used for the /user routes. Isn't the Strike team now responsible for handling user login, or are they using these routes?

kalbfled commented 1 year ago

I eliminated the Fido2 ImportError issues using that project's commit messages to determine where things moved. Here's the new hurdle when trying to run the app: ValueError: Couldn't import 'run_celery.notify_celery': The name 'v2_notifications' is already registered for this blueprint. Use 'name=' to provide a unique name..

Eallan919 commented 1 year ago

Thanks for looking into this @kalbfled As agreed in refinement, let's hold off on working this any further so we can all start fresh tomorrow. I'd like to hold a discussion around this as a team so we can row in the same direction.

Thank you!

mjones-oddball commented 1 year ago

@ianperera @EvanParish @kalbfled please update the ticket with any progress/blockers as you go. Evan, the suggestion was that maybe you start on ticket 870 for the celery work.

ianperera commented 1 year ago

In troubleshooting user flow, I found that POST {{notification-api-url}}/service/{{service-id}}/api-key is returning a 500 internal error, even though pytest has been passed. so the issue is not the test, but the API is broken. I'm currently working on debugging and fixing this.

kalbfled commented 1 year ago

Unit tests remain broken. I think I fixed last week's problem wherein the test database was not properly initialized. The problem was that newer version of Flask-SQLAlchemy require setting the SQLALCHEMY_DATABASE_URI before calling init_app.

I now get a different runtime traceback about "_transaction". I upgraded sqlalchemy to version 2.0.0, which was released last Thursday night, and I'm fixing an AttributeError introduced by the upgrade. I'm hoping the "_transaction" traceback goes away with the newest versions of the relevant packages. If that doesn't happen, I will post the traceback here and solicit help from the team.

kalbfled commented 1 year ago

Per our discussion in stand-up, I reverted SQLAlchemy back to version 1.4.46. I am using the newest versions of all the test dependencies except Flake8. This is the current traceback:

test_1  | _ ERROR at setup of test_process_pinpoint_results_notification_final_status[_SMS.FAILURE-UNREACHABLE-temporary-failure] _
test_1  | [gw2] linux -- Python 3.8.13 /usr/local/bin/python
test_1  | 
test_1  | request = <SubRequest '_transaction' for <Function test_process_pinpoint_results_notification_final_status[_SMS.FAILURE-UNREACHABLE-temporary-failure]>>
test_1  | _db = <SQLAlchemy postgresql://postgres:LocalPassword@db:5432/test_notification_api_gw2>
test_1  | mocker = <pytest_mock.plugin.MockerFixture object at 0x7f60708dd9a0>
test_1  | 
test_1  |     @pytest.fixture(scope='function')
test_1  |     def _transaction(request, _db, mocker):
test_1  |         '''
test_1  |         Create a transactional context for tests to run in.
test_1  |         '''
test_1  |         # Start a transaction
test_1  |         connection = _db.engine.connect()
test_1  |         transaction = connection.begin()
test_1  |     
test_1  |         # Bind a session to the transaction. The empty `binds` dict is necessary
test_1  |         # when specifying a `bind` option, or else Flask-SQLAlchemy won't scope
test_1  |         # the connection properly
test_1  |         options = dict(bind=connection, binds={})
test_1  | >       session = _db.create_scoped_session(options=options)
test_1  | 
test_1  | /usr/local/lib/python3.8/site-packages/pytest_flask_sqlalchemy/fixtures.py:38: 
test_1  | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_1  | 
test_1  | self = <SQLAlchemy postgresql://postgres:LocalPassword@db:5432/test_notification_api_gw2>
test_1  | name = 'create_scoped_session'
test_1  | 
test_1  |     def __getattr__(self, name: str) -> t.Any:
test_1  |         if name == "db":
test_1  |             import warnings
test_1  |     
test_1  |             warnings.warn(
test_1  |                 "The 'db' attribute is deprecated and will be removed in"
test_1  |                 " Flask-SQLAlchemy 3.1. The extension is registered directly as"
test_1  |                 " 'app.extensions[\"sqlalchemy\"]'.",
test_1  |                 DeprecationWarning,
test_1  |                 stacklevel=2,
test_1  |             )
test_1  |             return self
test_1  |     
test_1  |         if name == "relation":
test_1  |             return self._relation
test_1  |     
test_1  |         if name == "event":
test_1  |             return sa.event
test_1  |     
test_1  |         if name.startswith("_"):
test_1  |             raise AttributeError(name)
test_1  |     
test_1  |         for mod in (sa, sa.orm):
test_1  |             if hasattr(mod, name):
test_1  |                 return getattr(mod, name)
test_1  |     
test_1  | >       raise AttributeError(name)
test_1  | E       AttributeError: create_scoped_session
test_1  | 
test_1  | /usr/local/lib/python3.8/site-packages/flask_sqlalchemy/extension.py:988: AttributeError

Note that the test database name now correctly includes the "_gwX" suffix.

ianperera commented 1 year ago

Per our discussion in stand-up

the create_api_key endpoint in User-flows has been fixed.

To summarize the issue and fix:

Issue: AttributeError: 'dict' object has no attribute.

Fix: create-api-key endpoint seen below.

def create_api_key(service_id=None):
    fetched_service = dao_fetch_service_by_id(service_id=service_id)
    valid_api_key = api_key_schema.load(request.get_json()).data
    valid_api_key.service = fetched_service
    save_model_api_key(valid_api_key)
    unsigned_api_key = get_unsigned_secret(valid_api_key.id)
    return jsonify(data=unsigned_api_key), 201
class ApiKeySchema(BaseSchema):
    created_by = field_for(models.ApiKey, 'created_by', required=True)
    key_type = field_for(models.ApiKey, 'key_type', required=True)

    class Meta:
        model = models.ApiKey
        exclude = ("service", "_secret")
        strict = True

as you can see service is not a field of ApiKeySchema, and because we updated the marshmallow-sqlalchemy==0.22.2 we are now faced with the schema error.

URL for context reading.

So to solved this issue we add load_instance = true in Meta to deserializes the schema into the normal dictionary object rather than Schema object.

class ApiKeySchema(BaseSchema):
    created_by = field_for(models.ApiKey, 'created_by', required=True)
    key_type = field_for(models.ApiKey, 'key_type', required=True)

    class Meta:
        model = models.ApiKey
        exclude = ("service", "_secret")
        strict = True
        load_instance = True
mjones-oddball commented 1 year ago

Suggestion to look at the Canadian and/or UK repo to compare changes they've made. Ian P is going to keep helping with this.

Kyle offered to assist as well.

k-macmillan commented 1 year ago

@ianperera came through with a fantastic catch in the documentation. With that change @kalbfled and I wrestled through a lot of failing tests and ended up going from <100 passing tests to over 1100 passing tests. image.png

The current issue is that worker 4 is expecting an imported database (based on our migrations) but one of the tests deletes all the records for provider_details and fills it with new providers so it can test priority stuff. For some reason it is not fixed in the teardown. This may have been covered by pytest-flask-sqlalchemy but we had to remove that because it was causing other issues. Branch has been pushed so others can pull the changes.

kalbfled commented 1 year ago

I pushed changes that get us to 4 failed, 1168 passed, 21 skipped, 6 errors.

Eallan919 commented 1 year ago

Em to add skipped items to 'ish list. Best practice: add a comment in the marker- so we can go back and reconcile

kalbfled commented 1 year ago

I pushed changes that get us to 4 failed, 1306 passed, 33 skipped, 6 errors. To get this result, I had to delete my local database container before running the full test suite. Otherwise, only 1299 tests pass because of the known issue that one of the tests is trashing the database. This is causing the plethora of No row was found when one was required errors. I think we need to prioritize fixing that mode of failure.

mjones-oddball commented 1 year ago

It's hard to know for sure how long it will take, but we are thinking possibly another week (5 pts)

mjones-oddball commented 1 year ago

Increasing from 13 to 21 to account for multiple people and extra time spent

Eallan919 commented 1 year ago

From David 176 failed, 3186 passed, 147 skipped, 3 xfailed

Kyle: ALMOST there!

tabinda-syed commented 1 year ago

160 still failing; 20 fewer than before. Dave will follow up with Ian/Evan.

kalbfled commented 1 year ago

With limit of 10 failures: 10 failed, 1726 passed, 95 skipped, 2 xfailed.

Running all tests: 146 failed, 3217 passed, 147 skipped, 3 xfailed.

Eallan919 commented 1 year ago

Question on this- with some of the tests that are failing- are they important ones?

Also, hoping we can ask @ianperera if he can provide his eagle eyes! Have a think :)

ianperera commented 1 year ago

create a PR for Utils updating pypdf3, as pypdf versions is not matching between utils and api repo

ianperera commented 1 year ago

the above PR is ready, but I'll set it aside because @k-macmillan, mention I would need to test all the api functionality. I will leave for last.

Currently, 142 failed, 3220 passed, 147 skipped, 3 failed.

tests/app/template/test_rest.py involved about 16 test failed. I am trying to fixing this pakage of test cases.

kalbfled commented 1 year ago

91 failed, 3271 passed, 147 skipped, 3 xfailed, 1 error

tabinda-syed commented 1 year ago

Dave, Baldwin, and Ian working on various files to see how we can get the remaining unit tests working. Dave predicts good chance at wrapping it up.

Baldwin believes the ones he's working on may be related to the Marshmallow.

ianperera commented 1 year ago

85 failed, 3278 passed, 147 skipped, 3 xfailed

kalbfled commented 1 year ago

The team decided to mark as xfail tests that are failing for routes that are not actually being used. I annotated tests/app/user/test_rest.py and pushed changes. The current status is 70 failed, 3285 passed, 147 skipped, 11 xfailed.

We still need to merge notification-utils 112 and update notification-api to use the latest master commit. That will eliminate some more failures.

Eallan919 commented 1 year ago

Baldwin to pick up new file!!!

tabinda-syed commented 1 year ago

As few as 60 - 75 unit tests failing now, but repeatability issues.

Dave: Once the tests are passing, we should consider refactoring how the tests are running. [Future goal + ticket] Team to discuss at engineering huddle to consider decomposition/iterative approach. Cris: Test database that is seeded would be very helpful. Jacob: See if they refactored those tests in the Canadian repo.

*Kyle: Need new ticket to make sure we have a group call to ensure tests/API params were not changed. [ @tabinda-syed ]

FYI/reminder, @mjones-oddball @kalbfled @k-macmillan

ianperera commented 1 year ago

pushed an update

ianperera commented 1 year ago

9 failed, 3321 passed, 147 skipped, 35 xfailed

ianperera commented 1 year ago

@kalbfled please see URL

ianperera commented 1 year ago

0.25.0 (2021-05-02) CHANGELOG Add load_instance as a parameter to SQLAlchemySchema and SQLAlchemyAutoSchema (:pr:380). Thanks :user:mjpieters for the PR

kalbfled commented 1 year ago

1 failed, 3329 passed, 147 skipped, 35 xfailed

I didn't re-run this to ensure that it is repeatable, but I wouldn't be surprised if that's not the case. The remaining test failure is tests/app/v2/test_errors.py::test_bad_method

kalbfled commented 1 year ago

All unit tests passed. I immediately re-run the full suite without deleting or altering my database container, and all tests passed again.

ianperera commented 1 year ago

@kalbfled great work man!

tabinda-syed commented 1 year ago

@kalbfled @ianperera @EvanParish @cris-oddball @k-macmillan @justaskdavidb2 Way to go, team!!! Congratulations on a huuuuge win!!

cris-oddball commented 1 year ago

At present, there are 4 issues related to this upgrade in Perf:

  1. Missing properties on the templates route GET /service/:service-id/template GET /service/:service-id/template/:template-id the JSON responses from the routes no longer include these properties: 'folder', 'service' , 'service_letter_contact', and 'template_redacted' These properties used to exist in Perf and still do in Staging and Prod.

  2. SAWarning: relationship 'ScheduledNotification.notification' Evan believes that we can clean this up later - should we create a ticket to clean up models.py?

  3. SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly

  4. "Celery task: generate-nightly-billing-csv-report failed" Probably the biggest issue and there was no data in BQ for this last night (post-deploy to perf on 2/28)

Need to discuss these with engineering to see what we will do about each of these.

cc/ @k-macmillan @kalbfled @mjones-oddball @tabinda-syed @EvanParish

cris-oddball commented 1 year ago
  1. /v2/notifications/push - this may not be enabled in Perf. Response from the VANotify service in Perf via Postman with a known good body:
    {
    "errors": [
        {
            "error": "NotImplementedError",
            "message": "Not implemented"
        }
    ],
    "status_code": 501
    }
cris-oddball commented 1 year ago
  1. Whitelist

EDITED: Evan helped me to test this, both routes pass.

cris-oddball commented 1 year ago
  1. Another mystery route: POST /provider-details/:provider-detail-id to update a provider.

EDITED: Determined a good property to change in the Update route. Passes. Unable to test DEL since there is no POST route to create that Dave could find.

cris-oddball commented 1 year ago
  1. POST /service/:service-id/api-keys/revoke/:api-key-id To revoke an API key. On Perf, the response is a 202 Accepted but also null is returned On Staging, the response is a 202 Accepted and an empty object is returned {}
cris-oddball commented 1 year ago
  1. POST /service/:service-id/sms-sender/:sms-sender-id/archive
    • unclear what the payload is supposed to be (not in swagger)
    • unable to test until I understand the route

EDITED: the route works, passed an empty payload.

cris-oddball commented 1 year ago
  1. POST /service/:service-id/sms-sender/:sms-sender-id To update an SMS sender ID. The route works and returns a response indicating that the sms sender id was updated to have an updated inbound_number_id. However the GET response never changed, it is still "inbound_number_id": null

Have not tested in staging yet to see if the behavior is the same.

cris-oddball commented 1 year ago

QA SUMMARY

PASSING ROUTES (work as intended)

GET /_status POST v2/notifications/sms POST v2/notifications/email GET v2/notifications/:notification-id GET v2/notifications/ GET /service GET /service/:service-id GET /service/find-services-by-name?service_name=:service-name GET /service/:service-id/job GET /service/sms-senders POST /service/:service-id/sms-sender POST service/:service-id//callback POST service/:service-id//callback/:callback-id DEL service/:service-id//callback/:callback-id - returns 204 No Content and deletes the callback GET service/:service-id/callback GET service/:service-id/callback/:callback-id GET /service/:service-id/api-keys POST /service/:service-id/api-keys GET /inbound-number GET /inbound-number/available GET /inbound-number/service/:service-id POST /inbound-number POST /inbound-number/:inbound_number_id GET /organisations GET /provider-details GET /provider-details::provider-detail-id POST /provider-details/:provider-detail-id - this is just for update, could only update "active" field. GET /communication-item PUT /service/:service-id/whitelist - returns 204 no content GET /service/:service-id/whitelist POST /service/:service-id/sms-sender/:sms-sender-id/archive

FAILING ROUTES POST /service/:service-id/api-keys/revoke/:api-key-id passes - 202 Accepted reponse with null message response on Staging is an empty object {}

POST /service/:service-id/sms-sender/:sms-sender-id, updated inbound_number_id, however the GET response never changed, it is still "inbound_number_id": null failure

GET /service/:service-id/template JSON schema fails, does not include the properties 'service' or 'template_redacted' 'folder' 'service' 'service_letter_contact' 'template_redacted'

GET /service/:service-id/template/:template-id JSON schema fails, does not include the properties 'folder' 'service' 'service_letter_contact' 'template_redacted'

UNABLE TO TEST

DEL /provider-details/:provider-detail-id - cannot test until we know how to create one

POST v2/notifications/push - do not know how to test - unable to test in Perf { "errors": [ { "error": "NotImplementedError", "message": "Not implemented" } ], "status_code": 501 }

tabinda-syed commented 1 year ago

Please see the Flask Upgrade QA Results document for the list of issues remaining at this time.

kalbfled commented 1 year ago

I successfully deployed changes for this ticket to development. When we have verified that we get the correct overnight billing stats, the PR will be ready for review.