Closed mjones-oddball closed 1 year ago
Hey team! Please add your planning poker estimate with ZenHub @ianperera @jakehova @jessecanderson @k-macmillan @kalbfled @maham360 @EvanParish
Please add your planning poker estimate with ZenHub @Jake-Uhteg
current flask version 1.1.4
vanotify-team/Engineering/dependency_management.md
2.2.2 is the latest version
Upgrading major versions predictably results in ImportError and AttributeError exceptions that prevent running the app even once we get it to build. I upgraded some packages in requirements-app.txt, the top level requirements, to see if I can build and run the container, and I encountered Fido2 exceptions like this:
app_1 | Error: While importing 'application', an ImportError was raised:
app_1 |
app_1 | Traceback (most recent call last):
app_1 | File "/usr/local/lib/python3.8/site-packages/flask/cli.py", line 218, in locate_app
app_1 | __import__(module_name)
app_1 | File "/app/application.py", line 25, in <module>
app_1 | create_app(application)
app_1 | File "/app/app/__init__.py", line 193, in create_app
app_1 | register_blueprint(application)
app_1 | File "/app/app/__init__.py", line 210, in register_blueprint
app_1 | from app.user.rest import user_blueprint
app_1 | File "/app/app/user/rest.py", line 9, in <module>
app_1 | from fido2.client import ClientData
app_1 | ImportError: cannot import name 'ClientData' from 'fido2.client' (/usr/local/lib/python3.8/site-packages/fido2/client.py)
FIDO 2.0 is a protocol for using hardware authentication devices (i.e. 2FA apps; RSA key fob), and the package's documentation assumes the reader is familiar with the details and intended use of the protocol. It shows code examples of how to do things rather than present the information in such a way that would make it easy to answer the question, "What has replaced ClientData in the newer version?"
This is a hurdle we need to clear to make progress on this ticket during the coming sprint. I think we will need a better understanding of how Notification API uses the protocol so we can update the code without changing the behavior. To that end, I have begun reading docs.
UPDATE: The message for this commit might provide an easy answer: Replace fido2.client.ClientData with fido2.webauthn.CollectedClientData.
regarding Fido, this only appears to be used for the /user
routes. Isn't the Strike team now responsible for handling user login, or are they using these routes?
I eliminated the Fido2 ImportError issues using that project's commit messages to determine where things moved. Here's the new hurdle when trying to run the app: ValueError: Couldn't import 'run_celery.notify_celery': The name 'v2_notifications' is already registered for this blueprint. Use 'name=' to provide a unique name.
.
Thanks for looking into this @kalbfled As agreed in refinement, let's hold off on working this any further so we can all start fresh tomorrow. I'd like to hold a discussion around this as a team so we can row in the same direction.
Thank you!
@ianperera @EvanParish @kalbfled please update the ticket with any progress/blockers as you go. Evan, the suggestion was that maybe you start on ticket 870 for the celery work.
In troubleshooting user flow, I found that POST {{notification-api-url}}/service/{{service-id}}/api-key
is returning a 500 internal error, even though pytest has been passed. so the issue is not the test, but the API is broken. I'm currently working on debugging and fixing this.
Unit tests remain broken. I think I fixed last week's problem wherein the test database was not properly initialized. The problem was that newer version of Flask-SQLAlchemy require setting the SQLALCHEMY_DATABASE_URI before calling init_app.
I now get a different runtime traceback about "_transaction". I upgraded sqlalchemy to version 2.0.0, which was released last Thursday night, and I'm fixing an AttributeError introduced by the upgrade. I'm hoping the "_transaction" traceback goes away with the newest versions of the relevant packages. If that doesn't happen, I will post the traceback here and solicit help from the team.
Per our discussion in stand-up, I reverted SQLAlchemy back to version 1.4.46. I am using the newest versions of all the test dependencies except Flake8. This is the current traceback:
test_1 | _ ERROR at setup of test_process_pinpoint_results_notification_final_status[_SMS.FAILURE-UNREACHABLE-temporary-failure] _
test_1 | [gw2] linux -- Python 3.8.13 /usr/local/bin/python
test_1 |
test_1 | request = <SubRequest '_transaction' for <Function test_process_pinpoint_results_notification_final_status[_SMS.FAILURE-UNREACHABLE-temporary-failure]>>
test_1 | _db = <SQLAlchemy postgresql://postgres:LocalPassword@db:5432/test_notification_api_gw2>
test_1 | mocker = <pytest_mock.plugin.MockerFixture object at 0x7f60708dd9a0>
test_1 |
test_1 | @pytest.fixture(scope='function')
test_1 | def _transaction(request, _db, mocker):
test_1 | '''
test_1 | Create a transactional context for tests to run in.
test_1 | '''
test_1 | # Start a transaction
test_1 | connection = _db.engine.connect()
test_1 | transaction = connection.begin()
test_1 |
test_1 | # Bind a session to the transaction. The empty `binds` dict is necessary
test_1 | # when specifying a `bind` option, or else Flask-SQLAlchemy won't scope
test_1 | # the connection properly
test_1 | options = dict(bind=connection, binds={})
test_1 | > session = _db.create_scoped_session(options=options)
test_1 |
test_1 | /usr/local/lib/python3.8/site-packages/pytest_flask_sqlalchemy/fixtures.py:38:
test_1 | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_1 |
test_1 | self = <SQLAlchemy postgresql://postgres:LocalPassword@db:5432/test_notification_api_gw2>
test_1 | name = 'create_scoped_session'
test_1 |
test_1 | def __getattr__(self, name: str) -> t.Any:
test_1 | if name == "db":
test_1 | import warnings
test_1 |
test_1 | warnings.warn(
test_1 | "The 'db' attribute is deprecated and will be removed in"
test_1 | " Flask-SQLAlchemy 3.1. The extension is registered directly as"
test_1 | " 'app.extensions[\"sqlalchemy\"]'.",
test_1 | DeprecationWarning,
test_1 | stacklevel=2,
test_1 | )
test_1 | return self
test_1 |
test_1 | if name == "relation":
test_1 | return self._relation
test_1 |
test_1 | if name == "event":
test_1 | return sa.event
test_1 |
test_1 | if name.startswith("_"):
test_1 | raise AttributeError(name)
test_1 |
test_1 | for mod in (sa, sa.orm):
test_1 | if hasattr(mod, name):
test_1 | return getattr(mod, name)
test_1 |
test_1 | > raise AttributeError(name)
test_1 | E AttributeError: create_scoped_session
test_1 |
test_1 | /usr/local/lib/python3.8/site-packages/flask_sqlalchemy/extension.py:988: AttributeError
Note that the test database name now correctly includes the "_gwX" suffix.
Per our discussion in stand-up
the create_api_key endpoint in User-flows has been fixed.
To summarize the issue and fix:
Issue: AttributeError: 'dict' object has no attribute.
Fix:
create-api-key
endpoint seen below.
def create_api_key(service_id=None):
fetched_service = dao_fetch_service_by_id(service_id=service_id)
valid_api_key = api_key_schema.load(request.get_json()).data
valid_api_key.service = fetched_service
save_model_api_key(valid_api_key)
unsigned_api_key = get_unsigned_secret(valid_api_key.id)
return jsonify(data=unsigned_api_key), 201
class ApiKeySchema(BaseSchema):
created_by = field_for(models.ApiKey, 'created_by', required=True)
key_type = field_for(models.ApiKey, 'key_type', required=True)
class Meta:
model = models.ApiKey
exclude = ("service", "_secret")
strict = True
as you can see service
is not a field of ApiKeySchema, and because we updated the marshmallow-sqlalchemy==0.22.2
we are now faced with the schema error.
So to solved this issue we add load_instance = true
in Meta to deserializes the schema into the normal dictionary object rather than Schema object.
class ApiKeySchema(BaseSchema):
created_by = field_for(models.ApiKey, 'created_by', required=True)
key_type = field_for(models.ApiKey, 'key_type', required=True)
class Meta:
model = models.ApiKey
exclude = ("service", "_secret")
strict = True
load_instance = True
Suggestion to look at the Canadian and/or UK repo to compare changes they've made. Ian P is going to keep helping with this.
Kyle offered to assist as well.
@ianperera came through with a fantastic catch in the documentation. With that change @kalbfled and I wrestled through a lot of failing tests and ended up going from <100 passing tests to over 1100 passing tests.
The current issue is that worker 4 is expecting an imported database (based on our migrations) but one of the tests deletes all the records for provider_details and fills it with new providers so it can test priority stuff. For some reason it is not fixed in the teardown. This may have been covered by pytest-flask-sqlalchemy but we had to remove that because it was causing other issues. Branch has been pushed so others can pull the changes.
I pushed changes that get us to 4 failed, 1168 passed, 21 skipped, 6 errors
.
Em to add skipped items to 'ish list. Best practice: add a comment in the marker- so we can go back and reconcile
I pushed changes that get us to 4 failed, 1306 passed, 33 skipped, 6 errors
. To get this result, I had to delete my local database container before running the full test suite. Otherwise, only 1299 tests pass because of the known issue that one of the tests is trashing the database. This is causing the plethora of No row was found when one was required
errors. I think we need to prioritize fixing that mode of failure.
It's hard to know for sure how long it will take, but we are thinking possibly another week (5 pts)
Increasing from 13 to 21 to account for multiple people and extra time spent
From David 176 failed, 3186 passed, 147 skipped, 3 xfailed
Kyle: ALMOST there!
160 still failing; 20 fewer than before. Dave will follow up with Ian/Evan.
With limit of 10 failures: 10 failed, 1726 passed, 95 skipped, 2 xfailed
.
Running all tests: 146 failed, 3217 passed, 147 skipped, 3 xfailed
.
Question on this- with some of the tests that are failing- are they important ones?
Also, hoping we can ask @ianperera if he can provide his eagle eyes! Have a think :)
create a PR for Utils updating pypdf3, as pypdf versions is not matching between utils and api repo
the above PR is ready, but I'll set it aside because @k-macmillan, mention I would need to test all the api functionality. I will leave for last.
Currently, 142 failed, 3220 passed, 147 skipped, 3 failed.
tests/app/template/test_rest.py
involved about 16 test failed. I am trying to fixing this pakage of test cases.
91 failed, 3271 passed, 147 skipped, 3 xfailed, 1 error
Dave, Baldwin, and Ian working on various files to see how we can get the remaining unit tests working. Dave predicts good chance at wrapping it up.
Baldwin believes the ones he's working on may be related to the Marshmallow.
85 failed, 3278 passed, 147 skipped, 3 xfailed
The team decided to mark as xfail tests that are failing for routes that are not actually being used. I annotated tests/app/user/test_rest.py and pushed changes. The current status is 70 failed, 3285 passed, 147 skipped, 11 xfailed
.
We still need to merge notification-utils 112 and update notification-api to use the latest master commit. That will eliminate some more failures.
Baldwin to pick up new file!!!
As few as 60 - 75 unit tests failing now, but repeatability issues.
Dave: Once the tests are passing, we should consider refactoring how the tests are running. [Future goal + ticket] Team to discuss at engineering huddle to consider decomposition/iterative approach. Cris: Test database that is seeded would be very helpful. Jacob: See if they refactored those tests in the Canadian repo.
*Kyle: Need new ticket to make sure we have a group call to ensure tests/API params were not changed. [ @tabinda-syed ]
FYI/reminder, @mjones-oddball @kalbfled @k-macmillan
pushed an update
9 failed, 3321 passed, 147 skipped, 35 xfailed
0.25.0 (2021-05-02) CHANGELOG Add load_instance as a parameter to SQLAlchemySchema and SQLAlchemyAutoSchema (:pr:380). Thanks :user:mjpieters for the PR
1 failed, 3329 passed, 147 skipped, 35 xfailed
I didn't re-run this to ensure that it is repeatable, but I wouldn't be surprised if that's not the case. The remaining test failure is tests/app/v2/test_errors.py::test_bad_method
All unit tests passed. I immediately re-run the full suite without deleting or altering my database container, and all tests passed again.
@kalbfled great work man!
@kalbfled @ianperera @EvanParish @cris-oddball @k-macmillan @justaskdavidb2 Way to go, team!!! Congratulations on a huuuuge win!!
At present, there are 4 issues related to this upgrade in Perf:
Missing properties on the templates route GET /service/:service-id/template GET /service/:service-id/template/:template-id the JSON responses from the routes no longer include these properties: 'folder', 'service' , 'service_letter_contact', and 'template_redacted' These properties used to exist in Perf and still do in Staging and Prod.
SAWarning: relationship 'ScheduledNotification.notification' Evan believes that we can clean this up later - should we create a ticket to clean up models.py?
SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly
"Celery task: generate-nightly-billing-csv-report failed" Probably the biggest issue and there was no data in BQ for this last night (post-deploy to perf on 2/28)
Need to discuss these with engineering to see what we will do about each of these.
cc/ @k-macmillan @kalbfled @mjones-oddball @tabinda-syed @EvanParish
{
"errors": [
{
"error": "NotImplementedError",
"message": "Not implemented"
}
],
"status_code": 501
}
EDITED: Evan helped me to test this, both routes pass.
EDITED: Determined a good property to change in the Update route. Passes. Unable to test DEL since there is no POST route to create that Dave could find.
null
is returned
On Staging, the response is a 202 Accepted and an empty object is returned {}
EDITED: the route works, passed an empty payload.
Have not tested in staging yet to see if the behavior is the same.
QA SUMMARY
PASSING ROUTES (work as intended)
GET /_status POST v2/notifications/sms POST v2/notifications/email GET v2/notifications/:notification-id GET v2/notifications/ GET /service GET /service/:service-id GET /service/find-services-by-name?service_name=:service-name GET /service/:service-id/job GET /service/sms-senders POST /service/:service-id/sms-sender POST service/:service-id//callback POST service/:service-id//callback/:callback-id DEL service/:service-id//callback/:callback-id - returns 204 No Content and deletes the callback GET service/:service-id/callback GET service/:service-id/callback/:callback-id GET /service/:service-id/api-keys POST /service/:service-id/api-keys GET /inbound-number GET /inbound-number/available GET /inbound-number/service/:service-id POST /inbound-number POST /inbound-number/:inbound_number_id GET /organisations GET /provider-details GET /provider-details::provider-detail-id POST /provider-details/:provider-detail-id - this is just for update, could only update "active" field. GET /communication-item PUT /service/:service-id/whitelist - returns 204 no content GET /service/:service-id/whitelist POST /service/:service-id/sms-sender/:sms-sender-id/archive
FAILING ROUTES POST /service/:service-id/api-keys/revoke/:api-key-id passes - 202 Accepted reponse with null message response on Staging is an empty object {}
POST /service/:service-id/sms-sender/:sms-sender-id, updated inbound_number_id, however the GET response never changed, it is still "inbound_number_id": null failure
GET /service/:service-id/template JSON schema fails, does not include the properties 'service' or 'template_redacted' 'folder' 'service' 'service_letter_contact' 'template_redacted'
GET /service/:service-id/template/:template-id JSON schema fails, does not include the properties 'folder' 'service' 'service_letter_contact' 'template_redacted'
UNABLE TO TEST
DEL /provider-details/:provider-detail-id - cannot test until we know how to create one
POST v2/notifications/push - do not know how to test - unable to test in Perf { "errors": [ { "error": "NotImplementedError", "message": "Not implemented" } ], "status_code": 501 }
Please see the Flask Upgrade QA Results document for the list of issues remaining at this time.
I successfully deployed changes for this ticket to development. When we have verified that we get the correct overnight billing stats, the PR will be ready for review.
https://github.com/department-of-veterans-affairs/notification-api/pulls?q=is%3Aopen+is%3Apr+label%3Adependencies+label%3Apython