ASFHyP3 / hyp3

A processing environment for HyP3 Plugins in AWS.
BSD 3-Clause "New" or "Revised" License

Upgrade openapi packages #1193

Closed asjohnston-asf closed 10 months ago

asjohnston-asf commented 2 years ago

Jira: https://asfdaac.atlassian.net/browse/TOOL-2046

After upgrading openapi-spec-validator to v0.5.1, make install fails with:

The conflict is caused by:
    The user requested openapi-spec-validator==0.5.1
    openapi-core 0.14.5 depends on openapi-spec-validator<0.5.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Pip subprocess error:
ERROR: Cannot install -r /home/asjohnston/src/hyp3/requirements-apps-api.txt (line 4) and openapi-spec-validator==0.5.1 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

failed

CondaEnvException: Pip failed

requirements-api.txt requires openapi-core 0.14.5, which depends on openapi-spec-validator<0.5.0. See https://github.com/p1c2u/openapi-core/issues/402 for more details.

We separately specify openapi-spec-validator as a requirement in requirements-all.txt, since we use it for static analysis independent of deploying the API. We can't upgrade the version specified there past 0.4.0 because of this conflict.

asjohnston-asf commented 1 year ago

The conflict between openapi-spec-validator and openapi-core has been resolved in the current versions, but upgrading both leads to another dependency conflict for jsonschema:

The conflict is caused by:
    cfn-lint 0.72.5 depends on jsonschema<5 and >=3.0
    openapi-spec-validator 0.5.0 depends on jsonschema<5.0.0 and >=4.0.0
    aws-sam-translator 1.55.0 depends on jsonschema~=3.2

https://github.com/aws/serverless-application-model has already relaxed the jsonschema requirement in develop, so this conflict should be resolved with the next aws-sam-translator release.

https://github.com/aws/serverless-application-model/issues/2426
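
The empty intersection can be checked directly. A quick sketch, modeling each jsonschema constraint from the pip error as a half-open (inclusive lower bound, exclusive upper bound) version range:

```python
# Each package's jsonschema constraint, as (inclusive lower, exclusive upper):
constraints = {
    "cfn-lint 0.72.5": ((3, 0, 0), (5, 0, 0)),               # jsonschema>=3.0,<5
    "openapi-spec-validator 0.5.0": ((4, 0, 0), (5, 0, 0)),  # jsonschema>=4.0.0,<5.0.0
    "aws-sam-translator 1.55.0": ((3, 2, 0), (4, 0, 0)),     # jsonschema~=3.2, i.e. >=3.2,<4
}

# A version satisfying all three packages must lie in the intersection of
# the ranges; here the intersection is empty.
lower = max(lo for lo, _ in constraints.values())
upper = min(hi for _, hi in constraints.values())
print(lower < upper)  # False: no jsonschema version satisfies all three
```

aws-sam-translator pins jsonschema below 4 while openapi-spec-validator 0.5.0 requires at least 4.0.0, so pip's resolver has no version to pick.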

jtherrmann commented 1 year ago

I built my environment from scratch on branch dependabot/pip/openapi-core-0.16.4 (see https://github.com/ASFHyP3/hyp3/pull/1380). I confirmed the following versions of the openapi-* packages:

openapi-core              0.16.4
openapi-schema-validator  0.3.4
openapi-spec-validator    0.5.1

When I run make tests on this branch, I get:

=============================================================================== short test summary info ===============================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_post_subscription - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_submit_subscriptions_missing_fields - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_get_subscriptions - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription_wrong_user - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription_date_too_far_out - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription_not_found - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_enabled - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_get_subscription_by_id - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_post_subscriptions_validate_only - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_enabled - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_mixed_subscriptions - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_submit_subscription_with_granules - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
==================================================================== 17 failed, 152 passed, 482 warnings in 28.00s ====================================================================

When I run the tests from branch openapi-attempt (see https://github.com/ASFHyP3/hyp3/pull/1490), I get:

=============================================================================== short test summary info ===============================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - assert 400 == <HTTPStatus.OK: 200>
==================================================================== 5 failed, 164 passed, 530 warnings in 25.87s =====================================================================

So, we've fixed all of the TypeError failures for test_subscriptions, but we're seeing the following failure for both branches:

FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>

And now we're seeing the same status code failure for a handful of subscription tests as well.

jtherrmann commented 1 year ago

I'll start debugging the jobs failure first:

FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>

I've confirmed that the Jobs.post method in routes.py is returning a 200 OK response during the test. But the response returned by submit_batch(client, batch, validate_only=True) in test_submit_job.py::test_submit_validate_only is a <WrapperTestResponse streamed [400 BAD REQUEST]>, so I need to pinpoint where that 400 response is coming from.
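
One mechanism that would produce exactly this symptom: a wrapping validation layer that rewrites the handler's response after it returns. A stdlib-only toy illustration (the names below are hypothetical, not HyP3's actual code):

```python
from http import HTTPStatus

def submit_jobs(validate_only: bool):
    # Toy stand-in for Jobs.post in routes.py: returns 200 OK with no body
    # when validate_only is set.
    if validate_only:
        return HTTPStatus.OK, None
    return HTTPStatus.OK, {"jobs": []}

def with_response_validation(handler):
    # Toy validation layer: if the response schema requires a json body on
    # 200 responses, an empty body gets rewritten to a 400 before the test
    # client ever sees the handler's 200.
    def wrapper(*args, **kwargs):
        status, body = handler(*args, **kwargs)
        if status == HTTPStatus.OK and body is None:
            return HTTPStatus.BAD_REQUEST, {"title": "Bad Request"}
        return status, body
    return wrapper

validated_submit_jobs = with_response_validation(submit_jobs)
```

In this toy, the handler genuinely returns 200 while the caller observes 400, which matches what the test is reporting.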

jtherrmann commented 1 year ago

Also, before I go further, I should probably just upgrade to the latest openapi-* packages.

jtherrmann commented 1 year ago

I've started working on fixing the errors that result from upgrading to the latest package versions: https://github.com/ASFHyP3/hyp3/pull/1708

Unfortunately, it seems that the tests are now taking several minutes to run, and I have no idea why.

jtherrmann commented 1 year ago

I also started working on incorporating the WKT validator draft PR (https://github.com/ASFHyP3/hyp3/pull/1490) into the branch for upgrading to the latest versions, but there are errors remaining to fix. I'd also like to read more about Unmarshalling vs. Validation.

jtherrmann commented 1 year ago

On branch upgrade-openapi, I confirmed the following package versions:

jsonschema                4.18.3                   pypi_0    pypi
jsonschema-spec           0.2.3                    pypi_0    pypi
jsonschema-specifications 2023.6.1                 pypi_0    pypi
openapi-core              0.18.0                   pypi_0    pypi
openapi-schema-validator  0.6.0                    pypi_0    pypi
openapi-spec-validator    0.6.0                    pypi_0    pypi

The tests are running extremely slowly. Here are the three slowest tests:

104.75s call     tests/test_api/test_submit_job.py::test_submit_many_jobs
102.69s call     tests/test_api/test_list_jobs.py::test_list_jobs_date_start_and_end
101.32s call     tests/test_api/test_submit_job.py::test_submit_exceeds_quota

They are also the three tests failing with 401 status codes:

FAILED tests/test_api/test_list_jobs.py::test_list_jobs_date_start_and_end - assert 401 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_submit_job.py::test_submit_many_jobs - assert 401 == <HTTPStatus.BAD_REQUEST: 400>
FAILED tests/test_api/test_submit_job.py::test_submit_exceeds_quota - assert 401 == <HTTPStatus.BAD_REQUEST: 400>

These failures only occur when all the tests are run together. Each test passes if run individually. At first I thought that some fixture was not being torn down successfully, but it turns out that if we pass lifetime_in_seconds=1000 (default is 100 seconds) to the mock cookie in the login function in test_api/conftest.py, then the tests pass. Conversely, passing a lifetime of 0 causes many (all?) of the API tests to fail with 401 status codes. So, I think that the three slowest tests were failing because their auth cookies were timing out, and for some reason this was not occurring when the tests were run individually (maybe they ran just fast enough individually to avoid the timeout?).
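
The timing dependence can be sketched in isolation. A minimal model of the expiring mock cookie (field names assumed, not the actual conftest.py code):

```python
import time

def login(lifetime_in_seconds: int = 100) -> dict:
    # Toy version of the mock auth cookie minted by the login function in
    # test_api/conftest.py: valid until now + lifetime.
    return {"user": "test-user", "expires": time.time() + lifetime_in_seconds}

def is_authorized(cookie: dict) -> bool:
    # Once the cookie has expired, every request gets an HTTP 401.
    return time.time() < cookie["expires"]
```

With the slowest tests taking over 100 seconds, a cookie minted with the default 100-second lifetime can expire before a test's last request, which matches the 401s above; run alone, each test finishes well inside the window.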

So, now we need to determine why the tests are running so slowly.

jtherrmann commented 1 year ago

I wanted to rule out the possibility that the slow tests were making actual HTTP requests over the internet due to some mocking failure, so I ran all the tests with no internet connection, with the same results.

jtherrmann commented 1 year ago

Here are the runtimes for all the tests with runtime >= 1 second, run from the upgrade-openapi branch:

138.46s call     tests/test_api/test_submit_job.py::test_submit_many_jobs
100.27s call     tests/test_api/test_submit_job.py::test_submit_exceeds_quota
99.81s call     tests/test_api/test_list_jobs.py::test_list_jobs_date_start_and_end
18.89s call     tests/test_api/test_submit_job.py::test_submit_good_autorift_granule_names
16.97s call     tests/test_api/test_list_jobs.py::test_list_jobs_subscription_id
13.62s call     tests/test_api/test_submit_job.py::test_submit_good_rtc_granule_names
8.92s call     tests/test_api/test_submit_job.py::test_submit_multiple_job_types
8.63s call     tests/test_api/test_submit_job.py::test_submit_mixed_job_parameters
8.17s call     tests/test_dynamo/test_jobs.py::test_put_jobs_priority_overflow
7.87s call     tests/test_api/test_list_jobs.py::test_list_jobs_by_type
7.58s call     tests/test_api/test_submit_job.py::test_float_input
5.29s call     tests/test_api/test_list_jobs.py::test_list_jobs_by_user_id
5.16s call     tests/test_api/test_list_jobs.py::test_list_jobs
5.00s call     tests/test_api/test_list_jobs.py::test_list_jobs_by_name
4.78s call     tests/test_api/test_submit_job.py::test_submit_insar_gamma
4.54s call     tests/test_api/test_submit_job.py::test_submit_job_granule_does_not_exist
3.97s call     tests/test_api/test_submit_job.py::test_submit_job_without_name
3.90s call     tests/test_api/test_submit_job.py::test_submit_one_job
2.61s call     tests/test_api/test_list_jobs.py::test_list_jobs_by_status
2.41s call     tests/test_api/test_get_job_by_id.py::test_get_job_by_id
2.41s call     tests/test_api/test_submit_job.py::test_submit_autorift
1.54s call     tests/test_api/test_submit_job.py::test_submit_validate_only

And here's the same for the tests run from develop (with the older package versions):

16.26s call     tests/test_dynamo/test_jobs.py::test_put_jobs_priority_overflow
1.99s call     tests/test_api/test_validation.py::test_validate_jobs
1.57s call     tests/test_api/test_submit_job.py::test_submit_many_jobs

So most tests are running much more slowly.

jtherrmann commented 1 year ago

I determined the cause of the slowdown. After I first upgraded the packages, tests were failing like:

FAILED tests/test_api/test_submit_job.py::test_submit_one_job - jsonschema.exceptions._WrappedReferencingError: PointerToNowhere: '/components/schemas/AUTORIFTJob' does not exist within {'openapi': '3.0.1', 'info': {'title': ...

It's because references like the following from apps/api/src/hyp3_api/api-spec/job_parameters.yml.j2 no longer resolve:

- $ref: "#/components/schemas/{{ job_type }}Job"

So my naive solution was to add the filepath:

- $ref: "./job_parameters.yml#/components/schemas/{{ job_type }}Job"

Which apparently caused the massive slowdown. I profiled the unit tests and here is one relevant line:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9038    0.013    0.000  118.391    0.013 /home/jth/jth-apps/mambaforge/envs/hyp3/lib/python3.9/site-packages/jsonschema_spec/accessors.py:60(resolve)

We can see that there are 9038 calls to the resolve function from the jsonschema_spec package, resulting in a cumulative time of 118.391 seconds spent attempting to resolve schema paths for just one of our unit tests.
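
For reference, that table can be produced with the stdlib cProfile and pstats modules. A self-contained sketch with a trivial stand-in for resolve (the function body and the $ref string are just placeholders here):

```python
import cProfile
import io
import pstats

def resolve(pointer: str) -> str:
    # Trivial stand-in for jsonschema_spec.accessors.resolve.
    return pointer.rsplit("/", 1)[-1]

profiler = cProfile.Profile()
profiler.enable()
for _ in range(9038):  # the call count observed in the real profile
    resolve("./job_parameters.yml#/components/schemas/ExampleJob")
profiler.disable()

# Print only rows whose function name matches "resolve", sorted by
# cumulative time, like the line quoted above.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumtime").print_stats("resolve")
report = stream.getvalue()
print(report)
```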

If we want to fully understand the changes that have been made to path resolution, we should read the release notes at https://github.com/python-jsonschema/jsonschema/releases. For example, v4.18.0 "majorly rehauls the way in which JSON Schema reference resolution is configured."

However, the shortest path to victory would be to move the contents of apps/api/src/hyp3_api/api-spec/job_parameters.yml.j2 into apps/api/src/hyp3_api/api-spec/openapi-spec.yml.j2, removing all occurrences of ./job_parameters.yml in the reference paths. I've already confirmed that this will fix the test slowdown.

jtherrmann commented 1 year ago

After combining the API specs, tests run from upgrade-openapi fail in the same way as shown here.

I'm working on fixing the WKT validator and other errors at https://github.com/ASFHyP3/hyp3/pull/1714. Currently seeing the following failures on the fix-wkt-validator branch:

====================================================================== short test summary info =======================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
FAILED tests/test_api/test_subscriptions.py::test_get_subscriptions - assert {'detail': 'V...'about:blank'} == {'subscriptio...71861', ...}]}
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - assert 400 == <HTTPStatus.OK: 200>
============================================================ 6 failed, 163 passed, 545 warnings in 25.00s ============================================================
asjohnston-asf commented 1 year ago

After closer inspection, test_submit_validate_only and test_get_subscriptions are failing on response validation. That may be the cause of the remaining failures as well (other than test_search_criteria).

Our OpenAPI spec defines the json structure we expect users to give to the API. It also defines the json structure the user can expect back from any API call. By default, openapi-core validates the json response returned by the handler function; if it doesn't match the response schema openapi-core sends the user a HTTP 400 instead.

Validating our responses back to the user is tricky business. It gives us a way to detect when our API is breaking its contract, which is good. It usually comes at the expense of the user experience, though, especially for REST APIs that are a passthrough for a back-end database. A single non-compliant database record would cause any user query including that record to fail. Validating responses also makes changing the structure of our database more difficult. For example, we wouldn't be able to add a new field to job records that we want to be required for new jobs going forward without also updating all existing job records to include that field.
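
The new-required-field problem can be seen with a minimal stand-in for schema validation (the credit_cost field name is made up for illustration):

```python
def meets_response_schema(record: dict, required_fields: list) -> bool:
    # Minimal stand-in for response-schema validation: only checks that
    # every required field is present.
    return all(field in record for field in required_fields)

# Suppose the spec adds a new required credit_cost field to job records:
required_fields = ["job_id", "status_code", "credit_cost"]

# A job record written before the field existed now fails validation, so
# any query whose results include it would come back as an HTTP 400.
old_record = {"job_id": "27836b79", "status_code": "SUCCEEDED"}
print(meets_response_schema(old_record, required_fields))  # False
```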

We ultimately decided not to validate responses from the HyP3 API against our OpenAPI schema. We did that by setting self.response_validator to a custom "do nothing" validator for all of our views. The same openapi-core bug impacting custom request validation also disabled our custom response validation. So, for example, test_get_subscriptions now fails because we aren't including the required enabled and creation_date fields in our response payload, and test_submit_validate_only fails because GET /jobs doesn't return any json body when validate_only=True.

I've updated https://github.com/python-openapi/openapi-core/issues/618 to confirm whether response validation falls under the same bug.

jtherrmann commented 1 year ago

I had added some missing fields to mock subscriptions to fix one of the tests, so I reverted that change now that we know why response validation is failing.

On fix-wkt-validator I'm now seeing the following test results:

====================================================================== short test summary info =======================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
FAILED tests/test_api/test_subscriptions.py::test_get_subscriptions - assert {'detail': 'V...'about:blank'} == {'subscriptio...71861', ...}]}
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_get_subscription_by_id - assert {'detail': 'V...'about:blank'} == {'creation_da...d8ad542', ...}
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - assert 400 == <HTTPStatus.OK: 200>
============================================================ 7 failed, 162 passed, 545 warnings in 26.92s ============================================================

I'm not going to bother debugging the tests further until the openapi-core regression bugs are fixed.

jtherrmann commented 1 year ago

Since removing the Subscriptions feature made most of our work on this issue obsolete, I've started from scratch at https://github.com/ASFHyP3/hyp3/pull/1821.

jtherrmann commented 1 year ago

Blocked by https://github.com/python-openapi/openapi-core/issues/662

jtherrmann commented 1 year ago

New release: https://github.com/python-openapi/openapi-core/releases/tag/0.18.1 may allow us to disable response validation (adds "Unmarshalling customizations" in FlaskOpenAPIView) but I haven't looked into it yet.

jtherrmann commented 1 year ago

An option to disable response validation has been added (https://github.com/python-openapi/openapi-core/issues/662 was closed, see the PR linked from that issue) so we can move forward on this issue whenever openapi-core has another release. We would likely want to downgrade cfn-lint so that we can install the latest version of openapi-core (see https://github.com/ASFHyP3/hyp3/pull/1821), make any code changes necessary to accommodate the upgraded packages, and then figure out what to do about cfn-lint going forward (e.g. pin the downgraded version until cfn-lint un-pins jsonschema, wait to merge the openapi upgrade PR, etc.).

jtherrmann commented 1 year ago

The latest version of cfn-lint un-pins jsonschema, so we no longer need to downgrade cfn-lint.

jtherrmann commented 1 year ago

On the upgrade-openapi-after-subscriptions-removal branch, we now see two failing tests:

FAILED tests/test_api/test_api_spec.py::test_error_format - AssertionError: assert 'application/json' == 'application/problem+json'
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>

I'm not sure why the error Content-Type header would have changed, so we'll want to debug that to find the cause of the first failure.

jtherrmann commented 1 year ago

It appears that we may not need to move the contents of apps/api/src/hyp3_api/api-spec/job_parameters.yml.j2 into apps/api/src/hyp3_api/api-spec/openapi-spec.yml.j2 now that we're at jsonschema==4.19.1. I'm not seeing any more API spec errors, and https://github.com/python-jsonschema/jsonschema/releases shows that they fixed various regression bugs related to deprecated features. Of course, this means that once those deprecated features have been removed, we will likely need to update our API spec. But we can probably hold off on making those updates until stuff starts breaking, and hopefully future jsonschema releases will provide helpful notes on exactly which features have been removed.

jtherrmann commented 1 year ago

Inspecting the second response defined in test_error_format here, response.json is as shown below:

{'errors': [{'class': "<class 'openapi_core.validation.request.exceptions.MissingRequiredRequestBody'>", 'status': 400, 'title': 'Missing required request body'}]}

This should be fixed when we implement the new method for disabling response validation. I don't know whether this will also fix the failing Content-Type assertion (see comment above).

jtherrmann commented 10 months ago

After merging https://github.com/ASFHyP3/hyp3/pull/1821 (which passes all tests), I'm now seeing responses like the one below for all https://hyp3-test-api.asf.alaska.edu (EDC UAT) endpoints (also via the hyp3-sdk). The body and headers below were copied from the Swagger UI.

Body:

{
  "detail": "Server not found for https://vd2gh6uqw3.execute-api.us-west-2.amazonaws.com/api/user",
  "status": 400,
  "title": "Bad Request",
  "type": "about:blank"
}

Headers:

content-length: 154
content-type: application/problem+json
date: Wed,10 Jan 2024 18:30:48 GMT
strict-transport-security: max-age=31536000
via: 1.1 76dcc62b68091cc715d50b5017be77fc.cloudfront.net (CloudFront)
x-amz-apigw-id: RVhN0G_5PHcEXkA=
x-amz-cf-id: hL4zWXgpkU_rtSgNA0LxSqwW-VBFSdLPgT0GB8NYHmPz42BFb3otoA==
x-amz-cf-pop: SEA73-P2
x-amzn-remapped-connection: keep-alive
x-amzn-remapped-content-length: 154
x-amzn-remapped-date: Wed,10 Jan 2024 18:30:48 GMT
x-amzn-remapped-server: Server
x-amzn-remapped-x-amzn-remapped-content-length: 154
x-amzn-remapped-x-amzn-requestid: 4f3a8483-e173-43c7-b251-647b8d3cdd6c
x-amzn-remapped-x-forwarded-for: 142.147.64.14,15.158.4.72
x-amzn-requestid: 66f0c4fc-134e-4976-9c9d-4122a3723ebf
x-amzn-trace-id: Root=1-659ee258-4d57c3463487dbf12ab4942d;Sampled=0;lineage=8c3a9ce5:0
x-cache: Error from cloudfront
x-content-type-options: nosniff
x-firefox-spdy: h2
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block 

However, https://hyp3-enterprise-test.asf.alaska.edu still works fine, which indicates the problem is specific to EDC.

The url https://vd2gh6uqw3.execute-api.us-west-2.amazonaws.com/ is the default endpoint for the AWS::ApiGateway::RestApi resource of the hyp3-edc-uat stack. At first, I thought the error message might suggest that our CloudFront distribution was failing to contact the ApiGateway server, but that no longer seems likely, as I'll explain.

I queried the /jobs and /user endpoints 15 times each at 18:30 UTC this morning. The CloudFront metrics for the distribution with ID ECFRN3AMYB8LV show 60 4xx errors at 2024-01-10 18:30 UTC (twice the 30 I would have expected; I'm not sure why the count is doubled).

I also observed exactly 15 HTTP 400 errors for each endpoint (30 total, as expected) between 2024-01-10T18:29:00 and 2024-01-10T18:31:00 for log group hyp3-edc-uat-Api-O6ZX1MMCLCEC-ApiLogGroup-4W6NumbiNhfR (using a Logs Insights query).

So, the responses are originating from ApiGateway.
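
For reference, the per-minute 400 count came from a Logs Insights query along these lines (the exact field names depend on the configured API Gateway access-log format, so treat this as a sketch):

```
fields @timestamp, status
| filter status = 400
| stats count(*) by bin(1m)
```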

jtherrmann commented 10 months ago

I also started digging into the openapi-core code to determine where the response error message originates from.

We define a custom error handler here, which inherits from FlaskOpenAPIErrorsHandler. That handler is defined here and sets status code 400 when handling a ServerNotFound exception. The exception itself is defined here; its string representation is f"Server not found for {self.url}", which I assume produces the error message in the response body (see above).

The ServerNotFound exception is raised here, from the BasePathFinder.find method, when self._get_servers_iter does not yield any servers. The APICallPathFinder class inherits from BasePathFinder and implements the _get_servers_iter method here.

Finally, APICallPathFinder is instantiated here and its find method is called here (within the BaseAPICallValidator class) with the request method and URL passed as arguments.

That's as far as I got tracing the openapi-core code. I don't completely understand how all the pieces fit together, but it appears to parse the request URL's hostname as part of its validation process. This surprises me because I don't know why openapi-core (which wraps Flask) would care about the server URL.
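
Based on that reading, here's a toy reduction of the failure mode (openapi-core's real matching is more involved, e.g. it supports server variables; the URLs are taken from the responses above):

```python
class ServerNotFound(Exception):
    # Mirrors openapi-core's exception: str(exc) produces the message seen
    # in the response body.
    def __init__(self, url: str):
        super().__init__()
        self.url = url

    def __str__(self) -> str:
        return f"Server not found for {self.url}"

def find_server(request_url: str, spec_servers: list) -> str:
    # Toy version of the path-finding step: the request URL must begin with
    # one of the server URLs declared in the OpenAPI spec.
    for server in spec_servers:
        if request_url.startswith(server):
            return server
    raise ServerNotFound(request_url)
```

If the spec only lists the CloudFront hostname, a request that arrives addressed as the execute-api URL would fail this lookup and surface as the 400 above.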

Edit: I asked the openapi-core author(s) for clarification: https://github.com/python-openapi/openapi-core/discussions/757

jtherrmann commented 10 months ago

It turns out the server URL is part of the OpenAPI spec. Fix and explanation here: https://github.com/ASFHyP3/hyp3/pull/2009
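
For reference, OpenAPI 3 declares the base URLs an API is served from in a top-level servers list, and openapi-core matches each incoming request URL against those entries. A deployment reached through a different public hostname (e.g. CloudFront in front of the default execute-api endpoint) needs that hostname listed, roughly like this (URLs illustrative):

```yaml
servers:
  - url: https://hyp3-test-api.asf.alaska.edu
  - url: https://vd2gh6uqw3.execute-api.us-west-2.amazonaws.com/api
```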