Closed by asjohnston-asf 10 months ago
The conflict between openapi-spec-validator and openapi-core has been resolved in the current versions, but upgrading both leads to another dependency conflict for jsonschema:
The conflict is caused by:
cfn-lint 0.72.5 depends on jsonschema<5 and >=3.0
openapi-spec-validator 0.5.0 depends on jsonschema<5.0.0 and >=4.0.0
aws-sam-translator 1.55.0 depends on jsonschema~=3.2
https://github.com/aws/serverless-application-model has already relaxed the jsonschema requirement in develop, so this conflict should be resolved with the next aws-sam-translator release.
https://github.com/aws/serverless-application-model/issues/2426
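To make the impossibility concrete, the three constraints above can be modeled as version ranges; this toy sketch (not real resolver logic) shows that no single jsonschema version satisfies all of them:

```python
# Toy illustration (not real pip resolver logic): model each package's
# jsonschema constraint as a half-open [lo, hi) range over (major, minor).
CONSTRAINTS = {
    "cfn-lint 0.72.5": ((3, 0), (5, 0)),               # jsonschema>=3.0,<5
    "openapi-spec-validator 0.5.0": ((4, 0), (5, 0)),  # jsonschema>=4.0.0,<5.0.0
    "aws-sam-translator 1.55.0": ((3, 2), (4, 0)),     # jsonschema~=3.2, i.e. >=3.2,<4
}

def conflict_free(version):
    """True if a single jsonschema version satisfies every constraint."""
    return all(lo <= version < hi for lo, hi in CONSTRAINTS.values())

# The intersection of [3.2, 4.0) and [4.0, 5.0) is empty, so no version works:
print(any(conflict_free(v) for v in [(3, 2), (3, 9), (4, 0), (4, 18)]))  # False
```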
I built my environment from scratch on branch dependabot/pip/openapi-core-0.16.4
(see https://github.com/ASFHyP3/hyp3/pull/1380). I confirmed the following versions of the openapi-*
packages:
openapi-core 0.16.4
openapi-schema-validator 0.3.4
openapi-spec-validator 0.5.1
When I run make tests
on this branch, I get:
=============================================================================== short test summary info ===============================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_post_subscription - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_submit_subscriptions_missing_fields - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_get_subscriptions - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription_wrong_user - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription_date_too_far_out - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_subscription_not_found - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_update_enabled - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_get_subscription_by_id - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_post_subscriptions_validate_only - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_enabled - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_mixed_subscriptions - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
FAILED tests/test_api/test_subscriptions.py::test_submit_subscription_with_granules - TypeError: __init__() got an unexpected keyword argument 'custom_formatters'
==================================================================== 17 failed, 152 passed, 482 warnings in 28.00s ====================================================================
When I run the tests from branch openapi-attempt
(see https://github.com/ASFHyP3/hyp3/pull/1490), I get:
=============================================================================== short test summary info ===============================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - assert 400 == <HTTPStatus.OK: 200>
==================================================================== 5 failed, 164 passed, 530 warnings in 25.87s =====================================================================
So, we've fixed all of the TypeError
failures for test_subscriptions
, but we're seeing the following failure for both branches:
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
And now we're seeing the same status code failure for a handful of subscription tests as well.
I'll start debugging the jobs failure first:
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
I've confirmed that the Jobs.post
method in routes.py
is returning a 200 OK
response during the test. But the response returned by submit_batch(client, batch, validate_only=True)
in test_submit_job.py::test_submit_validate_only
is a <WrapperTestResponse streamed [400 BAD REQUEST]>
, so I need to pinpoint where that 400
response is coming from.
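The symptom (handler returns 200 OK, test client sees 400) is consistent with a validation wrapper replacing the handler's response before it reaches the test client. A toy model of that behavior (names are illustrative, not the real openapi-core decorator):

```python
def jobs_post():
    """Stand-in for the Jobs.post handler in routes.py: it succeeds."""
    return 200, {"jobs": []}

def validated(view, response_is_valid):
    # Toy model of a validating view wrapper: it can replace the handler's
    # 200 with its own 400 before the test client sees anything.
    def wrapped():
        status, body = view()
        if not response_is_valid(body):
            return 400, {"title": "Bad Request"}
        return status, body
    return wrapped

# If the wrapper's validation rejects the body, the test client observes a 400
# even though the handler itself returned 200:
wrapped = validated(jobs_post, response_is_valid=lambda body: False)
status, _ = wrapped()
print(status)  # 400
```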
Also, before I go further, I should probably just upgrade to the latest openapi-*
packages.
I've started working on fixing the errors that result from upgrading to the latest package versions: https://github.com/ASFHyP3/hyp3/pull/1708
Unfortunately, it seems that the tests are now taking several minutes to run, and I have no idea why.
I also started working on incorporating the WKT validator draft PR (https://github.com/ASFHyP3/hyp3/pull/1490) into the branch for upgrading the latest versions, but there are errors remaining to fix. I'd also like to read more about Unmarshalling vs. Validation.
On branch upgrade-openapi
, I confirmed the following package versions:
jsonschema 4.18.3 pypi_0 pypi
jsonschema-spec 0.2.3 pypi_0 pypi
jsonschema-specifications 2023.6.1 pypi_0 pypi
openapi-core 0.18.0 pypi_0 pypi
openapi-schema-validator 0.6.0 pypi_0 pypi
openapi-spec-validator 0.6.0 pypi_0 pypi
The tests are running extremely slowly. Here are the three slowest tests:
104.75s call tests/test_api/test_submit_job.py::test_submit_many_jobs
102.69s call tests/test_api/test_list_jobs.py::test_list_jobs_date_start_and_end
101.32s call tests/test_api/test_submit_job.py::test_submit_exceeds_quota
They are also the three tests failing with 401
status codes:
FAILED tests/test_api/test_list_jobs.py::test_list_jobs_date_start_and_end - assert 401 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_submit_job.py::test_submit_many_jobs - assert 401 == <HTTPStatus.BAD_REQUEST: 400>
FAILED tests/test_api/test_submit_job.py::test_submit_exceeds_quota - assert 401 == <HTTPStatus.BAD_REQUEST: 400>
These failures only occur when all the tests are run together. Each test passes if run individually. At first I thought that some fixture was not being torn down successfully, but it turns out that if we pass lifetime_in_seconds=1000
(default is 100
seconds) to the mock cookie in the login
function in test_api/conftest.py
, then the tests pass. Conversely, passing a lifetime of 0
causes many (all?) of the API tests to fail with 401
status codes. So, I think that the three slowest tests were failing because their auth cookies were timing out, and for some reason this was not occurring when the tests were run individually (maybe they ran just fast enough individually to avoid the timeout?).
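To make the cookie-lifetime hypothesis concrete, here's a toy sketch of an expiring mock auth cookie (names are loosely modeled on the login helper in test_api/conftest.py; this is not the real fixture):

```python
import time

def mint_cookie(lifetime_in_seconds=100, now=None):
    """Mock auth cookie carrying its own expiry timestamp (illustrative)."""
    now = time.time() if now is None else now
    return {"expiration": now + lifetime_in_seconds}

def is_authorized(cookie, now):
    # The API responds 401 once the cookie's expiry has passed.
    return now < cookie["expiration"]

start = 0.0
short_lived = mint_cookie(100, now=start)   # the default lifetime
long_lived = mint_cookie(1000, now=start)   # the lifetime that made the tests pass

# A test still running ~104s after login (like test_submit_many_jobs)
# outlives the default cookie but not the extended one:
print(is_authorized(short_lived, start + 104))  # False -> 401
print(is_authorized(long_lived, start + 104))   # True
```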
So, now we need to determine why the tests are running so slowly.
I wanted to rule out the possibility that the slow tests were making actual HTTP requests over the internet due to some mocking failure, so I ran all the tests with no internet connection, with the same results.
Here are the runtimes for all the tests with runtime >= 1 second, run from the upgrade-openapi
branch:
138.46s call tests/test_api/test_submit_job.py::test_submit_many_jobs
100.27s call tests/test_api/test_submit_job.py::test_submit_exceeds_quota
99.81s call tests/test_api/test_list_jobs.py::test_list_jobs_date_start_and_end
18.89s call tests/test_api/test_submit_job.py::test_submit_good_autorift_granule_names
16.97s call tests/test_api/test_list_jobs.py::test_list_jobs_subscription_id
13.62s call tests/test_api/test_submit_job.py::test_submit_good_rtc_granule_names
8.92s call tests/test_api/test_submit_job.py::test_submit_multiple_job_types
8.63s call tests/test_api/test_submit_job.py::test_submit_mixed_job_parameters
8.17s call tests/test_dynamo/test_jobs.py::test_put_jobs_priority_overflow
7.87s call tests/test_api/test_list_jobs.py::test_list_jobs_by_type
7.58s call tests/test_api/test_submit_job.py::test_float_input
5.29s call tests/test_api/test_list_jobs.py::test_list_jobs_by_user_id
5.16s call tests/test_api/test_list_jobs.py::test_list_jobs
5.00s call tests/test_api/test_list_jobs.py::test_list_jobs_by_name
4.78s call tests/test_api/test_submit_job.py::test_submit_insar_gamma
4.54s call tests/test_api/test_submit_job.py::test_submit_job_granule_does_not_exist
3.97s call tests/test_api/test_submit_job.py::test_submit_job_without_name
3.90s call tests/test_api/test_submit_job.py::test_submit_one_job
2.61s call tests/test_api/test_list_jobs.py::test_list_jobs_by_status
2.41s call tests/test_api/test_get_job_by_id.py::test_get_job_by_id
2.41s call tests/test_api/test_submit_job.py::test_submit_autorift
1.54s call tests/test_api/test_submit_job.py::test_submit_validate_only
And here's the same for the tests run from develop
(with the older package versions):
16.26s call tests/test_dynamo/test_jobs.py::test_put_jobs_priority_overflow
1.99s call tests/test_api/test_validation.py::test_validate_jobs
1.57s call tests/test_api/test_submit_job.py::test_submit_many_jobs
So most tests are running much more slowly.
I determined the cause of the slowdown. After I first upgraded the packages, tests were failing like:
FAILED tests/test_api/test_submit_job.py::test_submit_one_job - jsonschema.exceptions._WrappedReferencingError: PointerToNowhere: '/components/schemas/AUTORIFTJob' does not exist within {'openapi': '3.0.1', 'info': {'title': ...
It's because references like the following from apps/api/src/hyp3_api/api-spec/job_parameters.yml.j2
no longer resolve:
- $ref: "#/components/schemas/{{ job_type }}Job"
So my naive solution was to add the filepath:
- $ref: "./job_parameters.yml#/components/schemas/{{ job_type }}Job"
That change apparently caused the massive slowdown. I profiled the unit tests and here is one relevant line:
ncalls tottime percall cumtime percall filename:lineno(function)
9038 0.013 0.000 118.391 0.013 /home/jth/jth-apps/mambaforge/envs/hyp3/lib/python3.9/site-packages/jsonschema_spec/accessors.py:60(resolve)
We can see that there are 9038
calls to the resolve
function from the jsonschema_spec
package, resulting in a cumulative time of 118.391
seconds spent attempting to resolve schema paths for just one of our unit tests.
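For reference, a profile like the one above can be produced with the stdlib profiler; this is a generic sketch (profiling a stand-in callable rather than our test suite):

```python
import cProfile
import io
import pstats

def profile_top(func, *args, n=5):
    """Run func under cProfile and return a report of the n entries
    with the highest cumulative time (the 'cumtime' column)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(n)
    return stream.getvalue()

# In our case the target was a single pytest test (run via
# `python -m cProfile -o prof.out -m pytest ...`); here we profile
# a stand-in callable instead.
report = profile_top(sum, range(1000))
print("cumtime" in report)  # True
```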
If we want to fully understand the changes that have been made to path resolution, we should read the release notes at https://github.com/python-jsonschema/jsonschema/releases. For example, v4.18.0
"majorly rehauls the way in which JSON Schema reference resolution is configured."
However, the shortest path to victory would be to move the contents of apps/api/src/hyp3_api/api-spec/job_parameters.yml.j2
into apps/api/src/hyp3_api/api-spec/openapi-spec.yml.j2
, removing all occurrences of ./job_parameters.yml
in the reference paths. I've already confirmed that this will fix the test slowdown.
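The mechanical part of that change can be scripted. A throwaway sketch (the file prefix is the only thing it rewrites):

```python
def inline_refs(spec_text: str) -> str:
    """Rewrite file-based refs back to document-local ones, e.g.
    './job_parameters.yml#/components/schemas/RTCJob'
    becomes '#/components/schemas/RTCJob'."""
    return spec_text.replace("./job_parameters.yml#", "#")

example = '- $ref: "./job_parameters.yml#/components/schemas/{{ job_type }}Job"'
print(inline_refs(example))
# -> - $ref: "#/components/schemas/{{ job_type }}Job"
```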
After combining the API specs, tests run from upgrade-openapi
are failing in the same way as shown here.
I'm working on fixing the WKT validator and other errors at https://github.com/ASFHyP3/hyp3/pull/1714. Currently seeing the following failures on the fix-wkt-validator
branch:
====================================================================== short test summary info =======================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
FAILED tests/test_api/test_subscriptions.py::test_get_subscriptions - assert {'detail': 'V...'about:blank'} == {'subscriptio...71861', ...}]}
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - assert 400 == <HTTPStatus.OK: 200>
============================================================ 6 failed, 163 passed, 545 warnings in 25.00s ============================================================
After closer inspection, test_submit_validate_only
and test_get_subscriptions
are failing on response validation. That may be the cause of the remaining failures as well (other than test_search_criteria
).
Our OpenAPI spec defines the JSON structure we expect users to give to the API. It also defines the JSON structure the user can expect back from any API call. By default, openapi-core
validates the JSON response returned by the handler function; if it doesn't match the response schema, openapi-core sends the user an HTTP 400 instead.
Validating our responses back to the user is tricky business. It gives us a way to detect when our API is breaking its contract, which is good. It usually comes at the expense of the user experience, though, especially for REST APIs that are a passthrough for a back-end database. A single non-compliant database record would cause any user query including that record to fail. Validating responses also makes changing the structure of our database more difficult. For example, we wouldn't be able to add a new field to job records that we want to be required for new jobs going forward without also updating all existing job records to include that field.
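The "one bad record breaks every query" point can be illustrated with a toy required-fields check standing in for real response-schema validation (field names are made up for illustration):

```python
REQUIRED_JOB_FIELDS = {"job_id", "status_code", "creation_date"}  # illustrative names

def validate_response(jobs):
    """Toy stand-in for response-schema validation: if any record in the
    response is missing a required field, the whole response is rejected."""
    for job in jobs:
        if REQUIRED_JOB_FIELDS - job.keys():
            return 400
    return 200

jobs = [
    {"job_id": "a", "status_code": "SUCCEEDED", "creation_date": "2024-01-01"},
    {"job_id": "b", "status_code": "PENDING"},  # legacy record from before the new field
]
print(validate_response(jobs))  # 400: one old record fails the whole query
```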
We ultimately decided not to validate responses from the HyP3 API against our OpenAPI schema. We did that by setting self.response_validator
to a custom "do nothing" validator for all of our views. The same openapi-core bug impacting custom request validation also disabled our custom response validation. So, for example, test_get_subscriptions
now fails because we aren't including the required enabled
and creation_date
fields in our response payload. test_submit_validate_only
fails because GET /jobs
doesn't return any JSON body when validate_only=True.
I've updated https://github.com/python-openapi/openapi-core/issues/618 to confirm whether response validation falls under the same bug.
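For context, the "do nothing" validator idea amounts to something like this (illustrative class, not the real openapi-core interface):

```python
class NoOpResponseValidator:
    """Illustrative 'do nothing' response validator: accepts every response.
    (Not the real openapi-core interface.)"""
    def validate(self, response):
        return None  # never raises, so no response is ever rejected

print(NoOpResponseValidator().validate({"anything": "goes"}) is None)  # True
```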
I had added some missing fields to mock subscriptions to fix one of the tests, so I reverted that change now that we know why response validation is failing.
On fix-wkt-validator
I'm now seeing the following test results:
====================================================================== short test summary info =======================================================================
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_search_criteria - assert 200 == <HTTPStatus.BAD_REQUEST: 400>
FAILED tests/test_api/test_subscriptions.py::test_get_subscriptions - assert {'detail': 'V...'about:blank'} == {'subscriptio...71861', ...}]}
FAILED tests/test_api/test_subscriptions.py::test_update_subscription - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_get_subscription_by_id - assert {'detail': 'V...'about:blank'} == {'creation_da...d8ad542', ...}
FAILED tests/test_api/test_subscriptions.py::test_query_subscription_by_name - assert 400 == <HTTPStatus.OK: 200>
FAILED tests/test_api/test_subscriptions.py::test_query_jobs_by_job_type - assert 400 == <HTTPStatus.OK: 200>
============================================================ 7 failed, 162 passed, 545 warnings in 26.92s ============================================================
I'm not going to bother debugging the tests further until the openapi-core
regression bugs are fixed.
Since removing the Subscriptions feature made most of our work on this issue obsolete, I've started from scratch at https://github.com/ASFHyP3/hyp3/pull/1821.
A new release, https://github.com/python-openapi/openapi-core/releases/tag/0.18.1, may allow us to disable response validation (it adds "Unmarshalling customizations" to FlaskOpenAPIView
), but I haven't looked into it yet.
An option to disable response validation has been added (https://github.com/python-openapi/openapi-core/issues/662 was closed, see the PR linked from that issue) so we can move forward on this issue whenever openapi-core
has another release. We would likely want to downgrade cfn-lint
so that we can install the latest version of openapi-core
(see https://github.com/ASFHyP3/hyp3/pull/1821), make any code changes necessary to accommodate the upgraded packages, and then figure out what to do about cfn-lint
going forward (e.g. pin the downgraded version until cfn-lint
un-pins jsonschema
, wait to merge the openapi upgrade PR, etc.).
The latest version of cfn-lint
un-pins jsonschema
, so we no longer need to downgrade cfn-lint
.
On the upgrade-openapi-after-subscriptions-removal
branch, we now see two failing tests:
FAILED tests/test_api/test_api_spec.py::test_error_format - AssertionError: assert 'application/json' == 'application/problem+json'
FAILED tests/test_api/test_submit_job.py::test_submit_validate_only - assert 400 == <HTTPStatus.OK: 200>
I'm not sure why the error Content-Type
header would have changed, so we'll want to debug that to find the cause of the first failure.
It appears that we may not need to move the contents of apps/api/src/hyp3_api/api-spec/job_parameters.yml.j2
into apps/api/src/hyp3_api/api-spec/openapi-spec.yml.j2
now that we're at jsonschema==4.19.1
. I'm not seeing any more API spec errors, and https://github.com/python-jsonschema/jsonschema/releases shows that they fixed various regression bugs related to deprecated features. Of course, this means that once those deprecated features have been removed, we will likely need to update our API spec. But we can probably hold off on making those updates until stuff starts breaking, and hopefully future jsonschema
releases will provide helpful notes on exactly which features have been removed.
Inspecting the second response
defined in test_error_format
here, response.json
is as shown below:
{'errors': [{'class': "<class 'openapi_core.validation.request.exceptions.MissingRequiredRequestBody'>", 'status': 400, 'title': 'Missing required request body'}]}
This should be fixed when we implement the new method for disabling response validation. I don't know whether this will also fix the failing Content-Type
assertion (see comment above).
After merging https://github.com/ASFHyP3/hyp3/pull/1821 (which passes all tests), I'm now seeing responses like the one below for all https://hyp3-test-api.asf.alaska.edu (EDC UAT) endpoints (also via the hyp3-sdk). The body and headers below were copied from the Swagger UI.
Body:
{
"detail": "Server not found for https://vd2gh6uqw3.execute-api.us-west-2.amazonaws.com/api/user",
"status": 400,
"title": "Bad Request",
"type": "about:blank"
}
Headers:
content-length: 154
content-type: application/problem+json
date: Wed,10 Jan 2024 18:30:48 GMT
strict-transport-security: max-age=31536000
via: 1.1 76dcc62b68091cc715d50b5017be77fc.cloudfront.net (CloudFront)
x-amz-apigw-id: RVhN0G_5PHcEXkA=
x-amz-cf-id: hL4zWXgpkU_rtSgNA0LxSqwW-VBFSdLPgT0GB8NYHmPz42BFb3otoA==
x-amz-cf-pop: SEA73-P2
x-amzn-remapped-connection: keep-alive
x-amzn-remapped-content-length: 154
x-amzn-remapped-date: Wed,10 Jan 2024 18:30:48 GMT
x-amzn-remapped-server: Server
x-amzn-remapped-x-amzn-remapped-content-length: 154
x-amzn-remapped-x-amzn-requestid: 4f3a8483-e173-43c7-b251-647b8d3cdd6c
x-amzn-remapped-x-forwarded-for: 142.147.64.14,15.158.4.72
x-amzn-requestid: 66f0c4fc-134e-4976-9c9d-4122a3723ebf
x-amzn-trace-id: Root=1-659ee258-4d57c3463487dbf12ab4942d;Sampled=0;lineage=8c3a9ce5:0
x-cache: Error from cloudfront
x-content-type-options: nosniff
x-firefox-spdy: h2
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
However, https://hyp3-enterprise-test.asf.alaska.edu still works fine, which indicates the problem is specific to EDC.
The url https://vd2gh6uqw3.execute-api.us-west-2.amazonaws.com/
is the default endpoint for the AWS::ApiGateway::RestApi
resource of the hyp3-edc-uat
stack. At first, I thought the error message might suggest that our CloudFront distribution was failing to contact the ApiGateway server, but that no longer seems likely, as I'll explain.
I queried the /jobs
and /user
endpoints 15 times each at 18:30 UTC this morning. I observed 60 4xx
errors (I would have expected 30; I'm not sure why the count doubled) at 2024-01-10 18:30 UTC
in the CloudFront metrics for the distribution with ID ECFRN3AMYB8LV
.
I also observed exactly 15 400
errors for each endpoint (30 total, as expected) in the range 2024-01-10T18:29:00
to 2024-01-10T18:31:00
for log group hyp3-edc-uat-Api-O6ZX1MMCLCEC-ApiLogGroup-4W6NumbiNhfR
(using a Logs Insights query).
So, the responses are originating from ApiGateway.
I also started digging into the openapi-core
code to determine where the response error message originates from.
We define a custom error handler here which inherits from FlaskOpenAPIErrorsHandler
, which is defined here and sets status code 400
when handling a ServerNotFound
exception, which is defined here and has a string representation of f"Server not found for {self.url}"
, which I assume is responsible for the response error message (see above).
The ServerNotFound
exception is raised here, from the BasePathFinder.find
method when self._get_servers_iter
does not yield any servers. The class APICallPathFinder
inherits from BasePathFinder
and implements the _get_servers_iter
method here.
Finally, APICallPathFinder
is instantiated here and its find
method is called here (within the BaseAPICallValidator
class) with the request method and URL passed as arguments.
That's as far as I got tracing the openapi-core
code. I don't completely understand how all the pieces fit together, but it appears to parse the request URL's hostname as part of its validation process. This surprises me because I don't know why openapi-core
(which wraps Flask) would care about the server URL.
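From the code tracing above, the server-matching step can be approximated as follows; this is a simplified sketch, not the real BasePathFinder implementation:

```python
class ServerNotFound(Exception):
    """Mirrors the error message format seen in the responses above."""
    def __init__(self, url):
        super().__init__(f"Server not found for {url}")

def find_server(request_url, servers):
    """Simplified stand-in for the path finder: the request URL must match
    one of the server URLs declared in the OpenAPI spec."""
    for server in servers:
        if request_url.startswith(server):
            return server
    raise ServerNotFound(request_url)

# If the spec only declares the public hostname, requests arriving via the
# default execute-api hostname fail to match any server:
servers = ["https://hyp3-test-api.asf.alaska.edu"]
url = "https://vd2gh6uqw3.execute-api.us-west-2.amazonaws.com/api/user"
try:
    find_server(url, servers)
except ServerNotFound as error:
    print(error)  # prints the ServerNotFound message
```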
Edit: I asked the openapi-core
author(s) for clarification: https://github.com/python-openapi/openapi-core/discussions/757
It turns out the server URL is part of the OpenAPI spec. Fix and explanation here: https://github.com/ASFHyP3/hyp3/pull/2009
Jira: https://asfdaac.atlassian.net/browse/TOOL-2046
After upgrading openapi-spec-validator to v0.5.0, make install
fails: requirements-api.txt
requires openapi-core 0.14.5
, which depends on openapi-spec-validator<0.5.0. See https://github.com/p1c2u/openapi-core/issues/402 for more details. We separately specify openapi-spec-validator as a requirement in
requirements-all.txt
, since we use it for static analysis independent of deploying the API. We can't upgrade the version specified there past 0.4.0 because of this conflict.