Open-EO / openeo-backend-validator

Service to validate back-end compliance with the API specification.
Apache License 2.0

OpenEO Validator Web Interface: Validation results are not reliable #48

Open klimeto opened 3 years ago

klimeto commented 3 years ago

Dear devs,

When I try to validate our OpenEO API instance using the Web Interface of the OpenEO Validator, I get inconsistent results without changing anything on our backend side:

Example:

I run the validation via this URL: https://www.geo.tuwien.ac.at/openeoct/backend/validate/deliverable/7

I get the following error for the EO Data Discovery test case:

Endpoint Method Message State
collection GET    
collections GET Input: ; Error: Response of the back end not valid; Details: input header 'Content-Type' has unexpected value: ''  

Then I refresh the page and the EO Data Discovery is valid:

Endpoint Method Message State
collection GET    
collections GET    

Then I refresh the page again and I get the same error for another group, e.g. Capabilities.

The error is always the same Error: Response of the back end not valid; Details: input header 'Content-Type' has unexpected value:

Could you please check why the Web Validator does not provide reliable results?

Thank you,

bgoesswe commented 3 years ago

That is really strange, also because I was not able to recreate this issue on other backends, but I will now debug the application to see what causes this message.

bgoesswe commented 3 years ago

I found out that it is independent of the web application: it also happens in the CLI version. When debugging the application I also found out that sometimes the response from your backend does not contain a "Content-Type" in the header. It seems to be independent of the endpoint (although it happens more often on GET endpoints) and it is pretty random. Still, when I tested it there was always at least one endpoint with a missing "Content-Type".

e.g.: Endpoint: /openeo/collections/EarthObservation.Copernicus.Sentinel2.Level1C

"Wrong" Header Response:

{"Date":["Thu, 12 Nov 2020 12:22:49 GMT"],"Server":["Apache"],"Vary":["Host"]}

"Correct" Header Response:

 {"Content-Length":["1117"],"Content-Type":["application/json"],"Date":["Thu, 12 Nov 2020 12:51:37 GMT"],"Server":["Apache"],"Via":["1.1 cidportal.jrc.ec.europa.eu"]}

Both response headers came from two different executions of the same endpoint. I have a feeling that it has something to do with the fast execution of several endpoint calls on your backend.

I'll try to investigate this further.
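
The difference between the two header dumps above can also be checked mechanically. The following is only a minimal sketch in Python, not part of the validator (which is written in Go); the helper names `content_type_of` and `header_names_only_in` are hypothetical, and the two strings are the header dumps quoted above.

```python
import json

# Header dumps copied from the comment above; each header value is a list.
WRONG = '{"Date":["Thu, 12 Nov 2020 12:22:49 GMT"],"Server":["Apache"],"Vary":["Host"]}'
CORRECT = (
    '{"Content-Length":["1117"],"Content-Type":["application/json"],'
    '"Date":["Thu, 12 Nov 2020 12:51:37 GMT"],"Server":["Apache"],'
    '"Via":["1.1 cidportal.jrc.ec.europa.eu"]}'
)


def content_type_of(header_dump: str) -> str:
    """Return the first Content-Type value, or '' when the header is absent."""
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): values for name, values in json.loads(header_dump).items()}
    values = headers.get("content-type", [])
    return values[0] if values else ""


def header_names_only_in(first: str, second: str) -> list:
    """Return the sorted header names present in `first` but not in `second`."""
    return sorted(set(json.loads(first)) - set(json.loads(second)))


print(repr(content_type_of(WRONG)))
print(repr(content_type_of(CORRECT)))
print(header_names_only_in(CORRECT, WRONG))
```

The last line lists every header that the "correct" response carries and the "wrong" one lacks, which shows that more than just the Content-Type differs between the two executions.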

klimeto commented 3 years ago

Could it be that our reverse proxy is cutting off the header information from responses? That is really strange. I have now implemented another endpoint, PATCH /jobs, and when I try to validate by reloading the validation page I always get different results, one of which is the following:

Error during execution! Have you applyed the changes on the D28 validation page?

b"2020/11/12 15:46:26 Warning: Not able to catch the job_id from POST /jobs header via 'OpenEO-Identifier' or empty!\n"

bgoesswe commented 3 years ago

Ok that is an issue of the web tool related to #47, I will fix this now. On the CLI tool I saw that this endpoint is "Valid" so you can assume it is fine.

klimeto commented 3 years ago

Thanks @bgoesswe, let me know when you manage to fix it please and let me know if you need any inputs from me.

bgoesswe commented 3 years ago

Issue #47 is now fixed: the web tool now breaks only on errors during execution, not on warning messages as before. It also prints the warning messages on the validation result page.

klimeto commented 3 years ago

@bgoesswe I just reloaded the validation page of our backend and I still get:

jobs_job_id GET Input: ; Error: Response of the back end not valid; Details: input header 'Content-Type' has unexpected value: ''  

then I tried in another browser (Opera) and I got the following:

Endpoint Method Message State
jobs GET Input: ; Error: Response of the back end not valid; Details: input header 'Content-Type' has unexpected value: ''

bgoesswe commented 3 years ago

Yes, that is still because sometimes there is no content-type in the response header from the backend. I am just trying to figure out if there are some sequences of endpoint calls causing this. Other than that I cannot really do anything, because ruling this error out would reduce the ability of the validator. It also seems that on random occasions the header with the job ID is missing, which is also why the error message of #47 happened sometimes. Maybe other back-end providers have had a similar issue? @lforesta @soxofaan @m-mohr

I am also not sure if this is a problem for D28, since it just happens occasionally?

soxofaan commented 3 years ago

I don't think I've seen this issue with the VITO backend

lforesta commented 3 years ago

I've never experienced this issue on the EODC backend, nor on the others I tested for D28

klimeto commented 3 years ago

I really don't understand where the problem is. I ran the following script against our backend from my workstation and it never exited:


import sys
import requests

# Endpoints to poll; each one is expected to answer with a JSON Content-Type.
_TESTED_URLS = [
    "https://jeodpp.jrc.ec.europa.eu/openeo",
    "https://jeodpp.jrc.ec.europa.eu/openeo/jobs/027e3c4b-ab83-4d34-a6b2-4f3d9d797e54",
    "https://jeodpp.jrc.ec.europa.eu/openeo/udf_runtimes"
]

def send_request(url: str) -> requests.Response:
    # Fetch the given URL and return the response.
    return requests.get(url, timeout=30)

# Poll all URLs in an endless loop and stop at the first non-JSON response.
while True:
    for url in _TESTED_URLS:
        response = send_request(url)
        # headers.get() returns None when the Content-Type header is missing.
        content_type = response.headers.get("content-type")
        if content_type != "application/json":
            sys.exit(f"content-type is not JSON for {url}: {content_type!r}")
        else:
            print(content_type)

Are you sending requests in parallel? If so I can try to simulate it as well.
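
For reference, a parallel variant of such a check could look like the sketch below. It is only an illustration under assumptions: `fetch_content_type` and `find_bad_responses` are hypothetical helper names, the worker and round counts are arbitrary, and it uses only the Python standard library. The `fetch` parameter exists so the concurrency logic can be tried with a stub before pointing it at a real backend.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_content_type(url: str) -> str:
    """GET the URL and return its Content-Type header ('' if it is missing)."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx answers,
    # e.g. for endpoints that require authentication.
    with urlopen(url, timeout=30) as resp:
        return resp.headers.get("Content-Type", "")


def find_bad_responses(urls, fetch=fetch_content_type, workers=8, rounds=10):
    """Request every URL `rounds` times from a thread pool.

    Returns the (url, content_type) pairs whose Content-Type is not JSON.
    """
    batch = list(urls) * rounds
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the result order aligned with `batch`.
        content_types = list(pool.map(fetch, batch))
    return [
        (url, content_type)
        for url, content_type in zip(batch, content_types)
        if not content_type.startswith("application/json")
    ]


# Example call (performs real network requests):
# find_bad_responses(["https://jeodpp.jrc.ec.europa.eu/openeo/udf_runtimes"])
```

An empty result means every response in the batch carried a JSON Content-Type; any returned pair identifies a request that came back without one.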

klimeto commented 3 years ago

Guys, it happened on our side, sorry. So now I am going to investigate where in our infrastructure this problem occurs and will report back. I have never experienced such a problem so far though.

bgoesswe commented 3 years ago

Good to hear, please let us know when you have found the issue.

klimeto commented 3 years ago

Dear all, we seem to have fixed the issue at our reverse proxy server.

However, I would still like to run some tests at the application level, simulating the way you validate an endpoint.

Is it possible to get the algorithm and data you use when validating an OpenEO endpoint?

bgoesswe commented 3 years ago

In general everything is in this repository in the "openeoct" folder. Unfortunately, we are using an existing Go OpenAPI validator (kin-openapi), so the complete validation algorithm is not in our own code. But in general, the following steps are performed in the given order when validating:

  1. (Optional) Lookup in the .well-known endpoint if there is a backend version in the configuration.
  2. Authentication via /credentials/basic endpoint (storing the bearer token for the next steps)
  3. Lookup of capabilities endpoint ( / ) to store the endpoints that the backend implements.
  4. Calling the endpoints configured in the config file one after another, not in parallel (for D28, every endpoint of API version 1.0.0). For every endpoint, the request and the response from the backend are validated separately.
  5. Storing the validation results either in an output file (e.g. for D28) or on the stdout of the console it is called from.

I hope this helps
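
To simulate steps 2 to 5 at the application level, something like the following Python sketch could serve as a starting point. It is not the validator's implementation (that is Go code built on kin-openapi) and it only checks the Content-Type header instead of doing a full OpenAPI validation. The names `http_get` and `validate_backend` are hypothetical. The `access_token` field, the `basic//` bearer prefix and the `endpoints` list in the capabilities document are assumptions taken from openEO API 1.0.0 and should be checked against the specification.

```python
import base64
import json
from urllib.request import Request, urlopen


def http_get(url, headers=None):
    """GET the URL; return (content_type, parsed JSON body or None)."""
    with urlopen(Request(url, headers=headers or {}), timeout=30) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read()
    data = json.loads(body) if content_type.startswith("application/json") else None
    return content_type, data


def validate_backend(base_url, username, password, paths, get=http_get):
    """Check the given GET paths one after another; return (path, state) pairs."""
    # Step 2: authenticate via /credentials/basic and keep the bearer token.
    basic = base64.b64encode(f"{username}:{password}".encode()).decode()
    _, creds = get(base_url + "/credentials/basic", {"Authorization": "Basic " + basic})
    if creds is None:
        raise RuntimeError("authentication response was not JSON")
    auth = {"Authorization": "Bearer basic//" + creds["access_token"]}

    # Step 3: read the capabilities document to learn the implemented endpoints.
    _, caps = get(base_url + "/", auth)
    if caps is None:
        raise RuntimeError("capabilities response was not JSON")
    implemented = {e["path"] for e in caps["endpoints"] if "GET" in e["methods"]}

    # Step 4: call the configured endpoints sequentially, never in parallel.
    results = []
    for path in paths:
        if path not in implemented:
            results.append((path, "not implemented"))
            continue
        content_type, _ = get(base_url + path, auth)
        if content_type.startswith("application/json"):
            results.append((path, "valid"))
        else:
            results.append((path, f"unexpected Content-Type: '{content_type}'"))

    # Step 5: hand the results back; the caller can print or store them.
    return results
```

Passing a stub as `get` allows the control flow to be exercised without a live backend, for example to reproduce a response whose Content-Type is missing.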