INSPIRE-MIF / helpdesk-validator

Community discussion forum for INSPIRE validation issues

Permanent test run failure of ETF-based tests in the GDI-DE Testsuite due to "SSL read failed" from inspire.ec.europa.eu #545

Closed: g-weber closed this issue 1 year ago

g-weber commented 3 years ago

For some weeks now, the ETF pods/containers in the GDI-DE Testsuite have been failing with the error "Internal ETF error" shortly after starting. A restart helps only very briefly, for a few test runs. As a consequence, no ETF-based test runs can currently be executed in the GDI-DE Testsuite.

Using ApacheBench from several locations, we have now found that the JRC appears to have implemented a rate limit or something similar.

Example:

$ sudo apt update
$ sudo apt install apache2-utils
$ ab -n 10000 -c 1000 https://inspire.ec.europa.eu/draft-schemas/inspire-md-schemas-temp/apiso-inspire/apiso-inspire.xsd
$ ab -n 10000 -c 1000 https://inspire.ec.europa.eu/schemas/us-net-common/4.0/UtilityNetworksCommon.xsd
SSL read failed (5) - closing connection
SSL read failed (5) - closing connection
SSL read failed (5) - closing connection
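To repeat this check and compare runs over time, the failure lines can simply be counted. A minimal Python sketch, assuming the ab output was saved to a file (for example with `2>&1 | tee ab.log`; the file name is hypothetical) and that the failures look exactly like the lines above:

```python
import re
import sys

# Matches the failure line ab printed in the run above, e.g.
# "SSL read failed (5) - closing connection".
SSL_FAILURE = re.compile(r"^SSL read failed \(\d+\) - closing connection$")


def count_ssl_failures(ab_output: str) -> int:
    """Count the connections that ab reports as closed after a failed TLS read."""
    return sum(1 for line in ab_output.splitlines() if SSL_FAILURE.match(line.strip()))


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 count_ssl.py ab.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            print(count_ssl_failures(fh.read()))
```

A count that stays at zero for the first requests and then rises sharply would fit the rate-limit hypothesis, but it does not prove it.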

Is it possible to monitor our IPs 141.74.64.225 and 141.74.48.225 and whitelist or unblock them if necessary?

Kind regards Gerd

carlospzurita commented 3 years ago

Dear @g-weber

Could you clarify this issue a little further? Are you deploying the INSPIRE validator containers yourself, using the latest release? And is the problem related only to the .xsd access, or to the Test Suites in general?

Kind regards.

hwbllmnn commented 3 years ago

Hi @carlospzurita ,

this is related to #311. We're using the latest release, modified to work with our proxy.

This means that the integrated Squid is no longer used. Instead, we use our own workaround to route requests through our corporate proxy, which fetches the files directly from the internet using the IPs mentioned above.

Unfortunately, the SSL errors mentioned above lead to hanging tests in the validator. The validator runs tests via an internal queue whose size is limited to the number of available CPUs (times three for tests in the waiting queue). After a while these queues fill up with hanging tests and no more tests can be executed.
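Why a handful of hanging runs is enough to block everything can be shown with a toy model of the capacity described above: `cpus` runs execute, up to `cpus * 3` more wait, and anything beyond that is refused. This is an illustrative sketch under that assumption, not ETF's actual queue implementation; the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BoundedTestQueue:
    """Toy model: `cpus` test runs execute at once and up to `cpus * 3` more wait.
    Not ETF's actual implementation."""

    cpus: int
    running: set = field(default_factory=set)
    waiting: deque = field(default_factory=deque)

    def submit(self, run_id: str) -> bool:
        """Accept a test run if there is room; return False when both are full."""
        if len(self.running) < self.cpus:
            self.running.add(run_id)
        elif len(self.waiting) < self.cpus * 3:
            self.waiting.append(run_id)
        else:
            return False
        return True

    def finish(self, run_id: str) -> None:
        """A run completed: free its slot and promote the oldest waiting run."""
        self.running.remove(run_id)
        if self.waiting:
            self.running.add(self.waiting.popleft())


if __name__ == "__main__":
    q = BoundedTestQueue(cpus=4)
    # Runs that hang on a blocked HTTPS request never call finish().
    accepted = [q.submit(f"run-{i}") for i in range(17)]
    print(accepted.count(True), accepted[-1])  # 16 False
```

With 4 CPUs, 16 runs that never finish (4 running, 12 waiting) are enough to refuse every further submission until the service is restarted.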

These SSL errors only appear after a certain number of requests have been sent (we tested this as shown above by @g-weber), which is why we assume some sort of DDoS defense is kicking in.

g-weber commented 3 years ago

Please have a look at the comment from hwbllmnn.

Kind regards

bor8 commented 3 years ago

Quoting the question approximately: Does the problem refer only to the .xsd access or to the test suites in general?

It refers to all *.europa.eu URLs that the ETF validator (local installation) retrieves from the internet.


Interim status: our Squid proxy has caching enabled. However, some URLs are redirected with 302. HTTP 302 is temporary by definition and is therefore not cached by Squid.

This means that despite local caching attempts, we run into the rate limit or DDOS defence described above.
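The 302 point can be made a little more precise for responses Squid can actually see (plain HTTP). Under RFC 9111, 302 is not a heuristically cacheable status, so a shared cache stores it only if the response carries explicit freshness information (Expires, max-age, s-maxage) or `public`. A simplified sketch of that rule; the function names are hypothetical, and Squid's real behaviour also depends on its configuration (for example `refresh_pattern`):

```python
# Status codes that RFC 9110 defines as heuristically cacheable.
HEURISTICALLY_CACHEABLE = {200, 203, 204, 206, 300, 301, 308, 404, 405, 410, 414, 501}


def _directives(headers: dict) -> set:
    """Return the Cache-Control directive names, lower-cased."""
    value = headers.get("cache-control", "")
    return {part.split("=", 1)[0].strip().lower() for part in value.split(",") if part.strip()}


def shared_cache_may_store(status: int, headers: dict) -> bool:
    """Simplified RFC 9111 storage check for a shared cache such as Squid.

    `headers` must use lower-case field names. Many details are ignored
    (Vary, Authorization, Squid's own overrides, ...).
    """
    directives = _directives(headers)
    if "no-store" in directives or "private" in directives:
        return False
    explicit = "max-age" in directives or "s-maxage" in directives or "expires" in headers
    return explicit or "public" in directives or status in HEURISTICALLY_CACHEABLE
```

So a bare 302 such as `shared_cache_may_store(302, {})` is not stored, whereas the same redirect with `cache-control: max-age=3600` could be.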

carlospzurita commented 3 years ago

Dear all,

To get a clear picture: what rate of requests are you sending to the *.europa.eu domain? And how many test runs do you create, say, per minute? This seems to be a problem on the side of the EC servers, which may have a more restrictive policy on how many requests can be performed. With those numbers we can discuss possible solutions internally.

bor8 commented 3 years ago

I collected the europa.eu URLs requested on an unblocked instance, once for a metadata dataset and once for a WMS. Here are the results:

MD:

$ cd
$ sudo tcpflow -p -c -i any port not 22 | grep -B 1 --line-buffered 'Host: bkg-app-http' | grep --line-buffered GET | tee ~/europa_eu_2_md.txt

$ for elt in $(cat ~/europa_eu_2_md.txt | cut -d ' ' -f 3); do echo https://inspire.ec.europa.eu$elt; done | sort | uniq -c | sort -nr | tee ~/europa_eu_3_md.txt
      6 https://inspire.ec.europa.eu/theme/theme.en.atom
      4 https://inspire.ec.europa.eu/theme/theme.sv.atom
      4 https://inspire.ec.europa.eu/theme/theme.sl.atom
      4 https://inspire.ec.europa.eu/theme/theme.sk.atom
      4 https://inspire.ec.europa.eu/theme/theme.ro.atom
      4 https://inspire.ec.europa.eu/theme/theme.pt.atom
      4 https://inspire.ec.europa.eu/theme/theme.pl.atom
      4 https://inspire.ec.europa.eu/theme/theme.nl.atom
      4 https://inspire.ec.europa.eu/theme/theme.mt.atom
      4 https://inspire.ec.europa.eu/theme/theme.lv.atom
      4 https://inspire.ec.europa.eu/theme/theme.lt.atom
      4 https://inspire.ec.europa.eu/theme/theme.it.atom
      4 https://inspire.ec.europa.eu/theme/theme.hu.atom
      4 https://inspire.ec.europa.eu/theme/theme.hr.atom
      4 https://inspire.ec.europa.eu/theme/theme.fr.atom
      4 https://inspire.ec.europa.eu/theme/theme.fi.atom
      4 https://inspire.ec.europa.eu/theme/theme.et.atom
      4 https://inspire.ec.europa.eu/theme/theme.es.atom
      4 https://inspire.ec.europa.eu/theme/theme.el.atom
      4 https://inspire.ec.europa.eu/theme/theme.de.atom
      4 https://inspire.ec.europa.eu/theme/theme.da.atom
      4 https://inspire.ec.europa.eu/theme/theme.cs.atom
      4 https://inspire.ec.europa.eu/theme/theme.bg.atom
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_swe.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_spa.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_slv.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_slo.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_rum.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_por.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_pol.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_mlt.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_lit.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_lav.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_ita.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_hun.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_gre.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_gle.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_ger.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_fre.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_fin.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_est.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_eng.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_dut.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_dan.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_cze.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_bul.xsd

$ cat ~/europa_eu_3_md.txt | tr -s ' ' | cut -d ' ' -f 2 | python3 -c 'import sys; print(sum(map(int, sys.stdin)))'
140
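The counting part of the pipeline can also be reproduced in Python, for example when a capture file is analysed on another machine. A minimal sketch, assuming each captured line has the form `<tcpflow prefix>: GET <path> HTTP/1.1`, so that the third space-separated field is the request path, as the `cut -d ' ' -f 3` above implies:

```python
import sys
from collections import Counter

BASE = "https://inspire.ec.europa.eu"


def count_requests(get_lines):
    """Equivalent of `cut -d ' ' -f 3 | sort | uniq -c | sort -nr` above:
    return (url, count) pairs, most frequently requested first."""
    paths = (line.split(" ")[2] for line in get_lines if line.count(" ") >= 2)
    return Counter(BASE + path for path in paths).most_common()


if __name__ == "__main__":
    # Usage: python3 count_requests.py ~/europa_eu_2_md.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            ranked = count_requests(fh.read().splitlines())
        for url, n in ranked:
            print(f"{n:7d} {url}")
        print("total:", sum(n for _, n in ranked))
```

The final `total` line corresponds to the `python3 -c 'import sys; print(sum(...))'` step.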

WMS:

$ cd
$ sudo tcpflow -p -c -i any port not 22 | grep -B 1 --line-buffered 'Host: bkg-app-http' | grep --line-buffered GET | tee ~/europa_eu_2_wms.txt 

$ for elt in $(cat ~/europa_eu_2_wms.txt | cut -d ' ' -f 3); do echo https://inspire.ec.europa.eu$elt; done | sort | uniq -c | sort -nr | tee ~/europa_eu_3_wms.txt
      2 https://inspire.ec.europa.eu/schemas/inspire_vs/1.0/inspire_vs.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/network.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_swe.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_spa.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_slv.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_slo.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_rum.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_por.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_pol.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_mlt.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_lit.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_lav.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_ita.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_hun.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_gre.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_gle.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_ger.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_fre.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_fin.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_est.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_eng.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_dut.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_dan.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_cze.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/enums/enum_bul.xsd
      2 https://inspire.ec.europa.eu/schemas/common/1.0/common.xsd
      2 https://inspire.ec.europa.eu/layer/layer.en.xml

$ cat ~/europa_eu_3_wms.txt | tr -s ' ' | cut -d ' ' -f 2 | python3 -c 'import sys; print(sum(map(int, sys.stdin)))'
54
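Comparing total requests with distinct URLs in the two listings above shows how much a working per-URL cache could save. The sketch rebuilds both listings from their structure (the counts are copied from the output above):

```python
from collections import Counter

BASE = "https://inspire.ec.europa.eu"
THEME_LANGS = "en sv sl sk ro pt pl nl mt lv lt it hu hr fr fi et es el de da cs bg".split()
ENUM_CODES = ("swe spa slv slo rum por pol mlt lit lav ita hun gre gle ger fre fin est "
              "eng dut dan cze bul").split()
ENUMS = [f"{BASE}/schemas/common/1.0/enums/enum_{c}.xsd" for c in ENUM_CODES]

# Metadata test: theme.en.atom 6x, the other theme feeds 4x, every enum schema 2x.
md = Counter({f"{BASE}/theme/theme.{lang}.atom": (6 if lang == "en" else 4)
              for lang in THEME_LANGS})
md.update({url: 2 for url in ENUMS})

# WMS test: every listed resource 2x.
wms_urls = ENUMS + [f"{BASE}/schemas/inspire_vs/1.0/inspire_vs.xsd",
                    f"{BASE}/schemas/common/1.0/network.xsd",
                    f"{BASE}/schemas/common/1.0/common.xsd",
                    f"{BASE}/layer/layer.en.xml"]
wms = Counter({url: 2 for url in wms_urls})

for name, counts in (("MD", md), ("WMS", wms)):
    print(f"{name}: {sum(counts.values())} requests, {len(counts)} distinct URLs")
print("shared:", len(md.keys() & wms.keys()), "union:", len(md.keys() | wms.keys()))
```

This prints 140 requests for 46 distinct URLs (MD) and 54 requests for 27 distinct URLs (WMS); the 23 enum schemas are shared, so both tests together touch 50 distinct resources.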

(Parts of these commands only work in the GDI-DE Testsuite, because they rely on our workaround proxy bkg-app-http.)

A single simple test therefore establishes 140 (metadata) or 54 (WMS) new connections. As usual for an HTTP CONNECT proxy, a tunnel is first established with CONNECT, and the requested resource is then retrieved through it with GET using the relative path. Unfortunately (?), the CONNECT tunnel does not remain open; this two-step process is repeated for every single resource.
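For comparison, this is what keeping the CONNECT tunnel open looks like from a client's perspective, using Python's standard `http.client`. It is an illustrative sketch only (ETF itself is Java), and the proxy name and port in the usage note are hypothetical. `set_tunnel` makes the connection send one CONNECT when it opens; subsequent GETs reuse the same TLS session as long as neither side closes it:

```python
import http.client


def open_tunnel(proxy_host: str, proxy_port: int,
                target: str = "inspire.ec.europa.eu") -> http.client.HTTPSConnection:
    """Connection that sends one CONNECT to the proxy, then speaks TLS to `target`.
    Nothing is sent until the first request is made."""
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=30)
    conn.set_tunnel(target, 443)
    return conn


def fetch_over_one_tunnel(conn, paths):
    """GET each relative path over the same connection (HTTP keep-alive).

    Each response body is read completely before the next request, which
    http.client requires before a connection can be reused.
    """
    results = {}
    for path in paths:
        conn.request("GET", path)
        resp = conn.getresponse()
        results[path] = (resp.status, resp.read())
    return results
```

Usage (needs network access; host and port are placeholders): `fetch_over_one_tunnel(open_tunnel("bkg-app-http", 3128), ["/schemas/common/1.0/common.xsd", "/schemas/common/1.0/network.xsd"])`. If the server answers with `Connection: close`, `http.client` reconnects and a new CONNECT is sent.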

Now for a little math: if a metadata provider starts 50 INSPIRE tests in the GDI-DE Testsuite at the same time (which does happen), 50 * 140 (= 7000) requests are sent within about one minute. And that is just one of 260 users. On top of that come automatic repeated executions.
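The arithmetic above, spelled out as a small helper (the one-minute window is the rough figure from the text; the function name is hypothetical):

```python
def burst(tests: int, requests_per_test: int, window_s: float):
    """Total requests and average request rate when `tests` runs start together."""
    total = tests * requests_per_test
    return total, total / window_s


total, rate = burst(tests=50, requests_per_test=140, window_s=60)
print(total, round(rate, 1))  # 7000 116.7
```

That is roughly 117 requests per second from a single user, each also opening its own CONNECT tunnel. A cache keyed by URL would, in principle, only need the 46 distinct URLs of the metadata listing from upstream, however many tests start at once.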

Update: HTTPS responses cannot be cached, because they are encrypted and Squid has no information about the content inside the encrypted tunnel (Squid only sees the outer, unencrypted CONNECT or TCP connection), unless you perform a man-in-the-middle interception with https://wiki.squid-cache.org/Features/SslPeekAndSplice or similar; see https://stackoverflow.com/questions/18725987/enable-cache-for-ssl-connection-in-squid. This means that all requests are fired at the EC instances, and roughly the first 5000 actually arrive and are served! If the flood of requests does not subside, we are automatically blocked for a longer period of time.

g-weber commented 3 years ago

Dear all, these problems with using the ETF Validator instance in the GDI-DE Testsuite are still an urgent issue for us. Have you made any progress in your internal discussion of possible solutions?

ghost commented 3 years ago

To document this problem in https://github.com/INSPIRE-MIF/helpdesk-validator/issues/331#issue-637608963:

Does this problem always occur when no caching mechanism (Squid or another solution) is used for resources such as .xsd files, so that the files are requested directly from the internet many times within a short period?

Thanks in advance!

g-weber commented 3 years ago

There is a Squid caching mechanism in place that caches HTTP requests, but Squid is not able to cache HTTPS requests. As described above, all the .xsd files are requested via HTTPS and therefore cannot be cached by Squid.

bor8 commented 3 years ago

Dejan, to answer your question briefly: yes, it is probably independent of the proxy.

https://serverfault.com/questions/1067829/what-does-aws-ec2-ddos-protection-shield-throw-when-activated-https-503

ghost commented 3 years ago

@g-weber @bor8 I have tried to document this challenge here, to get a better overview of the deployment challenges.

bor8 commented 3 years ago

Bump!!!

carlospzurita commented 3 years ago

Dear all,

Sorry for the delayed response on this issue. We are still working on it on the INSPIRE validator side, in collaboration with the JRC team responsible for the INSPIRE Registry, to find the best approach. We will share feedback as soon as we make progress.

bor8 commented 3 years ago

You are on it, after all.

The problem is that many resources are repeatedly requested and the JRC firewall shuts down when it is overloaded.

Terrestris thinks you should ship the resources locally instead of retrieving them via URL, or cache them in the application...

What does the ec in inspire.ec.europa.eu stand for? European Commission or Amazon EC2?

carlospzurita commented 3 years ago

Dear users,

After considering this issue and how to make the INSPIRE Registry resources available under a high volume of requests, we have come to the conclusion that the best solution would be to have the INSPIRE Registry resources available locally in the ETF. Because these resources are served over HTTPS, a normal caching system cannot be used for this. The way ahead would therefore be to create an internal service that downloads and refreshes the resources as requested by the Validator. Requests would be redirected internally to this service by a proxy pass mechanism, as is done now with HTTP requests.

Each time the Validator needs to access any resource under the domain inspire.ec.europa.eu, the call would be redirected to this service, which stores the files and re-downloads them once they have expired. This way the normal HTTPS request workflow between client and server is preserved, without the need to perform what would de jure be a man-in-the-middle attack.
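The core of the proposed service (store each file, re-download it once it has expired) can be sketched in a few lines. This is a hypothetical, in-memory illustration: `fetch` is injected (for example a wrapper around `urllib.request.urlopen`), the expiry period is an assumed parameter, and the proxy-pass redirection and on-disk persistence are not shown. Serving the stale copy when a re-download fails is an added design choice, aimed at exactly the blocking described in this issue.

```python
import time
from typing import Callable, Dict, Tuple


class RegistryMirror:
    """Keep a copy of each inspire.ec.europa.eu resource and re-download it
    only once it has expired (hypothetical sketch of the proposed service)."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # e.g. a function wrapping urllib.request.urlopen
        self._ttl_s = ttl_s      # expiry period; the value is an assumption
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._ttl_s:
            return cached[1]                 # fresh copy: no upstream request
        try:
            body = self._fetch(url)          # miss or expired: download again
        except OSError:                      # urllib's URLError is an OSError
            if cached is not None:
                return cached[1]             # upstream unreachable: serve stale copy
            raise
        self._store[url] = (now, body)
        return body
```

With this behaviour, 50 parallel tests requesting the same schema cause one upstream download per expiry period instead of 50.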

Given that this is a very specific requirement (so far not reported by other organisations) that depends on a particular network infrastructure and would require changing the release's redirection system for the domain inspire.ec.europa.eu, for the time being we can only suggest this possible solution to overcome the issue; we do not plan to develop it within the INSPIRE Reference Validator.

hwbllmnn commented 2 years ago

Hi @carlospzurita ,

Is there a way to download the necessary resources as a single zip file? I've been trying to download them with wget for the past two days, but the download is very slow (probably due to the rate limit itself): in more than 48 hours I've only managed to get about 4 GB of data. I can patch the scripts to use an internal proxy and that seems to work, but without all the resources not all tests will run.

hogredan commented 2 years ago

Dear @carlospzurita ,

have you been able to check whether all necessary resources could be provided in a single zip file? That would be very helpful for our internal update processes!

Thank you and best regards, Daniela

jenriquesoriano commented 2 years ago

In preparation for the upcoming discussion with the JRC, we have reviewed the status of this issue and would like to check its current state with respect to the proposed solution, and which needs are still pending.

As far as we can see from the issue history, the proposed solution was a proxy that redirects requests to a service which collects (or has collected) the information from the repository and acts as a cache. To implement this cache, you also requested a way to download the data from inspire.ec.europa.eu as a centralized package, probably because of the blocking and the uncertainty about the availability of the latest versions (please excuse the late reply on this point).

We would therefore like to know whether the proposed proxy-cache solution has been implemented successfully and, additionally, whether a centralized source for the information to be cached is still required, so that it can be downloaded from a single location.

Finally, if there are any additional aspects that could potentially be shared and be useful for the next meeting or for the Community, please let us know.

g-weber commented 2 years ago

Dear @jenriquesoriano,

Here is the feedback you requested:

Currently, we use three ETF Validator instances as test engines in the Kubernetes cluster of the GDI test suite.

To solve the problem of blocked access to the inspire.ec.europa.eu resources needed by the ETF tests, we finally implemented the following approach in the GDI-DE Test Suite: the resources of inspire.ec.europa.eu are provided within the test suite by an internal nginx container in its Kubernetes cluster, and requests to inspire.ec.europa.eu are redirected to this container, so that access to the remote files on inspire.ec.europa.eu is no longer necessary.
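A static mirror like this ultimately maps each requested inspire.ec.europa.eu URL onto a file below the container's document root. The following sketch shows that mapping, assuming the files keep the same paths as on the server and that the root is `/usr/share/nginx/html` (the default document root of the official nginx image; the actual GDI-DE setup may differ). The function name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

MIRRORED_HOST = "inspire.ec.europa.eu"
DEFAULT_ROOT = "/usr/share/nginx/html"  # assumption: nginx image default


def local_path(url: str, root: str = DEFAULT_ROOT) -> str:
    """Map a mirrored URL to the file a static web server would return for it."""
    parts = urlsplit(url)
    if parts.hostname != MIRRORED_HOST:
        raise ValueError(f"not a mirrored host: {parts.hostname!r}")
    segments = [s for s in parts.path.split("/") if s and s != "."]
    if ".." in segments:
        raise ValueError(f"refusing path traversal: {parts.path!r}")
    return str(PurePosixPath(root, *segments))
```

Combined with `os.path.isfile`, this can be run over a list of captured URLs (such as the tcpflow output earlier in this thread) to see which files are still missing from the mirror before a new ETF release is deployed.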

What remains open is how to identify all resources from inspire.ec.europa.eu that are actually required. Currently this is done manually with each new ETF Validator release.

To replace this manual and error-prone process, we would therefore appreciate if a package with all resources from inspire.ec.europa.eu that are actually used by the ETF Validator could be provided with each new ETF Validator release.
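For the schema part of that list, the dependencies can be found automatically by following `xs:import`, `xs:include` and `xs:redefine` `schemaLocation` attributes from a set of entry schemas. A minimal sketch; the function name is hypothetical and `load` (URL to document text) is injected. It will not find resources that are referenced only from test queries, such as the `theme.*.atom` feeds, so capturing actual requests (as with tcpflow above) would still be needed for those.

```python
import xml.etree.ElementTree as ET
from collections import deque
from urllib.parse import urljoin

XS = "{http://www.w3.org/2001/XMLSchema}"
REF_TAGS = {XS + "import", XS + "include", XS + "redefine"}


def schema_closure(start_urls, load):
    """Return every schema URL reachable from `start_urls` via schemaLocation."""
    seen = set()
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        root = ET.fromstring(load(url))
        for el in root.iter():
            location = el.get("schemaLocation")
            if el.tag in REF_TAGS and location:
                # Relative locations are resolved against the referencing schema.
                queue.append(urljoin(url, location))
    return seen
```

In practice `load` could read from a local mirror first and fall back to the network, so that the crawl itself does not add to the request volume.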

For general information, some further observations related to the ETF instances within the GDI-DE test suite from the past months:

Results of further problem analysis with direct ETF debugging:

g-weber commented 2 years ago

Dear @jenriquesoriano,

At our meeting on 7 July, we also discussed the following request of ours:

"... we would therefore appreciate if a package with all the resources from inspire.ec.europa.eu that are actually used by the ETF Validator could be provided with each new version of the ETF Validator."

You stated that from a technical point of view it should be quite easy to provide such a ZIP file with the necessary resources, and that you would consider a concrete technical implementation in a timely manner. Hence our question: can we expect such a ZIP file to be made available with the forthcoming release of the ETF Validator?

That would be extremely helpful.

Thanks in advance.

verendi commented 1 year ago

Thank you for providing the registry sources with the latest release as a zip file. We could successfully integrate these sources into our setup and almost everything works fine. Unfortunately, we have noticed one error: The file ./schemas/common/1.0/enums/enum_fin.xsd is missing in the zipfile and therefore the metadata test "INSPIRE Metadata TG 2.0 - data sets and data sets series" is complaining about the missing file. Could you please add the missing file to the zipfile?

dperezBM commented 1 year ago

We have added the missing file; we hope this fixes the problem. Let us know if you are still having issues.