EOSC-synergy / FAIR_eva

Apache License 2.0

Evaluator rda_a1_03d: Stuck when doing header requests to files #37

Closed rjdmartins closed 2 years ago

rjdmartins commented 2 years ago

The rda_a1_03d test gets stuck for the DOI 10.21950/JWIRUU when using the "oai-pmh" repo and the endpoint "https://qtydataverse.fccn.pt/oai". The problem is that, in order to validate the files from a Spanish dataset, the test tries to connect to qtydataverse.fccn.pt:80 (i.e., http). Since the FCCN dataverse does not have the http port open, the connection times out.

(...)
DEBUG:urllib3.connectionpool:https://edatos.consorciomadrono.es:443 "GET /citation?persistentId=doi:10.21950/JWIRUU HTTP/1.1" 302 231
/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'edatos.consorciomadrono.es'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
DEBUG:urllib3.connectionpool:https://edatos.consorciomadrono.es:443 "GET /dataset.xhtml?persistentId=doi:10.21950/JWIRUU HTTP/1.1" 200 27260
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): qtydataverse.fccn.pt:80
ERROR:root:HTTPConnectionPool(host='qtydataverse.fccn.pt', port=80): Max retries exceeded with url: /resources/txt/guiaEcienciaDatos.pdf;jsessionid=140ef0e99283ee82382274d17335 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f272948ac10>: Failed to establish a new connection: [Errno 110] Connection timed out'))
(...)

Checking evaluator.py, I found the following at line 620:

landing_url = urllib.parse.urlparse(self.oai_base).netloc

When the endpoint (self.oai_base) is set, this strips the protocol prefix from it. Then, at lines 627-629:

url = landing_url + f
if 'http' not in url:
    url = "http://" + url

This applies the http prefix and makes a request with it. The request gets stuck and eventually times out, because not all dataverses (FCCN, for example) have an http port open.
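
A minimal sketch that reproduces the behaviour outside the evaluator, using the endpoint and file path from the log above. The variable names oai_base and file_path are illustrative only and are not taken from evaluator.py.

```python
from urllib.parse import urlparse

# Endpoint and file path taken from this report (jsessionid suffix omitted).
oai_base = "https://qtydataverse.fccn.pt/oai"
file_path = "/resources/txt/guiaEcienciaDatos.pdf"

# netloc keeps only the host part, so the "https" scheme is lost here.
landing_url = urlparse(oai_base).netloc

# Same logic as lines 627-629: the prefix is always plain http.
url = landing_url + file_path
if 'http' not in url:
    url = "http://" + url

print(url)  # http://qtydataverse.fccn.pt/resources/txt/guiaEcienciaDatos.pdf
```

The resulting URL points at port 80 even though the endpoint was given as https.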

Solution: My suggestion is to fix that with the following at lines 627-629:

url = landing_url + f
if 'http:' not in url and 'http:' in self.oai_base:
    url = "http://" + url
elif 'https:' not in url and 'https:' in self.oai_base:
    url = "https://" + url

This checks which protocol prefix the provided OAI endpoint uses and applies the same one to the URL used for the header request, which fixes the issue. Note that self.oai_base must include a protocol prefix; otherwise neither branch matches and the URL is left without a scheme.
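
An alternative sketch of the same idea, which reuses the parsed scheme instead of testing substrings. The helper name build_file_url and the default_scheme fallback are assumptions made for illustration; they are not part of evaluator.py.

```python
from urllib.parse import urlparse, urlunparse


def build_file_url(oai_base, file_path, default_scheme="https"):
    """Build a file URL on the endpoint's host, keeping the endpoint's scheme.

    Hypothetical helper: default_scheme is an assumed fallback for endpoints
    given without a protocol prefix.
    """
    parsed = urlparse(oai_base)
    if not parsed.netloc:
        # Without "scheme://", urlparse does not recognise the host,
        # so prepend the fallback scheme and parse again.
        parsed = urlparse(f"{default_scheme}://{oai_base}")
    # Keep scheme and host from the endpoint, replace the path with the file path.
    return urlunparse((parsed.scheme, parsed.netloc, file_path, "", "", ""))


print(build_file_url("https://qtydataverse.fccn.pt/oai",
                     "/resources/txt/guiaEcienciaDatos.pdf"))
```

With this approach the case of an endpoint without a prefix is handled explicitly rather than left without a scheme; whether https is the right fallback depends on the repositories being evaluated.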

ferag commented 2 years ago

Hi Ricardo, check the latest version, including the Docker one. It should work.

Cheers,

FErnando