The rda_a1_03d test hangs for the DOI 10.21950/JWIRUU when using the "oai-pmh" repo with the endpoint "https://qtydataverse.fccn.pt/oai".
The problem is that the test tries to connect to qtydataverse.fccn.pt:80 (i.e., plain HTTP) in order to validate the files from a Spanish dataset. Since the FCCN Dataverse does not have the HTTP port open, the connection times out.
(...)
DEBUG:urllib3.connectionpool:https://edatos.consorciomadrono.es:443 "GET
/citation?persistentId=doi:10.21950/JWIRUU HTTP/1.1" 302 231
/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py:1045:
InsecureRequestWarning: Unverified HTTPS request is being made to host
'edatos.consorciomadrono.es'. Adding certificate verification is
strongly advised. See:
https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
DEBUG:urllib3.connectionpool:https://edatos.consorciomadrono.es:443 "GET
/dataset.xhtml?persistentId=doi:10.21950/JWIRUU HTTP/1.1" 200 27260
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1):
qtydataverse.fccn.pt:80
ERROR:root:HTTPConnectionPool(host='qtydataverse.fccn.pt', port=80): Max
retries exceeded with url:
/resources/txt/guiaEcienciaDatos.pdf;jsessionid=140ef0e99283ee82382274d17335
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object
at 0x7f272948ac10>: Failed to establish a new connection: [Errno 110]
Connection timed out'))
(...)
Checking evaluator.py, I found the following.
At line 620:
landing_url = urllib.parse.urlparse(self.oai_base).netloc
When the endpoint (self.oai_base) is set, this removes the protocol prefix from it.
Then, at lines 627-629:
url = landing_url + f
if 'http' not in url:
    url = "http://" + url
This always applies the http prefix and makes the request with it. The request then hangs and times out, because not all Dataverse installations (FCCN's, for example) have an HTTP port open.
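The behaviour described above can be reproduced outside the evaluator with a minimal sketch. The function name build_file_url_current and the file path used here are hypothetical stand-ins (the real value of f comes from the landing page); only the three statements quoted from evaluator.py are taken from the source.

```python
from urllib.parse import urlparse


def build_file_url_current(oai_base, f):
    """Mimic lines 620 and 627-629 of evaluator.py as quoted in this report."""
    # netloc keeps only the host part, so the "https" scheme is lost here
    landing_url = urlparse(oai_base).netloc
    url = landing_url + f
    if 'http' not in url:
        # The scheme is always assumed to be plain HTTP
        url = "http://" + url
    return url


# Hypothetical file path standing in for f
print(build_file_url_current("https://qtydataverse.fccn.pt/oai", "/resources/example.pdf"))
```

Even though the endpoint was given with https, the resulting URL points at port 80, which matches the connection attempt seen in the log.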
Solution:
My suggestion is to replace lines 627-629 with the following:
url = landing_url + f
if 'http:' not in url and 'http:' in self.oai_base:
    url = "http://" + url
elif 'https:' not in url and 'https:' in self.oai_base:
    url = "https://" + url
This checks which protocol prefix the provided OAI endpoint has and applies the same one to the request URL.
This fixes the issue.
Also, you need to ensure that self.oai_base has a protocol prefix set; otherwise neither branch applies and this may not work.
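As an alternative to the substring checks, the scheme could be taken directly from the parsed endpoint. This is only a sketch, not the project's code: build_file_url and default_scheme are hypothetical names, and falling back to https when the endpoint has no prefix is an assumption that would need to be confirmed for other repositories.

```python
from urllib.parse import urlparse


def build_file_url(oai_base, f, default_scheme="https"):
    """Join the file path f onto the endpoint's host, reusing the endpoint's scheme."""
    if "://" not in oai_base:
        # Assumption: an endpoint given without a prefix is reachable over default_scheme
        oai_base = f"{default_scheme}://{oai_base}"
    parsed = urlparse(oai_base)
    # Same concatenation as the original code, but with the endpoint's own scheme
    return f"{parsed.scheme}://{parsed.netloc}{f}"


# Hypothetical file path standing in for f
print(build_file_url("https://qtydataverse.fccn.pt/oai", "/resources/example.pdf"))
```

With this approach the case of a missing prefix in self.oai_base is handled in one place instead of being left to the caller.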