SURFscz / SRAM-deploy

Deploy scripts for the SCZ
Apache License 2.0
5 stars 5 forks source link

pyff metadata job doesn't fail if upstream source is broken #506

Closed baszoetekouw closed 7 months ago

baszoetekouw commented 8 months ago

in /opt/metadata/idps_feed.fd we explicitly have

- load verify_tls fail_on_error True:
    - /opt/metadata/certs/surfconext.xrd
    - /opt/metadata-src

However, if upstream is broken, pyff will simply ignore that source and continue, instead of exiting, causing out metadata feed to be broken:

Mar 12 16:00:03 meta-af1 systemd[1]: Starting pyFF Metadata processing...
Mar 12 16:00:03 meta-af1 pyff-metadata[828054]: Processing '/opt/metadata/idps_feed.fd'
Mar 12 16:00:13 meta-af1 pyff-metadata[828056]: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f55244a1eb0>: Failed to establish a new connection: [Errno 111] Connection refused')': /signed/2023/idps-metadata.xml
Mar 12 16:00:14 meta-af1 pyff-metadata[828056]: WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f55244440a0>: Failed to establish a new connection: [Errno 111] Connection refused')': /signed/2023/idps-metadata.xml
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f5524444250>: Failed to establish a new connection: [Errno 111] Connection refused')': /signed/2023/idps-metadata.xml
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: WARNING:pyff.fetch:error fetching https://metadata.test.surfconext.nl/signed/2023/idps-metadata.xml
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: WARNING:pyff.fetch:HTTPSConnectionPool(host='metadata.test.surfconext.nl', port=443): Max retries exceeded with url: /signed/2023/idps-metadata.xml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f5524444d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: ---
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: total size:     4
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: selected:       2
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]:           idps: 2
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]:            sps: 0
Mar 12 16:00:16 meta-af1 pyff-metadata[828056]: ---
Mar 12 16:00:16 meta-af1 pyff-metadata[828054]: Processing '/opt/metadata/edugain_feed.fd'
Mar 12 16:00:35 meta-af1 pyff-metadata[828064]: ---
Mar 12 16:00:35 meta-af1 pyff-metadata[828064]: total size:     5558
Mar 12 16:00:35 meta-af1 pyff-metadata[828064]: selected:       5558
Mar 12 16:00:35 meta-af1 pyff-metadata[828064]:           idps: 5558
Mar 12 16:00:35 meta-af1 pyff-metadata[828064]:            sps: 0
Mar 12 16:00:35 meta-af1 pyff-metadata[828064]: ---
Mar 12 16:00:36 meta-af1 pyff-metadata[828054]: Processing '/opt/metadata/frontend_feed.fd'
Mar 12 16:00:37 meta-af1 pyff-metadata[828073]: ---
Mar 12 16:00:37 meta-af1 pyff-metadata[828073]: total size:     1
Mar 12 16:00:37 meta-af1 pyff-metadata[828073]: selected:       1
Mar 12 16:00:37 meta-af1 pyff-metadata[828073]:           idps: 1
Mar 12 16:00:37 meta-af1 pyff-metadata[828073]:            sps: 0
Mar 12 16:00:37 meta-af1 pyff-metadata[828073]: ---
Mar 12 16:00:37 meta-af1 pyff-metadata[828054]: Processing '/opt/metadata/backend_feed.fd'
Mar 12 16:00:39 meta-af1 pyff-metadata[828077]: ---
Mar 12 16:00:39 meta-af1 pyff-metadata[828077]: total size:     1
Mar 12 16:00:39 meta-af1 pyff-metadata[828077]: selected:       1
Mar 12 16:00:39 meta-af1 pyff-metadata[828077]:           idps: 0
Mar 12 16:00:39 meta-af1 pyff-metadata[828077]:            sps: 1
Mar 12 16:00:39 meta-af1 pyff-metadata[828077]: ---
Mar 12 16:00:39 meta-af1 pyff-metadata[828054]: Installing 'idps.xml'
Mar 12 16:00:39 meta-af1 pyff-metadata[828054]: Installing 'edugain.xml'
Mar 12 16:00:39 meta-af1 pyff-metadata[828054]: Installing 'proxy_idp.xml'
Mar 12 16:00:39 meta-af1 pyff-metadata[828054]: Installing 'proxy_sp.xml'

(note that 7 IdPs should have been written to idps.xml instead of 2)

baszoetekouw commented 8 months ago

Maybe this is a bug in pyff? It can read the xsd file correctly, of course, so maybe that's what the fail_on_error is checking?

baszoetekouw commented 8 months ago

Curiously, this same call failed on test:

Mar 12 15:00:13 app1-tf1 systemd[1]: pyff-metadata.service: Consumed 13.169s CPU time.
Mar 12 16:00:00 app1-tf1 systemd[1]: Starting pyFF Metadata processing...
Mar 12 16:00:00 app1-tf1 pyff-metadata[2051984]: Processing '/opt/metadata/idps_feed.fd'
Mar 12 16:00:11 app1-tf1 pyff-metadata[2051987]: WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fdc337c4190>, 'Connection to metadata.test.surfconext.nl timed out. (connect timeout=10)')': /signed/2023/idps-metadata.xml
Mar 12 16:00:12 app1-tf1 pyff-metadata[2051987]: WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdc337c42e0>: Failed to establish a new connection: [Errno 111] Connection refused')': /signed/2023/idps-metadata.xml
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdc337c4460>: Failed to establish a new connection: [Errno 111] Connection refused')': /signed/2023/idps-metadata.xml
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: WARNING:pyff.fetch:error fetching https://metadata.test.surfconext.nl/signed/2023/idps-metadata.xml
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: WARNING:pyff.fetch:HTTPSConnectionPool(host='metadata.test.surfconext.nl', port=443): Max retries exceeded with url: /signed/2023/idps-metadata.xml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdc337c4f10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: ERROR:pyff.builtins:<string>:0:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element '{urn:oasis:names:tc:SAML:2.0:metadata}EntitiesDescriptor': Missing child element(s). Expected is one of ( {urn:oasis:names:tc:SAML:2.0:metadata}Extensions, {urn:oasis:names:tc:SAML:2.0:metadata}EntityDescriptor, {urn:oasis:names:tc:SAML:2.0:metadata}EntitiesDescriptor ).
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: ERROR:pyff.pipes:Got exception when loading/executing pipe: XML schema validation failed
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: Traceback (most recent call last):
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/builtins.py", line 534, in publish
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     validate_document(req.t)
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/utils.py", line 279, in validate_document
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     schema().assertValid(t)
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "src/lxml/etree.pyx", line 3643, in lxml.etree._Validator.assertValid
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: lxml.etree.DocumentInvalid: Element '{urn:oasis:names:tc:SAML:2.0:metadata}EntitiesDescriptor': Missing child element(s). Expected is one of ( {urn:oasis:names:tc:SAML:2.0:metadata}Extensions, {urn:oasis:names:tc:SAML:2.0:metadata}EntityDescriptor, {urn:oasis:names:tc:SAML:2.0:metadata}EntitiesDescriptor ).
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: During handling of the above exception, another exception occurred:
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: Traceback (most recent call last):
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/bin/pyff", line 8, in <module>
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     sys.exit(main())
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/md.py", line 33, in main
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     plumbing(p).process(md, state={'batch': True, 'stats': {}})
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/pipes.py", line 363, in process
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     return Plumbing.Request(
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/pipes.py", line 301, in process
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     return pl.iprocess(self)
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/pipes.py", line 333, in iprocess
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     raise ex
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/pipes.py", line 323, in iprocess
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     ot = pipefn(req, *opts)
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:   File "/opt/metadata/pyff-env/lib/python3.9/site-packages/pyff/builtins.py", line 537, in publish
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]:     raise PipeException("XML schema validation failed")
Mar 12 16:00:14 app1-tf1 pyff-metadata[2051987]: pyff.pipes.PipeException: XML schema validation failed
Mar 12 16:00:14 app1-tf1 systemd[1]: pyff-metadata.service: Main process exited, code=exited, status=1/FAILURE
Mar 12 16:00:14 app1-tf1 systemd[1]: pyff-metadata.service: Failed with result 'exit-code'.
Mar 12 16:00:14 app1-tf1 systemd[1]: Failed to start pyFF Metadata processing.
Mar 12 16:00:14 app1-tf1 systemd[1]: pyff-metadata.service: Consumed 1.052s CPU time.

But apparently for the wring reason (pyff was still trying to interpret the xml).

mrvanes commented 8 months ago

Can't reproduce on --container deploy of pyff locally?

baszoetekouw commented 8 months ago

At the moment the upstream server is back up again, so that's to be expected. Does it also fail correctly if the upstream server times out?

mrvanes commented 8 months ago

I extended the tests and still can't reproduce with ConnectTimeoutError nor NewConnectionError (refused) exceptions?

baszoetekouw commented 7 months ago

Ok, let's leave it for now, and see if this error ever resurfaces...
I want to get rid of pyff anyway, as it is extremely wasteful with memory; should be easily replaceable by a shell script.