Closed peppelinux closed 4 years ago
- I don't see any such performance problem when I run my own tests with edugain.fd, but if you can provide me with your garr pipeline I can try to reproduce the behaviour you are seeing.
That's the configuration https://github.com/ConsortiumGARR/idem-tutorials/blob/master/idem-community/HOWTO-Shibboleth/Identity-Provider/HOWTO%20Setup%20a%20MDX%20Server%20and%20configure%20it%20in%20Shibboleth%20IdP.md#configure-pyff
The problem is some random errors that expose status 500 instead of metadata to the requesters. The requesters also have their own cache, so they will return a missing result for hours. Coupled with this, the first query takes 7.5 seconds, which is a problem in a web context.
- Regardless, it might be better to produce a mirrored MDQ instance for large-scale production environments. This approach is described in the documentation on readthedocs.io.
Got it. I'm using a static serve with nginx, as was also proposed for the documentation here, but that was closed.
This static serve doesn't work with the Shibboleth IdP, which requests entities with clear (urlencoded) names in the URL path, so I developed Django-MDQ.
On 7 Feb 2020, at 18:15, Giuseppe De Marco notifications@github.com wrote:
> The problem is some random errors that expose error 500 instead of metadata to
If you are talking about 503 - that isn't random at all. By design pyFF returns 503 (temporarily unavailable) while it is loading, because this is how you inform a proxy/client to come back later. Clients that cache 503s are not obeying the HTTP spec.
> the requesters. They also have their own cache, so they will return a missing result for hours. Coupled with this, the first query takes 7.5 seconds, which is a problem in a web context.
> Regardless, it might be better to produce a mirrored MDQ instance for large-scale production environments. This approach is described in the documentation on readthedocs.io. Got it, I'm using a static serve with nginx, as was also proposed for the documentation here, but that was closed.
There are two supported methods for producing a static copy using pyFF.
> This static serve doesn't work with the Shibboleth IdP and requests with clear entities in the URL path.
I don’t understand what this means. Can you provide an example?
I notice that your instructions don't mention running a local redis instance - the rediswhooshstore assumes a local redis on the default port. Is this running in your env?
First of all, thank you for your time, I really appreciate it.
> If you are talking about 503 - that isn't random at all. By design pyFF returns 503 (temporarily unavailable) while it is loading, because this is how you inform a proxy/client to come back later. Clients that cache 503s are not obeying the HTTP spec.
Yes, about the 503, but yes-and-no about the Shibboleth IdP behaviour. Once in production I cannot question the design of Shibboleth; I just need something that works in all cases. If pyFF has already-downloaded metadata it should serve it; if metadata is available but a little bit older, that does not mean pyFF should take a pause. The availability of the information is the primary goal; nobody cares what pyFF is doing internally. If I deploy a service, it must be highly available in as many ways as possible. During these 503 errors the SP cannot send authn requests to the IdP, as if the SP had not been federated yet. This is a problem.
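As an aside on the caching point: a client that follows the HTTP spec would treat 503 as retryable, optionally honouring Retry-After, instead of caching the failure. A minimal sketch (the MDQ URL and retry policy here are hypothetical, not taken from any of the software in this thread):

```python
import time
import urllib.error
import urllib.request

def fetch_metadata(url, retries=3, backoff=2.0):
    """Fetch MDQ metadata, retrying on 503 instead of caching the failure."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code == 503 and attempt < retries - 1:
                # Honour Retry-After when present, otherwise back off linearly
                delay = float(e.headers.get("Retry-After", backoff * (attempt + 1)))
                time.sleep(delay)
            else:
                raise
```

A client like this would ride out pyFF's loading window instead of returning a missing result for hours.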
> There are two supported methods for producing a static copy using pyFF.
Nothing can be faster than an nginx static serve: 36 ms against multiple seconds for pyFF. This is a fact. I would have to recompile nginx with a Lua or Python module to reproduce what Django-MDQ is doing; it would be technically interesting. This would be needed because of pyFF's storage, which uses sha1 naming and not clear naming. I prefer sha1 too, but draft-young-md-query-12 tells us something different, and Shibboleth uses the clear naming. Bad news, but carry on ...
> I don't understand what this means. Can you provide an example?
Asking for: https://coco.release-check.edugain.org/shibboleth
In a pySAML2 entity, the MDQ query is built this way:
http://mdq.url/entities/{sha1}79d76b6b9ff8536c15679b446aba05c1c529f44a
The clear naming, used by default in Shibboleth products, instead uses the clear names, like this:
http://mdq.url/entities/https%3A%2F%2Fcoco.release-check.edugain.org%2Fshibboleth
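Both request forms can be derived from the entity ID; a small sketch (mdq.url is the placeholder host from the examples above):

```python
import hashlib
import urllib.parse

entity_id = "https://coco.release-check.edugain.org/shibboleth"

# MDQ "transformed identifier": the literal prefix {sha1} plus the
# hex-encoded SHA-1 of the entity ID
sha1_name = "{sha1}" + hashlib.sha1(entity_id.encode("utf-8")).hexdigest()

# Clear (urlencoded) name, as Shibboleth products request by default
clear_name = urllib.parse.quote(entity_id, safe="")

print("http://mdq.url/entities/" + sha1_name)
print("http://mdq.url/entities/" + clear_name)
```

This reproduces the two URL shapes shown above from the same entity ID.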
> I notice that your instructions don't mention running a local redis instance - the rediswhooshstore assumes a local redis on the default port. Is this running in your env?
Yes, Redis is running in the background; otherwise pyffd stops working at startup with a connection error to redis-server.
I suggest you (or I) talk to the Shibboleth folks about not caching 503s, because that is absolutely the wrong behavior and I have no doubt they would agree.
I don't think pyFF should be pausing before returning responses, especially when it has access to valid metadata, and I have not seen that behavior myself. I will investigate using your examples!
pyFF doesn't care how you serve a static copy - nginx, lighttpd, apache, all will work! I was talking about producing static filetrees from pyFF. There are two methods: either use the mirror-mdq.sh script in 'scripts' or use the new 'map' pipe which lets you loop over all your entities in batch mode. There is an example in 'examples/batch-mdq-loop.fd' in HEAD for using map to produce an offline signed tree along with json files for discovery.
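For illustration only (this is not pyFF's mirror-mdq.sh script or its map pipe): splitting a metadata aggregate into per-entity files with urlencoded names, ready for a static web server, could look roughly like this. Note that a naive split like this drops any aggregate-level signature, which is why pyFF's batch method signs each entity on the way out:

```python
import urllib.parse
import xml.etree.ElementTree as ET
from pathlib import Path

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"

def split_aggregate(aggregate_xml, outdir):
    """Split an EntitiesDescriptor aggregate into one file per entity,
    named with the urlencoded entityID (as Shibboleth requests them).
    Returns the number of entities written."""
    root = ET.fromstring(aggregate_xml)
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for ed in root.iter(f"{{{MD_NS}}}EntityDescriptor"):
        name = urllib.parse.quote(ed.get("entityID"), safe="")
        (out / name).write_bytes(ET.tostring(ed))
        count += 1
    return count
```

The resulting directory can be served directly by nginx under /entities/.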
Thx for your explanation of clear vs sha1 (incidentally, I'd use the term urlencoded names, but never mind). If you use the 'map' batch method above you can easily accommodate both representations. The map-batch method gives you a bit more control, as it allows you to select file extensions etc. and can be used to produce links that make both sha1 and urlencoded names work equally.
Ok I will, thank you for the ack. Lastly:
1) I used the git master branch
2) another random exception happens regarding lxml tree validation. I don't have a terminal now, but I'll copy it here when available. When this internal exception happens, pyFF stops giving us MDQ results until the next refresh
3) seen https://github.com/IdentityPython/pyFF/blob/master/examples/batch-mdq-loop.fd. Consider a comprehensive documentation of this operating mode, with a description of the pipeline statements in it, as mandatory. The official documentation is the key
> Ok I will, thank you for the ack. Lastly:
> - I used the git master branch
thx!
> - another random exception happens regarding lxml tree validation. I don't have a terminal now, but I'll copy it here when available. When this internal exception happens, pyFF stops giving us MDQ results until the next refresh
That does sound like a bug. Looking forward to more detail.
> - seen https://github.com/IdentityPython/pyFF/blob/master/examples/batch-mdq-loop.fd. Consider a comprehensive documentation of this operating mode, with a description of the pipeline statements in it, as mandatory. The official documentation is the key
Yep. It is still very new.
That's my example configuration https://github.com/peppelinux/Satosa-saml2saml/tree/master/pyFF_example
Well, the first query always takes 9 seconds! ... going to test batch-mdq-loop...
Ok, I tried to use batch-mdq-loop.fd. That's my configuration
I first run
pyff pipelines/batch-mdq-loop-edugain.fd
then
pyffd -p pyff.pid -f -a --dir=`pwd` --loglevel=DEBUG -H 0.0.0.0 -P 8001 pipelines/batch-mdq-loop-edugain.fd
In the debug log, on an MDQ HTTP request from the Shibboleth IdP, I can see the following 404 error:
[08/Feb/2020:11:55:50] MDServer pfx=None, path={base64}aHR0cHM6Ly9jb2NvLnJlbGVhc2UtY2hlY2suZWR1Z2Fpbi5vcmcvc2hpYmJvbGV0aA==, content_type=application/xml
DEBUG:cherrypy.error.139934034605280:[08/Feb/2020:11:55:50] MDServer pfx=None, path={base64}aHR0cHM6Ly9jb2NvLnJlbGVhc2UtY2hlY2suZWR1Z2Fpbi5vcmcvc2hpYmJvbGV0aA==, content_type=application/xml
[...]
10.0.3.101 - - [08/Feb/2020:11:55:50] "GET /entities/https:%2F%2Fcoco.release-check.edugain.org%2Fshibboleth HTTP/1.1" 404 6025 "" "ShibbolethIdp/3.4.6 OpenSAML/3.4.5"
INFO:cherrypy.access.139934034605280:10.0.3.101 - - [08/Feb/2020:11:55:50] "GET /entities/https:%2F%2Fcoco.release-check.edugain.org%2Fshibboleth HTTP/1.1" 404 6025 "" "ShibbolethIdp/3.4.6 OpenSAML/3.4.5"
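The {base64} token in the pyFF debug log above is just the requested entity ID, base64-encoded; it can be decoded to confirm what was asked for:

```python
import base64

# The {base64} token from the pyFF debug log above
path = "aHR0cHM6Ly9jb2NvLnJlbGVhc2UtY2hlY2suZWR1Z2Fpbi5vcmcvc2hpYmJvbGV0aA=="
print(base64.b64decode(path).decode("utf-8"))
# → https://coco.release-check.edugain.org/shibboleth
```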
I resolved this 404 problem by adding this to the pipeline:
- select
- store:
    directory: ./garr
So, that's the log:
10.0.3.101 - - [08/Feb/2020:12:17:19] "GET /entities/https:%2F%2Fcoco.release-check.edugain.org%2Fshibboleth HTTP/1.1" 200 - "" "ShibbolethIdp/3.4.6 OpenSAML/3.4.5"
INFO:cherrypy.access.140163049642080:10.0.3.101 - - [08/Feb/2020:12:17:19] "GET /entities/https:%2F%2Fcoco.release-check.edugain.org%2Fshibboleth HTTP/1.1" 200 - "" "ShibbolethIdp/3.4.6 OpenSAML/3.4.5"
but the content is not a valid SAML XML response, it's more of an error ...
The random exceptions are caused by typos in the requested entity name.
This produces the following in the pyFF logs:
<string>:0:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element '{urn:oasis:names:tc:SAML:2.0:metadata}EntitiesDescriptor': Missing child element(s). Expected is one of ( {http://www.w3.org/2000/09/xmldsig#}Signature, {urn:oasis:names:tc:SAML:2.0:metadata}Extensions, {urn:oasis:names:tc:SAML:2.0:metadata}EntityDescriptor, {urn:oasis:names:tc:SAML:2.0:metadata}EntitiesDescriptor ).
XML schema validation failed
and this in the HTTP response to the client:
<h1>500 Internal Server Error</h1>
The server has either erred or is incapable of performing the requested operation.<br/><br/>
That's the result with Django-MDQ: the first request from the Shibboleth IdP and the second from pySAML2.
It's really easy to test, just
git clone https://github.com/UniversitaDellaCalabria/Django-MDQ.git
cd Django-MDQ
pip install -r requirements.txt
# configure the pyff store path and rename it to settingslocal.py
nano django_mdq/settingslocal.py.example
./manage.py runserver 0.0.0.0:8001
If you have time to take a look at this, we could develop this kind of decoupled service. It works out of the box. If you could provide a working and safe example, ready for production and with comparable performance, I would admit that Django-MDQ is quite useless.
Thank you for your time, and never give up :)
Batch mode is not for pyffd!
My apologies, I just need an MDQ service with responses available within 2 seconds. You're the master, I'll follow you, let me see.
The problem is with your pipeline (garr.fd) which forces a reload every time the server starts. Try using the "when update" pattern from examples/mdq.fd. That way the update will be run in the update process only as needed (when you run with a persistent backing store) and you should not see any delay until your server can start to answer queries.
Thank you Leifj, ack it
Correct me if I'm wrong. For a production-ready and safe environment, with particular attention to performance, we should adopt the following:
- A pyFF batch that populates a shared storage/path, scheduled. It's only a downloader/validator, no publish tasks.
- One or many gunicorn daemons, hopefully in HA, that don't download and validate things, but just publish the MDX service using the entities shared through the network storage.
Could this be the production-ready setup?
> The problem is with your pipeline (garr.fd) which forces a reload every time the server starts. Try using the "when update" pattern from examples/mdq.fd. That way the update will be run in the update process only as needed (when you run with a persistent backing store) and you should not see any delay until your server can start to answer queries.
Using this command
gunicorn --reload --reload-extra-file pipelines/garr.fd --preload --bind 0.0.0.0:8001 -t 600 -e PYFF_PIPELINE=pipelines/garr.fd -e PYFF_STORE_CLASS=pyff.store:RedisWhooshStore -e PYFF_UPDATE_FREQUENCY=600 -e PYFF_PORT=8001 --threads 4 --worker-tmp-dir=/dev/shm --worker-class=gthread pyff.wsgi:app
With this pipeline:
- when update:
  - load xrd garr-loaded.xrd:
    - ./pipelines/garr.xrd
  - select
  - store:
      directory: ./garr
  - publish:
      output: ./garr/garr-loaded.xml
  - stats
  - break
# MDX server
- when request:
  - select
  - pipe:
    - when accept application/xml:
      - xslt:
          stylesheet: tidy.xsl
      - first
      - finalize:
          cacheDuration: PT5H
          validUntil: P10D
      - sign:
          key: certificates/private.key
          cert: certificates/public.cert
      - emit application/xml
      - break
    - when accept application/json:
      - xslt:
          stylesheet: discojson.xsl
      - emit application/json
      - break
The returned metadata appears as if it were not signed ... even after removing RedisWhoosh ...
org.opensaml.saml.metadata.resolver.filter.FilterException: Metadata root element was unsigned and signatures are required.
Using this command instead, the metadata is signed and returns in 1.360 s (good enough for this topic!):
pyffd -p pyff.pid -f -a --dir=`pwd` --loglevel=DEBUG -H 0.0.0.0 -P 8001 pipelines/garr.fd
I'd prefer to go into production with gunicorn, you know. I'm sure we can resolve this mystery about the missing signature with gunicorn.
> Thank you Leifj, ack it
> Correct me if I'm wrong. For a production-ready and safe environment, with particular attention to performance, we should adopt the following:
> - A pyFF batch that populates a shared storage/path, scheduled. It's only a downloader/validator, no publish tasks.
There are several ways of doing this, as I explained in an earlier comment: run a batch, or run pyffd and export with mirror-mdq.sh. The latter I have had in production, producing md.swedenconnect.se, for a couple of years now.
> - One or many gunicorn daemons, hopefully in HA, that don't download and validate things, but just publish the MDX service using the entities shared through the network storage.
> Could this be the production-ready setup?
As I said ... I have both versions in various stages of production.
Which metadata was not signed exactly? The garr-loaded.xml or the metadata you get by talking to the MDQ endpoint?
The latter; the root element is not signed, as the Shibboleth IdP told me. This only happens if I run with gunicorn; with pyffd there is no problem.
Do you get signed results if you run examples/edugain-mdq.fd using pyffd or gunicorn (if you modify the key and cert files, that is)?
Sorry for the delay. Using this pipeline:
- when update:
  - load:
    - https://mds.edugain.org
- when request:
  - select:
  - pipe:
    - when accept application/xml:
      - first
      - finalize:
          cacheDuration: PT12H
          validUntil: P10D
      - sign:
          key: certificates/private.key
          cert: certificates/public.cert
      - emit application/xml
      - break
    - when accept application/json:
      - discojson
      - emit application/json
      - break
With this command
gunicorn --reload --reload-extra-file pipelines/edugain-mdq.fd --preload --bind 0.0.0.0:8001 -t 600 -e PYFF_PIPELINE=pipelines/edugain-mdq.fd -e PYFF_STORE_CLASS=pyff.store:RedisWhooshStore -e PYFF_UPDATE_FREQUENCY=600 -e PYFF_PORT=8001 --threads 4 --worker-tmp-dir=/dev/shm --worker-class=gthread pyff.wsgi:app --log-config debug.ini
I have the same result on the Shibboleth IdP:
2020-02-13 09:49:25,090 - 127.0.0.1 - ERROR [org.opensaml.saml.metadata.resolver.impl.AbstractDynamicMetadataResolver:882] - Metadata Resolver FunctionDrivenDynamicHTTPMetadataResolver DynamicEntityMetadata: Metadata filtering problem processing new metadata
org.opensaml.saml.metadata.resolver.filter.FilterException: Metadata root element was unsigned and signatures are required.
at org.opensaml.saml.metadata.resolver.filter.impl.SignatureValidationFilter.filter(SignatureValidationFilter.java:242)
2020-02-13 09:49:25,092 - 127.0.0.1 - INFO [org.opensaml.saml.common.binding.impl.SAMLMetadataLookupHandler:129] - Message Handler: No metadata returned for https://coco.release-check.edugain.org/shibboleth in role {urn:oasis:names:tc:SAML:2.0:metadata}SPSSODescriptor with protocol urn:oasis:names:tc:SAML:2.0:protocol
With this command
/opt/shibboleth-idp/bin/mdquery.sh -e https://coco.release-check.edugain.org/shibboleth --saml2 -u http://localhost:8080/idp
Using pyffd instead
pyffd -p pyff.pid -f -a --dir=`pwd` --loglevel=DEBUG -H 0.0.0.0 -P 8001 pipelines/edugain-mdq.fd
I get this:
10.0.3.101 - - [13/Feb/2020:10:53:59] "GET /entities/https:%2F%2Fcoco.release-check.edugain.org%2Fshibboleth HTTP/1.1" 503 812 "" "ShibbolethIdp/3.4.6 OpenSAML/3.4.5"
INFO:cherrypy.access.139986651399280:10.0.3.101 - - [13/Feb/2020:10:53:59] "GET /entities/https:%2F%2Fcoco.release-check.edugain.org%2Fshibboleth HTTP/1.1" 503 812 "" "ShibbolethIdp/3.4.6 OpenSAML/3.4.5"
[13/Feb/2020:10:54:41] TOOLS.CACHING Reading response from cache
Modifying the pipeline this way:
- when update:
  - load:
    - https://mds.edugain.org
  - select
  - store:
      directory: ./garr
  - publish:
      output: ./garr/garr-loaded.xml
  - stats
- when request:
  - select:
  - pipe:
    - when accept application/xml:
      - first
      - finalize:
          cacheDuration: PT12H
          validUntil: P10D
      - sign:
          key: certificates/private.key
          cert: certificates/public.cert
      - emit application/xml
      - break
    - when accept application/json:
      - discojson
      - emit application/json
      - break
I get the metadata, signed, with pyffd, but not with gunicorn. Hope this helps.
I believe the issue is related to MIME types. Can you find a way to figure out which MIME-type is used when shibd is requesting metadata?
These are the Shibboleth IdP HTTP request headers:
HTTP_ACCEPT 'application/samlmetadata+xml, application/xml, text/xml'
PATH_INFO /entities/https://coco.release-check.edugain.org/shibboleth
HTTP_USER_AGENT ShibbolethIdp/3.4.6 OpenSAML/3.4.5
The problem is that gunicorn doesn't return any content at all ... checked with a simple HTTP request from a web browser.
A mystery that will be solved sooner or later.
On 15 Feb 2020, at 01:59, Giuseppe De Marco notifications@github.com wrote:
> These are the Shibboleth IdP HTTP request headers:
> HTTP_ACCEPT 'application/samlmetadata+xml, application/xml, text/xml'
> PATH_INFO /entities/https://coco.release-check.edugain.org/shibboleth
> HTTP_USER_AGENT ShibbolethIdp/3.4.6 OpenSAML/3.4.5
Thx! I will investigate. I’m pretty sure we are looking at an issue related to mime type negotiation
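For reference, a toy Accept-header matcher shows how negotiation could fail when a server does not list application/samlmetadata+xml among its offered types. This is only an illustration of the general mechanism, not pyFF's actual negotiation code:

```python
def negotiate(accept_header, supported):
    """Tiny Accept-header matcher: return the first client-preferred
    media type that the server supports (q-values ignored for brevity)."""
    for part in accept_header.split(","):
        offered = part.strip().split(";")[0]
        if offered in supported:
            return offered
    return None

# The Accept header the Shibboleth IdP sends (from the thread above)
shib_accept = "application/samlmetadata+xml, application/xml, text/xml"

# A server offering application/xml still matches the second entry
print(negotiate(shib_accept, ["application/xml", "application/json"]))
# A server offering only application/json matches none of them
print(negotiate(shib_accept, ["application/json"]))
```

If the negotiation step mishandles the unfamiliar first entry instead of falling through to application/xml, an empty or error response is a plausible outcome.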
Ok, I got satisfaction from this thread and also from email with Scott and Rainer, so I think we can close this issue if you agree.
I get the same performance with pyffd that I have with Django-MDQ. The key was the 'when update' statement in the pipeline.
I have some lxc containers for tests (shib and slapd). I can share these with you to speed up tests. You have my email.
Problem
The first query on pyff takes roughly from 4 to 8 seconds.
Commands used
A Production example with RedisWhoosh
Using gunicorn
Consequences
I'll continue using pyFF, but as a metadata downloader and validator. I cannot use this approach with Shibboleth entities because, per draft-young-md-query, these are queried in a natural format, like
PySAML2 continues using the {sha1} format. Because of this I coded this in a few lines of code: https://github.com/UniversitaDellaCalabria/Django-MDQ
It simply works. Given more time I'll probably develop the aggregated entities/ endpoint and the validUntil definition, but at the moment I don't need them for a local MDQ serving HA Shibboleth IdP replicas and other PySAML2 entities.
Idea
Wouldn't it be better to decouple the MDQ service from pyff? Django-MDQ works on top of pyFF, more like a module.