IdentityPython / SATOSA

Proxy translating between different authentication protocols (SAML2, OpenID Connect and OAuth2)
https://idpy.org
Apache License 2.0

SAML frontend loads metadata on every request #79

Open · bajnokk opened this issue 7 years ago

bajnokk commented 7 years ago

With the SATOSA SAML frontend configured with eduGAIN metadata, every request takes several minutes to serve. It seems the plugin parses the metadata on every request: the delay is not noticeable when running with smaller metadata, and the gunicorn processes do almost nothing but memory operations (according to strace). Enabling or disabling signature checking doesn't change the request time considerably.

When the service is started, all workers (3) behave similarly: high CPU usage for a couple of minutes. This would be acceptable if subsequent incoming requests were served fast, but this is not the case.

If this is a configuration issue, please give a hint where to look for metadata caching options. Please let me know if you need any more information. I'm running version 3.4.2.

leifj commented 7 years ago

For InAcademia we use the MDQ support in pysaml2 instead of locally cached metadata, for exactly this reason. This seems to work very well.
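
For reference, pysaml2 can be pointed at an MDQ service through its metadata configuration; a minimal sketch (the MDQ URL and certificate path are placeholders, and the exact keys may vary between pysaml2 versions):

```python
# Sketch of a pysaml2 configuration fragment that fetches metadata per-entity
# from an MDQ service instead of loading a huge local aggregate. The URL and
# cert below are placeholders; check the pysaml2 docs for your version.
IDP_CONFIG = {
    # ... entityid, service, key_file, cert_file, etc. ...
    "metadata": {
        "mdq": [
            {
                "url": "https://mdq.example.org/",  # hypothetical MDQ endpoint
                "cert": "mdq-signer.crt",           # verifies signed MDQ responses
            }
        ],
    },
}
```

In SATOSA this fragment would presumably live under the SAML frontend's idp_config section.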

bajnokk commented 7 years ago

Even using MDQ for every request is silly, IMHO; for a production service some caching is probably necessary. Anyway, I could continue if I could figure out how to make pysaml2 use MDQ, but I haven't found any docs/examples on the internet.

leifj commented 7 years ago

It's certainly not "silly". A good MDQ setup includes a cache layer, so it's just a network round trip away. In the InAcademia case it is likely that you'd get very little benefit from moving the cache into the SATOSA process as opposed to having it in the MDQ server, but if you really want a cache "close" to SATOSA you can run an MDQ instance on the loopback interface.
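
To illustrate the loopback-cache idea, here is a toy caching proxy in Python; it is not a real MDQ implementation (use something like pyFF for that), the upstream URL, port, and TTL are made up, and error handling is omitted:

```python
# Toy caching HTTP proxy on loopback, fronting an upstream MDQ service.
# Each requested path (e.g. /entities/<encoded-entityid>) is fetched once
# and served from memory until its TTL expires.
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://mdq.example.org"   # hypothetical upstream MDQ service
TTL = 3600                             # cache entries for one hour

_cache = {}  # path -> (expires_at_epoch, response_body)

class MDQProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        now = time.time()
        hit = _cache.get(self.path)
        if hit is None or hit[0] <= now:
            with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                body = resp.read()
            _cache[self.path] = (now + TTL, body)
        else:
            body = hit[1]
        self.send_response(200)
        self.send_header("Content-Type", "application/samlmetadata+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8001), MDQProxy).serve_forever()
```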

bajnokk commented 7 years ago

IMHO a good MDQ client could take advantage of cacheDuration and validUntil (see the sketch after this comment), but right, a local MDQ server is also sufficient.

Anyway, reloading metadata on every request seems like a conceptual weakness to me.

I've managed to make SATOSA perform the MDQ lookup, but I could not get it to verify the signature. I'll wait for an answer from the pysaml2 devs before sharing it; I might be doing something wrong.
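
On the cacheDuration/validUntil point above, a minimal client-side sketch (this is not pysaml2 API; fetch_one is a hypothetical callable that performs the HTTP GET against the MDQ endpoint, and parsing cacheDuration, an ISO 8601 duration, is omitted):

```python
# Cache MDQ responses client-side, honouring the validUntil attribute on the
# returned EntityDescriptor and falling back to a default TTL when absent.
# Requires Python 3.7+ for datetime.fromisoformat.
import time
import datetime
from xml.etree import ElementTree as ET

_cache = {}  # entity_id -> (expires_at_epoch, raw_xml)

def get_entity_metadata(entity_id, fetch_one, default_ttl=3600):
    now = time.time()
    cached = _cache.get(entity_id)
    if cached and cached[0] > now:
        return cached[1]

    raw_xml = fetch_one(entity_id)          # hypothetical HTTP GET to MDQ
    root = ET.fromstring(raw_xml)
    valid_until = root.get("validUntil")    # e.g. "2017-05-12T19:00:00Z"
    if valid_until:
        dt = datetime.datetime.fromisoformat(valid_until.replace("Z", "+00:00"))
        expires = dt.timestamp()
    else:
        expires = now + default_ttl         # no validUntil -> default TTL

    _cache[entity_id] = (expires, raw_xml)
    return raw_xml
```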

leifj commented 7 years ago

You may want to log an issue with pysaml2 - I don't think @rohe watches this project.

jkakavas commented 7 years ago

This only happens when SAMLMirrorFrontend is used. What actually happens is that the metadata is read again because the configuration is reloaded every time SAMLMirrorFrontend has to handle an authentication request.

The reason is that the aptly named `handle_authn_request` (https://github.com/SUNET/SATOSA/blob/master/src/satosa/frontends/saml2.py#L453) calls `_load_idp_dynamic_endpoints`:

```python
def handle_authn_request(self, context, binding_in):
    """
    Loads approved endpoints dynamically
    See super class satosa.frontends.saml2.SAMLFrontend#handle_authn_request
    :type context: satosa.context.Context
    :type binding_in: str
    :rtype: satosa.response.Response
    """
    idp = self._load_idp_dynamic_endpoints(context)
    return self._handle_authn_request(context, binding_in, idp)
```

`_load_idp_dynamic_endpoints` in turn calls load() on the config object (https://github.com/rohe/pysaml2/blob/master/src/saml2/config.py), which reloads the configuration and parses the metadata into the internal structure it operates on.

From a functionality perspective, building a new Server object makes sense, but the subsequent reload of the metadata is unnecessary. Either the existing self.idp Server object can be reused and extended by _load_idp_dynamic_endpoints, or the existing self.idp.metadata can be passed as a parameter to load() (which would require a pull request to be accepted in pysaml2), so that the metadata does not have to be loaded again from the sources defined in the frontend configuration file.
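
For illustration, a rough sketch of the first option (reusing the already-parsed metadata store); the helper name is hypothetical, and the pysaml2 details, Config.metadata and Server(config=...), are assumptions based on the code referenced above:

```python
# Build the per-request Server without re-parsing metadata: drop the
# "metadata" section before Config.load() runs, then attach the
# MetadataStore that self.idp already holds.
import copy

from saml2.config import IdPConfig
from saml2.server import Server

def _build_idp_reusing_metadata(self, per_request_config):
    cnf = copy.deepcopy(per_request_config)
    cnf.pop("metadata", None)            # nothing to parse -> no slow reload
    config = IdPConfig().load(cnf)
    config.metadata = self.idp.metadata  # reuse the already-built MetadataStore
    return Server(config=config)
```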

leifj commented 7 years ago

Also related to #90, which makes this issue even more silly and urgent.

peppelinux commented 4 years ago

@bajnokk the answer is to use MDQ for huge metadata stores. Please let us know if there is anything else; otherwise this issue will be closed in a few weeks. Thank you.