IdentityPython / pysaml2

Python implementation of SAML2
Apache License 2.0
560 stars 422 forks source link

Normalize Cache Interface #646

Open ambsw-technology opened 5 years ago

ambsw-technology commented 5 years ago

This package only offers in-mem and filesystem caches; neither work correctly in highly scalable systems. I (ignorantly) tried to give the client a Django cache and ran into issues. Unfortunately, the internal implementation has a proprietary interface e.g. set(self, name_id, entity_id, info, not_on_or_after=0). The presence of an entity_id (vs. a simple key-value) is not going to be supported by most (any?) standard caching interfaces.

I'd like to adjust the Cache class to support a more standard cache backend under-the-covers. The interface can be the same, but would be translated into plain k-v operations.

ambsw-technology commented 5 years ago

If anyone needs to work around this issue, I have some helper classes for Django's cache system here, but they could be repurposed to non-Django caching as well. Note that pysaml2 expects mutated objects (i.e. dict) to automatically save so the sync behavior is critical to proper operation.

peppelinux commented 5 years ago

Interesting, Where would you put your code? Which files Will be involved?

Is It ready for a pull request?

ambsw-technology commented 5 years ago

This is how my DjangoCache shim is used (in Django). Per the Django documentation, reference Cache implementations can be found in django.core.cache.backends. I did some poking around and it looks like most of them (DB excluded) don't depend on actual Django features so they could be used in non-Django projects.

I can PR the DjangoCache shim (possibly under another name) if you want to include it in the package. In a perfect world, this project's base cache implementation (and all the places that use it) would be rewritten to use a more standard interface. I haven't done enough digging to figure out if that's possible in a backwards-compatible way.

peppelinux commented 4 years ago

Hi @ambsw-technology I coded in oidc-op something that could be reused in pysaml2. I faced the native implementation of sso sessions in a way that they probably would be reused in other idpy projects.

With this workaround I can use the Django model, or any other storage if properly configured, as it would be a dictionary: https://github.com/peppelinux/django-oidc-op/blob/master/oidc_op/db_interfaces.py

in oidcendpoint we have a storage system based on an in memory strategy. It would works fine with a nosql engine like redis or mongodb, but it is a disaster with a rdbms.

If I'll have some more time I'll do the same for pysaml2 and satosa, take a look and tell us what you think.

ambsw-technology commented 4 years ago

How does this code handle the following situation?

Does key A get updated in the Cache? I ask because this pattern is used in pysaml. My DjangoCache implementation gets around it by tying into the pysaml2-proprietary sync call.

My approach also ties directly into Django's caching infrastructure so you should be able to choose any Cache implementation. DB is one option provided by Django, but isn't hard-coded into my approach.

peppelinux commented 4 years ago

Probably this could sounds better for you: https://github.com/knaperek/djangosaml2/blob/master/djangosaml2/cache.py

Bytheway I'm wondering about how could be simple to implement an abstraction layer to manage many kind of storage type. That's all

peppelinux commented 4 years ago

@c00kiemon5ter is there something that you would consider for a future refactor or should we consider this as a "wont fix" kind?

@ambsw-technology did you find some tip in djangosaml2 or do you have some proposal to build a PR on this?

peppelinux commented 4 years ago

@ambsw-technology probably that's the answer, but @ivan could help a lot in this if possible. https://github.com/IdentityPython/pysaml2/issues/522#issuecomment-687674080

ambsw-technology commented 4 years ago

My DjangoCache wrapper is distributed under the same license so you can incorporate it in this package if you want. As I mentioned, the better solution is to change the internal cache implementation to use a more typical get(k) and set(k, v) interface. However, this package would also need to change the way it uses the cache since it gets and mutates a dict without explicitly using set again (which would be required in most cache implementations).

peppelinux commented 4 years ago

Completely agree, this Is the PoC that introduced this behaviour in jwtconnect.io stack (oidc)

https://github.com/peppelinux/pyAbstractStorage

As you can see that's what you say

spaceone commented 3 years ago

A dict like interface would be nice.