IdentityPython / pyFF

SAML metadata aggregator
https://pyff.io/

Filtering doesn't work as expected #229

Closed cfra closed 3 years ago

cfra commented 3 years ago

Code Version

1.1.5

Expected Behavior

I have a filter configured in my update pipeline.

I expect pyFF to apply the filter during the loading process, so only entities matching that filter should be listed when browsing metadata or querying via MDQ.

Current Behavior

All entities loaded from the source are shown.

Possible Solution

I have a really hard time grasping how the pipelines work. Some more documentation and examples would be really great.

Steps to Reproduce

Use this pipeline:

- when update:
    - load:
        - "http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-test-metadata.xml verify dfn-aai.pem"
    - select
    - filter:
      - "https://testidp2.aai.dfn.de/idp/shibboleth"
    - break
- when request:
    - select
    - pipe:
        - when accept application/xml:
             - xslt:
                 stylesheet: tidy.xsl
             - first
             - finalize:
                cacheDuration: PT10D
                validUntil: PT5H
             - sign:
                 key: default.key
                 cert: default.crt
             - emit application/xml
             - break
        - when accept application/json:
             - xslt:
                 stylesheet: discojson.xsl
             - emit application/json
             - break

Browse pyFF's web interface.

leifj commented 3 years ago

Right, this is the intended behavior, if somewhat confusing for beginners. It is the load pipe that persists data to the internal storage, not the whole update entrypoint, so pipes after load (you can have multiple load statements) have no magic side effect on the result of the load.
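
Applied to the update pipeline from the report, that reading looks roughly like this. The comments are an interpretation of the explanation above, not output from pyFF:

```yaml
- when update:
    - load:
        # load is the step that persists the fetched entities to the
        # internal storage; everything in the source is stored here
        - "http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-test-metadata.xml verify dfn-aai.pem"
    # the pipes below run after load has already persisted its result,
    # so they do not change what is stored
    - select
    - filter:
        - "https://testidp2.aai.dfn.de/idp/shibboleth"
    - break
```

This is why all entities from the source still show up when browsing or querying via MDQ.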

To achieve what you want, use the "load ... via" or "load ... cleanup" construction so that the result of the filtering gets applied to the repository. For instance, try this (abbreviated example):

- when onlyidps:
    - filter:
        - "!//md:EntityDescriptor[md:IDPSSODescriptor]"
- when update:
    - load:
        - "http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-test-metadata.xml verify dfn-aai.pem cleanup onlyidps"

The difference between via and cleanup is that cleanup happens before validation and via after.
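
For comparison, here is a sketch of the same abbreviated example with via in place of cleanup. It assumes via takes the pipeline name in the same position as cleanup, so only the keyword changes; I have not verified this against 1.1.5:

```yaml
# named pipeline referenced from the load line below
- when onlyidps:
    - filter:
        - "!//md:EntityDescriptor[md:IDPSSODescriptor]"
- when update:
    - load:
        # "via onlyidps" instead of "cleanup onlyidps" (assumed syntax)
        - "http://www.aai.dfn.de/fileadmin/metadata/dfn-aai-test-metadata.xml verify dfn-aai.pem via onlyidps"
```

Going by the description above, the filter would then run after validation rather than before it.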