IdentityPython / pyFF

SAML metadata aggregator
https://pyff.io/

When filtering by entity type, I get a response that includes all entities #243

Closed alejandro-perez closed 1 year ago

alejandro-perez commented 1 year ago

When filtering by entity type and requesting one entity's JSON metadata, I get a response that includes all entities instead.

However, search works fine, and returns the matching entities after filter has been applied.

Code Version

2.0.0 (docker image)

Expected Behavior

Only the entity matching the SHA1 hash is returned.

Current Behavior

All entities are returned (~5k)

Possible Solution

Steps to Reproduce

  1. Use the following config
    - when update:
        - load:
            - http://metadata.ukfederation.org.uk/ukfederation-metadata.xml
        - break
    - when request:
        - select:
        - pipe:
            - when accept application/json:
                - select:
                    - '!//md:EntityDescriptor[md:IDPSSODescriptor]'
                - discojson
                - emit application/json:
                - break
  2. Try to get an entity's JSON metadata (as expected by thiss-js)
    curl "http://localhost:8080/%7Bentities/%7Bsha1%7D573116c096bd85296da6c0fd921b9f36dc4c3805.json" -H "Accept: application/json"
  3. Get a full list of 5k entities
  4. By disabling the select filter in step 1, the results from step 2 are just the expected ones:
    [{"title": "UM - University of Murcia", "descr": "The Identity Provider of University of Murcia", "title_langs": {"es": "UM - Universidad de Murcia", "en": "UM - University of Murcia"}, "descr_langs": {"es": "El proveedor de identidad de la Universidad de Murcia", "en": "The Identity Provider of University of Murcia"}, "auth": "saml", "entity_id": "https://www.rediris.es/sir/umidp", "entityID": "https://www.rediris.es/sir/umidp", "type": "idp", "hidden": "false", "scope": "um.es", "domain": "um.es", "name_tag": "UM", "entity_icon_url": {"url": "https://img.sir2.rediris.es/200px-201a27c316f210f42657d783f2ae8fa0.png", "width": "200", "height": "53"}, "keywords": "um,murcia"}]
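
For reference, the `{sha1}...` path segment in step 2 follows the MDQ convention: the literal prefix `{sha1}` followed by the hex SHA-1 digest of the entityID. Below is a minimal Python sketch of building such a lookup URL in the plain `/entities/` path form. The helper names `sha1_id` and `mdq_url` and the base URL are assumptions made for illustration; they are not part of pyFF.

```python
import hashlib
from urllib.parse import quote


def sha1_id(entity_id: str) -> str:
    """Return the MDQ-style '{sha1}<hex digest>' identifier for an entityID."""
    return "{sha1}" + hashlib.sha1(entity_id.encode("utf-8")).hexdigest()


def mdq_url(base: str, entity_id: str) -> str:
    """Build an /entities/ lookup URL (hypothetical helper, for illustration)."""
    # quote(..., safe="") percent-encodes the braces as %7B and %7D.
    return f"{base.rstrip('/')}/entities/{quote(sha1_id(entity_id), safe='')}.json"


if __name__ == "__main__":
    print(mdq_url("http://localhost:8080", "https://www.rediris.es/sir/umidp"))
```

The printed URL can be passed to curl together with the `Accept: application/json` header used above.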
alejandro-perez commented 1 year ago

Logs from the instance with the "select" filter:

pyff_1   | [2023-03-13 12:35:04 +0000] [13] [DEBUG] GET /%7Bentities/%7Bsha1%7D573116c096bd85296da6c0fd921b9f36dc4c3805.json
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.api GET /%7Bentities/%7Bsha1%7D573116c096bd85296da6c0fd921b9f36dc4c3805.json HTTP/1.1
pyff_1   | Accept: application/json
pyff_1   | Content-Length: 0
pyff_1   | Host: localhost:8080
pyff_1   | User-Agent: curl/7.81.0
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.api match=None
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.api handling entry=request, alias={entities, path={sha1}573116c096bd85296da6c0fd921b9f36dc4c3805.json
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.pipes [{'when update': [{'load': ['http://metadata.ukfederation.org.uk/ukfederation-metadata.xml']}, 'break']}, {'when request': [{'select': None}, {'pipe': [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}]}]}]: calling 'when' using args: [{'load': ['http://metadata.ukfederation.org.uk/ukfederation-metadata.xml']}, 'break'] and opts: ['update']
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.pipes [{'when update': [{'load': ['http://metadata.ukfederation.org.uk/ukfederation-metadata.xml']}, 'break']}, {'when request': [{'select': None}, {'pipe': [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}]}]}]: calling 'when' using args: [{'select': None}, {'pipe': [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}]}] and opts: ['request']
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.pipes [{'select': None}, {'pipe': [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}]}]: calling 'select' using args: None and opts: []
pyff_1   | 2023-03-13 12:35:04 INFO pyff.builtins selecting using args: ['{sha1}573116c096bd85296da6c0fd921b9f36dc4c3805']
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.store calling store lookup {sha1}573116c096bd85296da6c0fd921b9f36dc4c3805
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.samlmd selecting 1 entities before validation
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.samlmd Filtering invalids from mdx
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.pipes [{'select': None}, {'pipe': [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}]}]: calling 'pipe' using args: [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}] and opts: []
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.pipes [{'when accept application/json': [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']}]: calling 'when' using args: [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break'] and opts: ['accept', 'application/json']
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.pipes [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']: calling 'select' using args: ['!//md:EntityDescriptor[md:IDPSSODescriptor]'] and opts: []
pyff_1   | 2023-03-13 12:35:04 INFO pyff.builtins selecting using args: ['!//md:EntityDescriptor[md:IDPSSODescriptor]']
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.store calling store lookup entities
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.store filtering 9399 entities using xpath //md:EntityDescriptor[md:IDPSSODescriptor]
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.samlmd selecting 9399 entities before validation
pyff_1   | 2023-03-13 12:35:04 DEBUG pyff.samlmd Filtering invalids from dummy
pyff_1   | 2023-03-13 12:35:05 DEBUG pyff.store got 5283 entities after filtering
pyff_1   | 2023-03-13 12:35:05 DEBUG pyff.samlmd selecting 5283 entities before validation
pyff_1   | 2023-03-13 12:35:05 DEBUG pyff.samlmd Filtering invalids from mdx
pyff_1   | 2023-03-13 12:35:06 DEBUG pyff.pipes [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']: calling 'discojson' using args: None and opts: []
pyff_1   | 2023-03-13 12:35:06 DEBUG pyff.pipes [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']: calling 'emit' using args: None and opts: ['application/json']
pyff_1   | 2023-03-13 12:35:06 DEBUG pyff.pipes [{'select': ['!//md:EntityDescriptor[md:IDPSSODescriptor]']}, 'discojson', {'emit application/json': None}, 'break']: calling 'break' using args: None and opts: []
[.........]

Log output for the instance without the select filter:

pyff_1   | [2023-03-13 12:35:59 +0000] [12] [DEBUG] GET /%7Bentities/%7Bsha1%7D573116c096bd85296da6c0fd921b9f36dc4c3805.json
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.api GET /%7Bentities/%7Bsha1%7D573116c096bd85296da6c0fd921b9f36dc4c3805.json HTTP/1.1
pyff_1   | Accept: application/json
pyff_1   | Content-Length: 0
pyff_1   | Host: localhost:8080
pyff_1   | User-Agent: curl/7.81.0
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.api match=None
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.api handling entry=request, alias={entities, path={sha1}573116c096bd85296da6c0fd921b9f36dc4c3805.json
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'when update': [{'load': ['http://metadata.ukfederation.org.uk/ukfederation-metadata.xml']}, 'break']}, {'when request': [{'select': None}, {'pipe': [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}]}]}]: calling 'when' using args: [{'load': ['http://metadata.ukfederation.org.uk/ukfederation-metadata.xml']}, 'break'] and opts: ['update']
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'when update': [{'load': ['http://metadata.ukfederation.org.uk/ukfederation-metadata.xml']}, 'break']}, {'when request': [{'select': None}, {'pipe': [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}]}]}]: calling 'when' using args: [{'select': None}, {'pipe': [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}]}] and opts: ['request']
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'select': None}, {'pipe': [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}]}]: calling 'select' using args: None and opts: []
pyff_1   | 2023-03-13 12:35:59 INFO pyff.builtins selecting using args: ['{sha1}573116c096bd85296da6c0fd921b9f36dc4c3805']
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.store calling store lookup {sha1}573116c096bd85296da6c0fd921b9f36dc4c3805
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.samlmd selecting 1 entities before validation
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.samlmd Filtering invalids from mdx
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'select': None}, {'pipe': [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}]}]: calling 'pipe' using args: [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}] and opts: []
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'when accept application/json': [{'select': None}, 'discojson', {'emit application/json': None}, 'break']}]: calling 'when' using args: [{'select': None}, 'discojson', {'emit application/json': None}, 'break'] and opts: ['accept', 'application/json']
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'select': None}, 'discojson', {'emit application/json': None}, 'break']: calling 'select' using args: None and opts: []
pyff_1   | 2023-03-13 12:35:59 INFO pyff.builtins selecting using args: ['{sha1}573116c096bd85296da6c0fd921b9f36dc4c3805']
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.store calling store lookup {sha1}573116c096bd85296da6c0fd921b9f36dc4c3805
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.samlmd selecting 1 entities before validation
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.samlmd Filtering invalids from mdx
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'select': None}, 'discojson', {'emit application/json': None}, 'break']: calling 'discojson' using args: None and opts: []
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'select': None}, 'discojson', {'emit application/json': None}, 'break']: calling 'emit' using args: None and opts: ['application/json']
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.pipes [{'select': None}, 'discojson', {'emit application/json': None}, 'break']: calling 'break' using args: None and opts: []
pyff_1   | 2023-03-13 12:35:59 DEBUG pyff.api b'[{"title": "UM - University of Murcia", "descr": "The Identity Provider of University of Murcia", "title_langs": {"es": "UM - Universidad de Murcia", "en": "UM - University of Murcia"}, "descr_langs": {"es": "El proveedor de identidad de la Universidad de Murcia", "en": "The Identity Provider of University of Murcia"}, "auth": "saml", "entity_id": "https://www.rediris.es/sir/umidp", "entityID": "https://www.rediris.es/sir/umidp", "type": "idp", "hidden": "false", "scope": "um.es", "domain": "um.es", "name_tag": "UM", "entity_icon_url": {"url": "https://img.sir2.rediris.es/200px-201a27c316f210f42657d783f2ae8fa0.png", "width": "200", "height": "53"}, "keywords": "um,murcia"}]'
leifj commented 1 year ago

The implicit select is essentially "overwritten" by the explicit select in the when clause, so returning all 5k entities is actually the correct behavior in that case. I think you are trying to filter the response so that it only returns IdPs, yes? You should look into the filter directive for this use case, I think.
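
The "overwritten" behavior can be pictured with a toy model. This is only a sketch under the assumption that each select builds its working set from the whole store rather than from the previous step's result, which matches the `calling store lookup entities` and `filtering 9399 entities` lines in the log above. The names `STORE` and `select` are illustrative and are not pyFF's internals.

```python
# Toy model only: these names are illustrative and are not pyFF's internals.
STORE = [
    {"entityID": "https://idp.example.org", "type": "idp"},
    {"entityID": "https://idp2.example.org", "type": "idp"},
    {"entityID": "https://sp.example.org", "type": "sp"},
]


def select(store, predicate):
    """Build a fresh working set from the whole store, ignoring any earlier one."""
    return [e for e in store if predicate(e)]


# Implicit select: the request path picks out exactly one entity.
working = select(STORE, lambda e: e["entityID"] == "https://idp.example.org")
print(len(working))  # 1

# Explicit select inside the 'when accept' branch: starts again from the store,
# so the single-entity result from the first step is lost.
working = select(STORE, lambda e: e["type"] == "idp")
print(len(working))  # 2
```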

alejandro-perez commented 1 year ago

Thanks @leifj. Is there any way I can get the functionality I want, where searches only return IdPs, while fetching a single entity's metadata can return any entity from the loaded feed? (This is how seamlessaccess.org works, but I couldn't find out how to replicate it.)

Sorry I ask here, but I haven't seen anything in the docs or examples that shed any light on this.

alejandro-perez commented 1 year ago

To mimic the example above:

curl "https://md.seamlessaccess.org/entities/%7Bsha1%7Dd6cad1541a6653fa308955d7341b7171bc970f09.json" -H "Accept: application/json"
{"title":"DevTeam Test RPi Box","descr":"DevTeam's Test RPi Box","title_langs":{"en":"DevTeam Test RPi Box"},"descr_langs":{"en":"DevTeam's Test RPi Box"},"auth":"saml","entity_id":"https://rpi.dev.ukfederation.org.uk/shibboleth","entityID":"https://rpi.dev.ukfederation.org.uk/shibboleth","type":"sp","id":"{sha1}d6cad1541a6653fa308955d7341b7171bc970f09"}

But:

curl "https://md.seamlessaccess.org/entities/?query=rpi" -H "Accept: application/json" 
[{"title":"Rensselaer Polytechnic Institute","descr":"http://www.rpi.edu/","title_langs":{"en":"Rensselaer Polytechnic Institute"},"descr_langs":{"en":"http://www.rpi.edu/"},"auth":"saml","entity_id":"https://shib-idp.rpi.edu/idp/shibboleth","entityID":"https://shib-idp.rpi.edu/idp/shibboleth","type":"idp","hidden":"false","scope":"rpi.edu","domain":"rpi.edu","name_tag":"RPI","entity_icon_url":{"url":"https://scer.rpi.edu/sites/default/files/logo-without-tag.jpg","width":"673","height":"175"},"privacy_statement_url":"http://scer.rpi.edu/privacypolicy","id":"{sha1}549b3f2200a9f8c5d11808ff74931d68be4d32f8"}]
leifj commented 1 year ago

We don't filter in SA actually, but you can achieve the effect you want by adding a filter step after the select. The API docs have an example, but something like this should work:

- select:
- filter:
     - "!//md:EntityDescriptor[md:IDPSSODescriptor]"
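
As a rough illustration of the difference, here is a toy model that assumes filter narrows whatever the preceding select produced instead of going back to the store. The names `filter_step` and `is_idp` are hypothetical and do not reflect pyFF's actual implementation.

```python
# Toy model only: illustrative names, not pyFF's internals.
def filter_step(working, predicate):
    """Narrow the current working set instead of re-reading the store."""
    return [e for e in working if predicate(e)]


def is_idp(entity):
    """Stand-in for the XPath test md:EntityDescriptor[md:IDPSSODescriptor]."""
    return entity["type"] == "idp"


# Working sets left behind by the implicit select for single-entity requests.
idp_lookup = [{"entityID": "https://idp.example.org", "type": "idp"}]
sp_lookup = [{"entityID": "https://sp.example.org", "type": "sp"}]

print(filter_step(idp_lookup, is_idp))  # the IdP is kept
print(filter_step(sp_lookup, is_idp))   # empty list: an SP does not match
```

Under this assumption a single-entity request for an IdP still returns one entity, while a request for an SP returns nothing.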
alejandro-perez commented 1 year ago

From my example above against SA, you can see how results from the /entities/?query=XXX endpoint do filter, since otherwise DevTeam's Test RPi Box would have been included in the results.

In any case, I'll try using filter. Where can I find the API docs? https://pyff.readthedocs.io/en/latest/ does not mention anything about filter. Thanks!

alejandro-perez commented 1 year ago

Also, if I use filter, the SP details are not returned (cause, it's an SP). SA does not show this behaviour.

Using the following link, you can see how SP's details are fetched (top left corner shows display name), while still the search does not return any SP (only IDPs).

https://service.seamlessaccess.org/ds/?entityID=https%3A%2F%2Frpi.dev.ukfederation.org.uk%2Fshibboleth&return=https%3A%2F%2Falexrpi.ddns.net%3A9443%2FShibboleth.sso%2FLogin%3FSAMLDS%3D1%26target%3Dss%253Amem%253A65d9ecc802bd1a9cdc618855df00dd9a2e4302888cee6d3f502293f2b3cef70d

leifj commented 1 year ago

Search is not handled via pipelines in pyFF. The filter clause is documented in the source but maybe readthedocs isn't exposing all the API docs. I will look into that.

leifj commented 1 year ago

Also, if I use filter, the SP details are not returned (cause, it's an SP). SA does not show this behaviour.

As I said above, SA doesn't do filtering: both SPs and IdPs are included in the metadata feed from SA. However, not all SPs are included in SA, because SA currently only looks at edugain and a few other federation feeds, so your test box might not be included in any of those feeds. This could be the reason you're not seeing it.

alejandro-perez commented 1 year ago

As I said above, SA doesn't do filtering: both SPs and IdPs are included in the metadata feed from SA. However, not all SPs are included in SA, because SA currently only looks at edugain and a few other federation feeds, so your test box might not be included in any of those feeds. This could be the reason you're not seeing it.

I'm not sure about that. The name of the test box is rendered properly on the top-left corner (cause it's the requesting SP), but you cannot find it using the search.

Somehow, even though SA is including all entities (because you can see how the SP name is displayed correctly), it manages to filter SPs out of the search results. Whether that is done at the pyFF level or in thiss.io, I do not know.

alejandro-perez commented 1 year ago

Search is not handled via pipelines in pyFF. The filter clause is documented in the source but maybe readthedocs isn't exposing all the API docs. I will look into that.

Thanks that'll be useful