Microservices: Unexpected behaviour of Hasher & HashProcessor

Code Version

Satosa 8.2.0 (docker), running an saml2 frontend and oidc backend, proxy from OIDC to SAML

Expected Behavior

(1) Hasher and HashProcessor produce the same hash values (under the same input circumstances)
(2) Hasher and HashProcessor can produce a hash value of attribute values which came from primary identifier processor

Current Behavior

(1) Hasher and HashProcessor, e.g.

same input string, e.g.: 2b27510c-6297-311b-7004-ab7113390135
same salt, e.g.: qiAUwGorlHrKjbWSB68ZIelO7KC6VrZQ8wpoEVdzPb1D9ZhD
same hash alg, e.g.: sha224 but: Hasher and HashProcessor, both configured to hash that attribute containing the input string value, produce different output values (of course they were not used at the same time but tested separately) [this could be due to internals, e.g. prepending vs. appending salt, using different libs etc.]

(2) Trying to hash an attribute, which was scoped and generated by the processor "primary identifier", return unexpected results:

(2.1) HashProcessor: fails with type error

satosa|   File "/usr/local/lib/python3.11/site-packages/satosa/micro_services/attribute_processor.py", line 60, in process
satosa|     instance.process(data, attribute, **kwargs)
satosa|   File "/usr/local/lib/python3.11/site-packages/satosa/micro_services/processors/hash_processor.py", line 31, in process
satosa|     attributes[attribute][0] = value_hashed
satosa|     ~~~~~~~~~~~~~~~~~~~~~^^^
satosa| TypeError: 'str' object does not support item assignment

(2.2) Hasher: produces multiple hashed strings as output values, seperated with a semicolon, some of those are identical, some differ

Possible Solutions

(1) Align Hasher / Hashprocessor outputs with preserved backwards compatibility, e.g. by providing a new parameter to align hash functions (rewrite/align/split internal hashing processes)

(2) There might be a fix for the "primary identifier" processor (which might be the root cause of that issue) - otherwise a fix for Hasher and HashProcessor could be provided

Workarounds

Of course own / modified / custom modules could be written. But usually I try to find a way with default functions.

(1) Use either or, not both. The issue is that Attribute Processor > HashProcessor cannot update/modify subject_id. In the microservice "hasher" the subject_id can be hashed as well among with the attributes. The HashProcessor seem only to be able to hash attributes from the internal map (I might be wrong here and would love to get feedback on that matter).

(2) Do not hash the target attribut produced from "primary identifier" with Hasher microservice but with AtributeProcessor > HashProcessor. There you could use "ScopeProcessor" appending an arbitrary string before hashing with the HashProcessor. This is a workaround as HashProcessor would fail otherwise.

Steps to Reproduce

(1)

Hash an input string with hasher and hashprocessor, using the same input string, salt and algorithm, compare outputs

(2)

Use microservice "primary identifier" with add_scope "issuer_entityid", e.g. producing '2b27510c-6297-311b-7004-ab7113390135https://host/oidc/auth', e.g. let the micrososervice place the generated identifier in attribute 'uid'
Then use Hasher or HashProcessor to hash the 'uid' to see results mentioned above

Hi, I think there is an assumption that those two micro-services should work the same way, but that is not true.

The HashProcessor seem only to be able to hash attributes from the internal map (I might be wrong here and would love to get feedback on that matter).

The Attribute Processor is about processing attributes. The subject_id element corresponds to the SAML Subject/NameID element value, which is not an attribute.

Use either or, not both.

It depends on what you are trying to achieve. Each has its place and its own behaviour. If you want to combine them, you can also do that. It all depends on your use case. It is certain that they don't behave in the same way.

Thanks for having a look into this.

I'll dig a bit deeper:

This is also related to federational use cases with best practices for identifier migration and support in mind. Also I think this does not explain the mentioned errors related to processing the primary_identifier attribute, described in (2); (2.1); (2.2), this should not happen in the first place I guess.

Example use case (not exactly the same as the one I'm trying to achieve, but for relevance/illustration purposes):

Let's say SATOSA generates the SAML2 Name ID with Hasher from some input
Let's say SPs uses / used the persistent-id / SAML2 Name ID or [eduPersonTargetedID] (https://wiki.refeds.org/display/STAN/eduPerson+2020-01#eduPerson202001-eduPersonTargetedID) to identify a user (which are all deprecated)
A best practice, with migration purposes in mind, would be to use the same generated SAML2 Name ID and construct the SAML2 pairwise id attribute with it (e.g. name id + scope), which is a replacement for the previously mentioned identifiers

But:

This would require you to construct the pairwise id with the same output results from the SAML2 Name ID generated by Hasher in this case;
The value cannot be optained from an attribute hashed by Hasher (as hashing the attribute fails, see 2);
The hashing cannot be reproduced in attribute processor (as hash processor delivers a different output); If I use the same hash algorithm sha512 with the same salt at least I expect the functions to produce the same outputs, especially within satosa - but maybe thats just me; the errors described in (2) stop resolving this issue otherwise, e.g. not using attribute processor to hash but hasher itself for attributes

default limitations (without modifications / extensions / custom processors):

attribute processor cannot scope with issuer_entityid (primary identifier is used for that and for detecting the input attribute of choice)
SAML2 Name ID value is not available in attribute processor (neither as target nor as source)
hashing a generated primary_identifier attribute in Hasher fails (see 2, especially 2.2)

At least the second part (2) should be considered a bug I guess (if I did not do anything else wrong)

Identifier migration means that the value that used to be released to some attribute will now be released as the value of one or more additional attributes. Then, after some time, the old attribute may stop being released.

The easiest way to do this is to copy the value of one attribute to the other attributes that should have the same value.

The Hasher and the HashProcessor are different things. They work in a different way. You should not expect them to do the same thing. If they were identical we would not need both.

Having said this, (2) does sound problematic but it is a separate issye. By (2) I am referring to:

(2) Trying to hash an attribute, which was scoped and generated by the processor "primary identifier", return unexpected results:

It looks to me that the primary_identifier plugin has a bug here: https://github.com/IdentityPython/SATOSA/blob/754dcc2/src/satosa/micro_services/primary_identifier.py#L259

instead of setting the value directly, it should set it as the only element of a list.

-            data.attributes[primary_identifier] = primary_identifier_val
+            data.attributes[primary_identifier] = [primary_identifier_val]

Then, both issues mentioned with the Hasher and the HashProcessor should go away. Can you make this change locally and confirm?

Any update on this @github-canni ?

Thanks for getting back, and sorry for my late reply. I had to rebuild my prior setup as I worked around the issues experienced.

Regarding (2): Hashing a value from the "primary identifier" microservice

Then, both issues mentioned with the Hasher and the HashProcessor should go away. Can you make this change locally and confirm?

I tried it, and the fix seems to work.

Regarding (1): Comparing hash results from Hasher / HashProcessor

Identifier migration

Identifier migration means that the value that used to be released to some attribute will now be released as the value of one or more additional attributes. Then, after some time, the old attribute may stop being released. The easiest way to do this is to copy the value of one attribute to the other attributes that should have the same value.

In terms of "identifier" migration at SPs: As you stated correctly in a reply before, the SAML Subject/NameID element is not an attribute, so we cannot copy from/to this value with default microservices (exception: primary identifier can replace the internal "subject_id", which is not an attribute, and Hasher can hash it).

This consideration for identifier migrations was the origin of the experienced hashing issues:

I wanted satosa to deliver the same hash value: a) as the unique part of the NameID (= the internal subject_id) and b) as a value for a successor attribute (e.g. the attribute pairwise-id)

This is considered best practice to help SPs relying on the older NameID migrate to another identifier attribute seamlessly.

But: Hashing the subject_id with hasher and then trying to get the same hash value delivered as an attribute by reproducing the hashing process with HashProcessor using the same salt will fail as they produce different hash values.

Expectations I think most people will expect that hashing functions with the same algorithm and salt within (what they might consider the "same" application) "satosa" will produce the same result. I would suggest adding some documentation that both work differently and do not produce the same hash result, even under the same conditions (algorithm and salt)

Solutions (for documentation purposes): Because of your fix, the hashing might now be done completely in Hasher (also for the attribute produced by primary identifier, so that the same values become available). This was not possible before, due to the documented issue. Other solutions were mentioned in the initial comment.

Thanks for your work!

IdentityPython / SATOSA