Excluding certain fields when calculating conditions-hash

kamil-olszewski-uw commented 3 years ago

There are fields in the cooperation conditions that are not in the IIAs template. Those are:

sending-contact
receiving-contact
sending-ounit-id
receiving-ounit-id

Recently, there has been the idea that certain fields should be removed from the conditions-hash computations, because changing these fields would change the hash, but they do not contain information that should affect the validity of the agreement.

We believe that this should apply to fields sending-contact and receiving-contact, as the contact details of cooperation coordinators can indeed change.

However, we believe that the hash should take into account the fields sending-ounit-id and receiving-ounit-id, because the faculties implementing cooperation conditions are unlikely to change. If one of them does change (or the name of the organizational unit changes because the university is restructured), a new agreement should be signed.

Please comment if you think we should do it differently.
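
To illustrate the effect of the proposal above, here is a minimal Java sketch, not taken from the EWP specification. It removes sending-contact and receiving-contact from a parsed copy of the conditions before hashing, and keeps the ounit-id fields. The class name, the urn:example:iias namespace and the reduced element layout are assumptions made for illustration, and plain serialization stands in for exclusive C14N, so the resulting values are not real conditions-hash values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContactExclusionSketch {
    // Fields proposed for exclusion; the ounit-id fields are deliberately not listed.
    static final String[] EXCLUDED = {"sending-contact", "receiving-contact"};

    // Hypothetical, heavily reduced conditions document used only for this demo.
    static String sample(String ounitId, String contactName) {
        return "<cooperation-conditions xmlns=\"urn:example:iias\">"
            + "<student-studies-mobility-spec>"
            + "<sending-ounit-id>" + ounitId + "</sending-ounit-id>"
            + "<sending-contact><contact-name>" + contactName + "</contact-name></sending-contact>"
            + "</student-studies-mobility-spec>"
            + "</cooperation-conditions>";
    }

    static String hashWithoutContacts(String conditionsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(conditionsXml)));

            for (String name : EXCLUDED) {
                // getElementsByTagNameNS returns a live list, so copy it before removing nodes.
                NodeList live = doc.getElementsByTagNameNS("*", name);
                List<Node> toRemove = new ArrayList<>();
                for (int i = 0; i < live.getLength(); i++) {
                    toRemove.add(live.item(i));
                }
                for (Node node : toRemove) {
                    node.getParentNode().removeChild(node);
                }
            }

            // Assumption: plain serialization replaces exclusive C14N in this sketch.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));

            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(out.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not hash the conditions", e);
        }
    }

    public static void main(String[] args) {
        String a = hashWithoutContacts(sample("ou-1", "Alice"));
        String b = hashWithoutContacts(sample("ou-1", "Bob"));
        String c = hashWithoutContacts(sample("ou-2", "Alice"));
        System.out.println("contact changed, hash unchanged: " + a.equals(b));
        System.out.println("ounit-id changed, hash unchanged: " + a.equals(c));
    }
}
```

With this approach a change of coordinator leaves the hash alone, while a change of organizational unit produces a new hash and therefore calls for a new approval.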

umesh-qs commented 3 years ago

The following example is from https://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/:

As a simple example of the type of problem that changes in XML context can cause for signatures, consider the following document:

   <n1:elem1 xmlns:n1="http://b.example">
       content
   </n1:elem1>
this is then enveloped in another document:

   <n0:pdu xmlns:n0="http://a.example">
      <n1:elem1 xmlns:n1="http://b.example">
          content
      </n1:elem1>
   </n0:pdu>
The first document above is in canonical form. But assume that document is enveloped as in the second case. The subdocument with elem1 as its apex node can be extracted from this second case with an XPath expression such as:

 (//. | //@* | //namespace::*)[ancestor-or-self::n1:elem1]
The result of applying Canonical XML to the resulting XPath node-set is the following (except for line wrapping to fit this document):

   <n1:elem1 xmlns:n0="http://a.example"
             xmlns:n1="http://b.example">
       content
   </n1:elem1>
Note that the n0 namespace has been included by Canonical XML because it includes namespace context. This change would break a signature over elem1 based on the first version.

frangarcj commented 3 years ago

But, for example, you are using ns40 as the namespace prefix, and my XML manipulation library may use another one (e.g. ns59), so our hashes for the same version and conditions will be different.

We are using the following function in Java, with the CooperationConditions class annotated as @XmlRootElement. The output contains the namespace, but without a prefix.

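// Assumed imports (not shown in the original comment); newer stacks may use jakarta.xml.bind.
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import org.apache.commons.codec.binary.Hex;
import org.apache.xml.security.c14n.Canonicalizer;

private String calculateHash() {
    try {
        // Apache Santuario has to be initialized before a Canonicalizer is requested.
        org.apache.xml.security.Init.init();
        JAXBContext contextObj = JAXBContext.newInstance(IiasGetResponse.Iia.CooperationConditions.class);
        Marshaller marshallerObj = contextObj.createMarshaller();
        StringWriter sw = new StringWriter();
        // Marshal only the cooperation conditions, not the whole get response.
        marshallerObj.marshal(cooperationConditions, sw);
        Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);
        // Explicit charset, so the digest does not depend on the platform default encoding.
        byte[] canonXml = canon.canonicalize(sw.toString().getBytes(StandardCharsets.UTF_8));
        log.info(new String(canonXml, StandardCharsets.UTF_8));

        final byte[] digest = MessageDigest.getInstance("SHA-256").digest(canonXml);
        return Hex.encodeHexString(digest);
    } catch (Exception e) {
        log.error(e);
    }
    return null;
}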

umesh-qs commented 3 years ago

It does not matter what namespace prefix is used. You should be calculating the hash as per what data the partner has sent and match that hash. Purpose of hash validation is to make sure that the same hash is calculated by both parties, on the content that is shared in the IIA

frangarcj commented 3 years ago

But the hash is calculated over the string characters so if your string contains ns40 and mine doesn't, hashes will be different.

My string to be hashed

<cooperation-conditions xmlns="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v4/endpoints/get-response.xsd"><student-studies-mobility-spec><sending-hei-id>demo.usos.edu.pl</sending-hei-id><receiving-hei-id>ual.es</receiving-hei-id><receiving-academic-year-id>2020/2021</receiving-academic-year-id><receiving-academic-year-id>2021/2022</receiving-academic-year-id><receiving-academic-year-id>2022/2023</receiving-academic-year-id><receiving-academic-year-id>2023/2024</receiving-academic-year-id><receiving-academic-year-id>2024/2025</receiving-academic-year-id><receiving-academic-year-id>2025/2026</receiving-academic-year-id><receiving-academic-year-id>2026/2027</receiving-academic-year-id><receiving-academic-year-id>2027/2028</receiving-academic-year-id><mobilities-per-year>5</mobilities-per-year><recommended-language-skill><language>en</language><cefr-level>B2</cefr-level></recommended-language-skill><subject-area><isced-f-code>0410</isced-f-code></subject-area><total-months>6</total-months><blended>false</blended><eqf-level>6</eqf-level></student-studies-mobility-spec><student-studies-mobility-spec><sending-hei-id>ual.es</sending-hei-id><receiving-hei-id>demo.usos.edu.pl</receiving-hei-id><receiving-academic-year-id>2020/2021</receiving-academic-year-id><receiving-academic-year-id>2021/2022</receiving-academic-year-id><receiving-academic-year-id>2022/2023</receiving-academic-year-id><receiving-academic-year-id>2023/2024</receiving-academic-year-id><receiving-academic-year-id>2024/2025</receiving-academic-year-id><receiving-academic-year-id>2025/2026</receiving-academic-year-id><receiving-academic-year-id>2026/2027</receiving-academic-year-id><receiving-academic-year-id>2027/2028</receiving-academic-year-id><mobilities-per-year>4</mobilities-per-year><recommended-language-skill><language>en</language><cefr-level>B1</cefr-level></recommended-language-skill><subject-area><isced-f-code>0410</isced-f-code></subject-area><total-months>5</total-months><blended>false</blended><eqf-level>6</eqf-level></student-studies-mobility-spec></cooperation-conditions>

and its hash

f192b5470981b688147f94e836ac5cb0c0c7703ef4ee1bf1dadb56933a2d41f7
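
The point about prefixes can be checked with a small, self-contained Java sketch. It hashes two strings that describe the same element, once with a default namespace and once with an ns40 prefix. The namespace URI urn:example:iias and the reduced content are assumptions chosen only for illustration; this is not the real EWP input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PrefixHashDemo {
    // Same element and namespace, once unprefixed and once with the ns40 prefix.
    static final String DEFAULT_NS =
            "<cooperation-conditions xmlns=\"urn:example:iias\">"
            + "<blended>false</blended></cooperation-conditions>";
    static final String PREFIXED =
            "<ns40:cooperation-conditions xmlns:ns40=\"urn:example:iias\">"
            + "<ns40:blended>false</ns40:blended></ns40:cooperation-conditions>";

    /** Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of the text. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex(DEFAULT_NS));
        System.out.println(sha256Hex(PREFIXED));
        // The digest is taken over bytes, so the two values differ.
        System.out.println(sha256Hex(DEFAULT_NS).equals(sha256Hex(PREFIXED)));
    }
}
```

Both strings carry the same information for an XML parser, but SHA-256 only sees bytes, which is why the choice of prefix matters as soon as it survives canonicalization.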

fmapeixoto commented 3 years ago

It does not matter what namespace prefix is used. You should be calculating the hash as per what data the partner has sent and match that hash. Purpose of hash validation is to make sure that the same hash is calculated by both parties, on the content that is shared in the IIA

I understand this sentence and it can be used for that, but I am not quite sure they shouldn't match. Assuming they shouldn't match, the Approvals API will expose the partner Hash as approved, right? And the partner should expose ours in order for the IIA to be approved?

frangarcj commented 3 years ago

But the hash utility is also used to validate the IIA from your partner. If the conditions of the IIA do not generate the hash given in the document, then it is malformed and we can't trust it.

So after we do a GET request of an IIA we must calculate the hash from the cooperation conditions and it must match with the one included in the document. Therefore, both partners must calculate the same hash from the same copy of the IIA.

umesh-qs commented 3 years ago

But the hash is calculated over the string characters so if your string contains ns40 and mine doesn't, hashes will be different.

My string to be hashed

<cooperation-conditions xmlns="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v4/endpoints/get-response.xsd"><student-studies-mobility-spec><sending-hei-id>demo.usos.edu.pl</sending-hei-id><receiving-hei-id>ual.es</receiving-hei-id><receiving-academic-year-id>2020/2021</receiving-academic-year-id><receiving-academic-year-id>2021/2022</receiving-academic-year-id><receiving-academic-year-id>2022/2023</receiving-academic-year-id><receiving-academic-year-id>2023/2024</receiving-academic-year-id><receiving-academic-year-id>2024/2025</receiving-academic-year-id><receiving-academic-year-id>2025/2026</receiving-academic-year-id><receiving-academic-year-id>2026/2027</receiving-academic-year-id><receiving-academic-year-id>2027/2028</receiving-academic-year-id><mobilities-per-year>5</mobilities-per-year><recommended-language-skill><language>en</language><cefr-level>B2</cefr-level></recommended-language-skill><subject-area><isced-f-code>0410</isced-f-code></subject-area><total-months>6</total-months><blended>false</blended><eqf-level>6</eqf-level></student-studies-mobility-spec><student-studies-mobility-spec><sending-hei-id>ual.es</sending-hei-id><receiving-hei-id>demo.usos.edu.pl</receiving-hei-id><receiving-academic-year-id>2020/2021</receiving-academic-year-id><receiving-academic-year-id>2021/2022</receiving-academic-year-id><receiving-academic-year-id>2022/2023</receiving-academic-year-id><receiving-academic-year-id>2023/2024</receiving-academic-year-id><receiving-academic-year-id>2024/2025</receiving-academic-year-id><receiving-academic-year-id>2025/2026</receiving-academic-year-id><receiving-academic-year-id>2026/2027</receiving-academic-year-id><receiving-academic-year-id>2027/2028</receiving-academic-year-id><mobilities-per-year>4</mobilities-per-year><recommended-language-skill><language>en</language><cefr-level>B1</cefr-level></recommended-language-skill><subject-area><isced-f-code>0410</isced-f-code></subject-area><total-months>5</total-months><blended>false</blended><eqf-level>6</eqf-level></student-studies-mobility-spec></cooperation-conditions>

and its hash

f192b5470981b688147f94e836ac5cb0c0c7703ef4ee1bf1dadb56933a2d41f7

Are you sure you are not manipulating the XML response when converting to class IiasGetResponse.Iia.CooperationConditions.class?

frangarcj commented 3 years ago

I am marshalling an object of that class to a c14n string so there's no manipulation. Just from Object to String.

We have been doing tests with the USOS platform and the University of Salamanca so we think we are doing it correctly. However, after seeing your XML with the namespace name I had to describe what we are doing for confirmation.

umesh-qs commented 3 years ago

I am marshalling an object of that class to a c14n string so there's no manipulation. Just from Object to String.

We have been doing tests with the USOS platform and the University of Salamanca so we think we are doing it correctly. However, after seeing your XML with the namespace name I had to describe what we are doing for confirmation.

So sw.toString() is exactly same as what is received from the partner?

umesh-qs commented 3 years ago

I am marshalling an object of that class to a c14n string so there's no manipulation. Just from Object to String.

We have been doing tests with the USOS platform and the University of Salamanca so we think we are doing it correctly. However, after seeing your XML with the namespace name I had to describe what we are doing for confirmation.

I don't work with Java. But looking at the code, I think you are converting the XML string to a user-defined class and then converting that class back to a string. If the elements in the XML response are not in the same sequence as in your class, the regenerated XML will not have the same sequence, and the namespace prefix may also change.

pmarinelli commented 3 years ago

Dear All,

As far as we understand the conditions hash, its main purpose is to check whether changes to an agreement have occurred.

What do you think about making the conditions hash reflect changes to the data that matters from an IRO member's perspective, rather than changes to how that data is serialized?

That is, we think an IRO member wants to know whether the updated version of an agreement changes the number of mobilities, the subject areas, and so on, not whether it is now served through a new version of the IIA API.

We think that an algorithm coupled as loosely as possible to technicalities would improve the approval process.

Moreover, there is another aspect we would like to draw your attention to. EWP allows agreements to be exposed in multiple versions at the same time, which we think is useful for interoperability. Do you think the IIA Approval API should cover this use case, for instance by including the IIA API version along with the conditions hash? If different IIA API versions produce different conditions hashes, this extra information would tell the IIA Approval API client which API version the server read the agreement from, and thus which hash algorithm must be used to check whether the server has approved the latest version of the agreement.

frangarcj commented 3 years ago

@umesh-qs

The XML string/request is first converted to Java objects by the JAXB runtime. Other languages may be able to work directly on the received document.
To check the cooperation conditions, the elements (objects) are converted to a string, but in c14n form. That is, we are generating an XML string in a standard way that is different from the received document. Therefore, this string has no namespace names, comments or spaces, tabs, line returns. It keeps only the order of the elements.

I agree with @pmarinelli that having the namespace (version of the API) as part of the hash makes maintaining multiple APIs very difficult. If you call the IIA Approval you just receive a hash you don't know the version of.

mkurzydlowski commented 3 years ago

It does not matter what namespace prefix is used. You should be calculating the hash as per what data the partner has sent and match that hash. Purpose of hash validation is to make sure that the same hash is calculated by both parties, on the content that is shared in the IIA

I agree with @umesh-qs in both cases. This is precisely why we need to fix the way we currently calculate the hash.

The way presented by @frangarcj is the current way we calculate hash. It is, sadly, not the way C14N should be applied to the IIA response, as I understand it right now. Although it is an easier way to implement (at least for us) and, in this case, it might even produce the same hash for both partners (but this is not a requirement!).

You are right that the hash will change with every namespace change but it will probably change anyway because of changes to the schema.

Still for keeping approvals in a way that would work indefinitely one would need to use something like signed PDFs. That's a different discussion.

PS. @fmapeixoto, @georgschermann, how are you calculating the hash currently?

frangarcj commented 3 years ago

The way presented by @frangarcj is the current way we calculate hash. It is, sadly, not the way C14N should be applied to the IIA response, as I understand it right now. Although it is an easier way to implement (at least for us) and, in this case, it might even produce the same hash for both partners (but this is not a requirement!).

Why is this not the way C14N should be applied? Which is the correct way?

Also, I need to check that the hash given by the partner is correct (if not the iia is malformed), then, why is calculating the same hash not a requirement?

mkurzydlowski commented 3 years ago

Why is this not the way C14N should be applied? Which is the correct way?

As @umesh-qs pointed out, we are calculating a hash of a subdocument of the IIA get response. That's why we need to take into consideration the namespace prefixes used in that response. I wasn't fully aware of this at the beginning, but it is my understanding after further reading of the spec. Still, you might prove me wrong after reading the spec yourself.

Also, I need to check that the hash given by the partner is correct (if not the iia is malformed), then, why is calculating the same hash not a requirement?

What we want to check is if the hash the partner sends corresponds to the data he is hashing. Neither the hash nor the exact XML representation of the IIA served by us is required to match that of the partner's. I think @kamil-olszewski-uw might try to explain it further.

frangarcj commented 3 years ago

We are computing the exclusive c14n correctly, but it is true that there is only one namespace used (the default one) in the responses we are receiving.

<iias-get-response
    xmlns="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"
    xmlns:c="https://github.com/erasmus-without-paper/ewp-specs-types-contact/tree/stable-v1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd
        https://raw.githubusercontent.com/erasmus-without-paper/ewp-specs-api-iias/stable-v6/endpoints/get-response.xsd
    "
>
...

Applying exclusive c14n here just removes the c namespace. However, it's true that nothing restricts the use of prefixes.

Neither the hash nor the exact XML representation of the IIA served by us is required to match that of the partner's

But I think that we need to add the requirement of having the same hash for both partners if the data is the same. Furthermore, it should not depend on the API version (only if the algorithm changes) or namespace prefixes.

mkurzydlowski commented 3 years ago

We are computing the exclusive c14n correctly, but it is true that there is only one namespace used (the default one) in the responses we are receiving.

That wasn't obvious from the way you are calculating the hash. If you generate the subdocument XML by marshalling only the subelement, then there is no way to know for sure that the namespace(s) used in the subdocument have the same prefix names (or lack a prefix) as in the response XML.

But I think that we need to add the requirement of having the same hash for both partners if the data is the same.

We are not able to enforce this as the data obtained from both systems might have some differences but still may represent valid IIAs that are matching on a business level. @kamil-olszewski-uw will probably explain it better.

Furthermore, it should not depend on the API version (only if the algorithm changes) or namespace prefixes.

Technically it must change if we stick to this canonicalization algorithm. Also, as I pointed out earlier, there is a high probability that fields in the hashed subdocument will change, which might change the hash.

Still, one might just keep the old API implementation for some time.

I'm still hoping that we might use a different mechanism for "signing" the IIA, one that would let the signature live forever, even when the API changes.

kamil-olszewski-uw commented 3 years ago

What we want to check is if the hash the partner sends corresponds to the data he is hashing. Neither the hash nor the exact XML representation of the IIA served by us is required to match that of the partner's. I think @kamil-olszewski-uw might try to explain it further.

If we are going to approve the agreement, our system calls IIAs API get for us. Inside the get response, we get hash and cooperation conditions. Based on the just received cooperation conditions, we compute the hash on our side and check if both hashes match. If so, then we save partner's hash in our system and send the IIA Approval CNR. If the partner sends us an IIA Approval get request, we will send him back his hash in response.

The data of agreement stored in our system is not involved in any stage of this process.
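
The flow described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the canonicalization of the received cooperation conditions is assumed to have happened already, and sending the IIA Approval CNR is left as a comment.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ApprovalFlowSketch {
    // Partner IIA id -> the partner's conditions hash that we approved.
    private final Map<String, String> approvedHashes = new HashMap<>();

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /**
     * Approves a partner IIA if the hash in the get response matches the hash
     * recomputed from the conditions in that same response. Our own copy of
     * the agreement is not consulted at any point.
     *
     * @param canonicalConditions the received cooperation conditions, already
     *        canonicalized (assumption: done elsewhere)
     */
    boolean approve(String partnerIiaId, String hashFromResponse, byte[] canonicalConditions) {
        String recomputed = sha256Hex(canonicalConditions);
        if (!recomputed.equalsIgnoreCase(hashFromResponse)) {
            return false; // the response is internally inconsistent, so do not approve
        }
        approvedHashes.put(partnerIiaId, hashFromResponse);
        // A real system would now send the IIA Approval CNR to the partner.
        return true;
    }

    /** What we would return when the partner calls our IIA Approval get endpoint. */
    Optional<String> approvedHash(String partnerIiaId) {
        return Optional.ofNullable(approvedHashes.get(partnerIiaId));
    }

    public static void main(String[] args) {
        ApprovalFlowSketch flow = new ApprovalFlowSketch();
        byte[] conditions = "<cooperation-conditions/>".getBytes(StandardCharsets.UTF_8);
        String partnerHash = sha256Hex(conditions);
        System.out.println(flow.approve("iia-1", partnerHash, conditions));
        System.out.println(flow.approvedHash("iia-1").isPresent());
    }
}
```

The stored value is the partner's hash as received, not a hash of local data, which matches the statement that the locally stored agreement plays no part in the check.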

georgschermann commented 3 years ago

PS. fmapeixoto, @georgschermann, how are you calculating the hash currently?

We calculate it in nearly the same way as frangarcj. We only use it to check the received IIA, not to compare it with our own data, since our data will be represented differently most of the time.

j-be commented 3 years ago

I recently stumbled over this issue while implementing it on our side (Java with JAX-RS/JAX-B), and I too find the way this is specified very unwieldy. In detail, here is how I understand it (it is hard to tell how this would work in other frameworks, so please excuse any short-sightedness):

To be absolutely sure, one would need to perform raw string operations on the raw XML received or to be sent by the API. Any library (even a simple DOM parser) may or may not mangle it and introduce non-functional changes, like the namespaces mentioned above. They shouldn't, but who knows...
We need this to countercheck data within the same document. In other words: this is explicitly not meant to be a shared key, a means of retrieving related documents, or a means of retrieving the document itself.

So I can't really see why it needs to be as complicated as it is, or what it should be used for. The things I can think of:

Notice if something changes: Any collision resistant random value would do (see OLA's changes-proposal.id)
Make sure the response was not tampered with: that is handled by signing stuff
Make sure the data is "correct": the sender will most probably compute the hash over the raw XML string it is about to send. So any fault in the implementation leading to "wrong" values will still result in valid hashes

In other words: I would propose to take one step back, and reevaluate what we are actually trying to achieve here, and if this could maybe be achieved in an easier way (e.g. like the nonce used in OLA's) rather than operations on raw strings and hashing that.

ctu-developers commented 3 years ago

Hello everyone.

I think the second part of the discussion in this ticket belongs to #47. We reported a problem with using the XML-C14N function to calculate conditions hashes more than a year ago, when we implemented IIAs V4. That is a long time. The documentation was poor. I agree with #53; we thought about that too, if I remember correctly.

For the process described here -- https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/48#issuecomment-815836903

What we want to check is if the hash the partner sends corresponds to the data he is hashing. Neither the hash nor the exact XML representation of the IIA served by us is required to match that of the partner's. I think @kamil-olszewski-uw might try to explain it further.

If we are going to approve the agreement, our system calls IIAs API get for us. Inside the get response, we get hash and cooperation conditions. Based on the just received cooperation conditions, we compute the hash on our side and check if both hashes match. If so, then we save partner's hash in our system and send the IIA Approval CNR. If the partner sends us an IIA Approval get request, we will send him back his hash in response.

The data of agreement stored in our system is not involved in any stage of this process.

This is true only in one case - both sides use the same prefixes like c, ewp, trd (from documentation). Using XML-C14N, we are not able to compute an identical hash from the partner data. This function does not guarantee an exact form for generating the identical hash. See XML-C14N specification -- https://www.w3.org/TR/xml-exc-c14n

What prevents me from parsing the partner's data into my DOM tree and using my own prefixes? Nothing. Another case: if the partner uses other namespace prefixes, the checksum will change without anything changing in the content. Isn't the purpose of the conditions hash to verify that the data hasn't changed?

Yes, the problem can be circumvented by going through the whole process - building a DOM, generating XML with my namespace and comparing the checksum with the version I sent in the previous round. So I am able to check that the partner in the document has not changed anything (except the namespace name). But it's a little annoying. It wouldn't matter yet, but somewhere only those checksums are sent without the content they "secure" and so the control depends only on my original data. That is the bigger problem, I think.

j-be commented 3 years ago

@ctu-developers I'm a bit confused now. As far as I understood, the partners' conditions-hash values are not required to be the same across their respective representations of any given IIA. This is because the lists (e.g. student-studies-mobility-spec) are not required to have a deterministic ordering, only a "consistent" one.

See the examples: A's XML has a different conditions-hash than B's XML. This is due to both listing the conditions where they themselves are sending-hei first, effectively reversing the list.

That is why I don't understand the strict requirement on the hash: I don't see any benefit in being able to check the hash on my side. Only thing I care about is that the one I am seeing now is the same as the one I saw before. And for that this procedure seems way overengineered to me. Any nonce (i.e. a simple version counter would do) is plenty. Or am I missing something?

georgschermann commented 3 years ago

You can see most of the discussion regarding the hash in #30. There was also a discussion about using status flags, version numbers or the like. A hash was preferred by some partners.

j-be commented 3 years ago

@georgschermann I see, thanks for the link.

So is my understanding correct, that a hash was chosen, rather than a simple nonce due to lack of trust between HEIs? In other words: If I blindly trust PartnerHEI, I gain no benefit in checking the hash, right?

Following that thought: is it true that the hash is not intended to be used for anything else but establishing trust, i.e. assuring an IIA does not change unilaterally and unnoticed after it has been approved by the partner? In particular, it is explicitly not a means of retrieving, linking, or referencing (except for Approval API) IIAs, right?

jiripetrzelka commented 3 years ago

What is the benefit of including namespaces in the hashed cooperation conditions? Aside from conforming to the W3C specification, it just breaks the hash whenever a new version of the API is released and the namespace thereby changes, even if there has been no change in the part of the XML that is being hashed. Isn't this unnecessary?

georgschermann commented 3 years ago

The hash was chosen to be able to check/verify at a later time if the agreement is still the one which has been approved, since you could store the signed approval response. You are also always able to reproduce the same conditions you had previously even when namespaces etc. from the live API changed in the meantime. So you would always be able to prove, that your IIAs had been approved.

As a partner you would not recognize an IIA as approved by you when there has been a namespace / hash change. But you could also store the IIA GET response to prove that this was the version you had approved.

In our implementation we don't require/use the approval for anything, since we did our implementation before the approval API was introduced and we think of the signed-fields on the IIA as the more important information. But there are some partners which implemented the approval as an integral part of their processes, so we had to add several auto-approvals in our process to be able to exchange data with these partners.

The implementations of several partners differ greatly regarding the approval to cover their internal processes and it is still under discussion how to solve these issues which became visible during the last weeks and months.

If you don't rely on being able to prove an earlier approval in your software you can omit the whole thing and auto approve e.g. on import/signing/etc. or wait for the different partners to agree on a spec change e.g. towards a nonce / approval-uid / namespace-free hash calculation / etc.

sascoms commented 3 years ago

I think adding a namespace to the cooperation conditions before hashing only adds an extra task and makes the already complicated IIA process more complicated, given the many different scenarios, implementations, etc.

Whatever the decision is or will be, in my opinion it should be decided very soon (if it has not been already) whether we need to add a namespace or not, and the decision should be documented in the API specs.

As some of you know or have already experienced, this is causing IIA data exchange problems between providers (some add the namespace and some do not).

muratyuceer commented 3 years ago

When I first read the document, I thought the hashes of the partners' local copies had to be equal, and I planned to compare them to confirm that we agree before approving.

After reading the issue comments and learning how the XML-C14N calculation works, my thinking changed to this:

When I need to check a hash from a partner, I have to recalculate the hash from the partner's raw XML response (the result of the partner's "get" API, because the version or namespace prefix could be different) and compare it. In this case I will not use the calculation to detect differences, but only to confirm that we are using the same algorithm, and to store the hash if I want to. (I also don't know what I would lose by skipping this step.)

To detect differences, I always have to calculate a hash from the API response bound to my C# class and compare it with my local hash.

This is what I'm thinking of doing.

j-be commented 3 years ago

@muratyuceer As far as I understood it, the respective hashes are not expected to match - ever. This can have a variety of reasons (e.g. pretty-printed XML vs. all on one line), the most obvious being that the spec requires the ordering of the elements within the lists in cooperation-conditions to be "consistent", but not deterministic.

To better see what I mean: check out the 2 get responses in the example: Both A and B list the entry, where they themselves are "sending HEI" first, effectively reversing the list, and thus changing the hash.

So as far as I understood: the only way for me to countercheck the hash of a received IIA is to perform operations on the raw XML string, which depending on the framework (Java JAX-RS with JAX-B here) may become particularly tricky - especially the "remove contacts" part of it.

pmarinelli commented 3 years ago

@j-be, we use XPath to extract the node set to canonicalize. Of course the XPath expression is applied to the octet stream as received from the partner. The use of XPath sounds to me like the safest choice, as it is also referred to by the exc-C14n specs (https://www.w3.org/TR/xml-exc-c14n/), which state: "The exclusive canonical form of a document subset is a physical representation of the XPath node-set, as an octet sequence, produced by the method described in this specification."
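
A minimal Java sketch of selecting the nodes with XPath is shown below. The expression is an assumption written for illustration, not the one used by any partner: it matches by local-name() so that it does not depend on the prefixes in the response, and it skips sending-contact and receiving-contact together with their descendants. The sketch only lists the selected element names; passing a node set to an exclusive C14N implementation is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConditionsXPathSketch {
    // Hypothetical expression: every element below cooperation-conditions that is
    // neither a contact element nor nested inside one.
    static final String SELECT_HASHED_ELEMENTS =
            "//*[local-name()='cooperation-conditions']"
            + "//*[not(ancestor-or-self::*[local-name()='sending-contact'"
            + " or local-name()='receiving-contact'])]";

    /** Returns the local names of the elements the expression selects. */
    static List<String> selectedElementNames(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(SELECT_HASHED_ELEMENTS, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate the XPath expression", e);
        }
    }

    public static void main(String[] args) {
        // Made-up response with arbitrary prefixes and placeholder namespaces.
        String xml = "<r:iias-get-response xmlns:r=\"urn:example:iias\" xmlns:c=\"urn:example:contact\">"
                + "<r:cooperation-conditions><r:student-studies-mobility-spec>"
                + "<r:sending-hei-id>a.example</r:sending-hei-id>"
                + "<r:sending-contact><c:contact-name>X</c:contact-name></r:sending-contact>"
                + "<r:mobilities-per-year>5</r:mobilities-per-year>"
                + "</r:student-studies-mobility-spec></r:cooperation-conditions>"
                + "</r:iias-get-response>";
        System.out.println(selectedElementNames(xml));
    }
}
```

Leaving the exclusion to the XPath engine avoids any string surgery on the raw XML, which is the main appeal of this approach.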

j-be commented 3 years ago

@pmarinelli thanks, it seems I need to look at XPath expressions (I have always avoided them, as I really dislike the concept of working on raw XML) :smile: .

Anyway, after thinking about this a bit more, I think I am more confused than ever. Take the following (as far as I can tell XML-C14N canonical) XML (some stuff omitted for brevity):

<cooperation-conditions>
  <student-studies-mobility-spec>
    <sending-hei-id>tuwien.ac.at</sending-hei-id>
    <sending-contact>
      <contact-name>Doesn't really matter</contact-name>
    </sending-contact>
  </student-studies-mobility-spec>
</cooperation-conditions>

After excluding sending-contact and receiving-contact subelements this leaves me with (as far as I can tell still XML-C14N canonical):

<cooperation-conditions>
  <student-studies-mobility-spec>
    <sending-hei-id>tuwien.ac.at</sending-hei-id>

  </student-studies-mobility-spec>
</cooperation-conditions>

Note that there are four whitespace characters (spaces) on the apparently empty fourth line.

In other words: I effectively have to remove whatever matches the following regex from the XML (assuming there are no comments, otherwise it gets really tricky): <sending-contact>.*</sending-contact>, but I must not touch any whitespace character preceding or following that part. Is that correct?

pmarinelli commented 3 years ago

@j-be Yes, it is correct (at least for me). The ewp specs require us to remove certain elements, not the whitespace characters possibly surrounding them.
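
The whitespace behaviour discussed here can be reproduced with the standard Java DOM API. The following sketch parses the example XML from the comment above, removes only the sending-contact element, and returns the remaining text content of student-studies-mobility-spec. The class and method names are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceAfterRemovalDemo {
    // The example document from the discussion, indented with two spaces per level.
    static final String SAMPLE =
            "<cooperation-conditions>\n"
            + "  <student-studies-mobility-spec>\n"
            + "    <sending-hei-id>tuwien.ac.at</sending-hei-id>\n"
            + "    <sending-contact>\n"
            + "      <contact-name>Doesn't really matter</contact-name>\n"
            + "    </sending-contact>\n"
            + "  </student-studies-mobility-spec>\n"
            + "</cooperation-conditions>";

    /** Removes the first sending-contact element and returns its parent's text content. */
    static String specTextAfterRemovingContact(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node contact = doc.getElementsByTagName("sending-contact").item(0);
            Node spec = contact.getParentNode();
            // Only the element goes; the whitespace text nodes around it stay in place.
            spec.removeChild(contact);
            return spec.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the sample", e);
        }
    }

    public static void main(String[] args) {
        String text = specTextAfterRemovingContact(SAMPLE);
        // Make the whitespace visible: newlines as \n, spaces as middle dots.
        System.out.println(text.replace("\n", "\\n").replace(' ', '\u00b7'));
    }
}
```

The text node before the removed element ("\n" plus four spaces) and the one after it ("\n" plus two spaces) both survive, so the canonical form still contains the line that looks empty but holds four spaces.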

sascoms commented 3 years ago

We are talking about how to hash, how to check hashes, etc.

Is it only me who thinks this is rather very complicated and needs simplification or a better solution?

The missing partner IIA IDs, the hash calculation, hash checks, approvals, problems caused by master-master principle, duplicates, etc.

Why do we all need to find so many workarounds and invent magic tricks to fix these gaps, problems, and issues?

Don't we need a new version in which IIA workflow is redesigned especially when now more developers and providers are involved and all have experienced the current problems during implementation and data exchange?

j-be commented 3 years ago

@sascoms I mostly agree, though I think a lot of this could be fixed by adding:

a deterministic and unique way to calculate the hash (i.e. any 2 copies of an IIA hash to the same value), so the hash can be used as a key, or even an ID, and thus
allow to search for said hash

This would solve:

complicated hashing, as it could be moved to the actual content, rather than the representation thereof (aka. XML)
finding the respective IIA at the partner HEI
approval is implicit if both HEIs have the same hash

It would even allow for changes in the schema without invalidating everything, as a second, or third hash can be added (though this may not scale beyond a couple of versions).

This could be achieved by adding:

deterministic sorting to the list
a hashing scheme, that for the same content (as in "what we agreed on") always returns the same hash
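A minimal sketch of what such a content-based hash might look like, assuming Python and a namespace-free sample document. The field selection, the sort order and the function names (`flatten`, `content_hash`) are hypothetical and not part of any EWP specification.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

# Hypothetical choice of elements that do not belong to "what we agreed on".
EXCLUDED = {"sending-contact", "receiving-contact"}


def flatten(elem, path=""):
    """Yield [path, text] pairs for every leaf outside the excluded subtrees."""
    if elem.tag in EXCLUDED:
        return
    here = f"{path}/{elem.tag}"
    if len(elem) == 0:
        yield [here, (elem.text or "").strip()]
    for child in elem:
        yield from flatten(child, here)


def content_hash(xml_text: str) -> str:
    """Hash the agreed content rather than its XML representation."""
    root = ET.fromstring(xml_text)
    specs = [list(flatten(spec)) for spec in root]
    specs.sort()  # deterministic ordering of the list of mobility specs
    payload = json.dumps(specs, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = """<cooperation-conditions>
  <student-studies-mobility-spec>
    <sending-hei-id>tuwien.ac.at</sending-hei-id>
    <sending-contact><contact-name>X</contact-name></sending-contact>
  </student-studies-mobility-spec>
  <student-studies-mobility-spec>
    <sending-hei-id>uw.edu.pl</sending-hei-id>
  </student-studies-mobility-spec>
</cooperation-conditions>"""
second = ("<cooperation-conditions><student-studies-mobility-spec>"
          "<sending-hei-id>uw.edu.pl</sending-hei-id>"
          "</student-studies-mobility-spec><student-studies-mobility-spec>"
          "<sending-hei-id>tuwien.ac.at</sending-hei-id>"
          "</student-studies-mobility-spec></cooperation-conditions>")

print(content_hash(first) == content_hash(second))
```

The two sample documents differ in indentation, in the order of the specs and in the contact details, yet they are meant to hash to the same value because only the agreed content is fed into the digest.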

pmarinelli commented 3 years ago

@j-be just for the sake of clarity: the XPath expression we use excludes by itself the sending and receiving contacts. That is, the node set selected by the XPath expression is ready to be canonicalized, without any further processing. I feel comfortable with such an approach as it delegates the handling of all the XML peculiarities to an engine that is specifically designed to deal with them.

jiripetrzelka commented 3 years ago

During the implementation of the conditions-hash, we have stumbled upon the  entity in the Mobility Online implementation. Since the conditions-hash did not match, I tried to replace the entity by the newline character and voila the conditions hash started to match. But now I don't know if we are really supposed to convert entities into real characters and only then compute the hash, or if this is a glitch on the part of Mobility Online and the conditions-hash should be computed directly from the entity. Can someone please clarify?
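For what it is worth, Canonical XML replaces character references with the characters they stand for, so a parser-based pipeline sees the real character either way. A small check with Python's standard library, assuming the entity in question was a numeric character reference for a newline such as `&#10;` (the element name `note` is made up, and `ET.canonicalize` is C14N 2.0 rather than exclusive C14N):

```python
import xml.etree.ElementTree as ET

# Assumption: the entity was a numeric character reference for a newline.
with_reference = "<note>line one&#10;line two</note>"
with_newline = "<note>line one\nline two</note>"

# Both inputs canonicalize to the same text, with a literal newline inside.
same = ET.canonicalize(with_reference) == ET.canonicalize(with_newline)
print(same)
```

Under that assumption, a hash computed on the canonical form would be taken over the real newline, and a hash taken over the raw reference would only match if the partner skipped canonicalization.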

jiripetrzelka commented 3 years ago

Would it be possible to add the requirement to the specification that the string being canonicalized mustn't contain any whitespace between elements so that we are not forced to do raw string operations on the incoming XML to preserve every space and tab in the string, as j-be pointed out in https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/48#issuecomment-881471580 ?

The Canonical XML Version 2.0 does not imply that spaces be removed: https://www.w3.org/TR/xml-c14n2/#sec-Requirements-Robustness

If I understand it correctly it means that if the partner adds any specific indentation to the XML then we cannot use any XML library for processing unless it is able to preserve the exact formatting, which I doubt any library can do.

janinamincer-daszkiewicz commented 3 years ago

To me it looks as if, even though some development teams have somehow solved the problem of hash calculation and can exchange IIAs between their own installations, the problem is not yet solved on a global level, and we still try to cope by tightening the requirements expressed in the specification. That will help only in the short term.

What if we took another approach?

The role of the hash is to give a partner whose IIA is being approved some evidence that we approve what was sent to us in the last IIA get. The advantage of such proof is that a hash is a relatively short string, easy to store and compare. Unfortunately it is hard to calculate (in a unified way on a global level). It is also more difficult to use in court, as we need an accompanying element showing that exactly this hash was sent to us by the partner. Showing system logs in court may not be a good idea.

There is another option. Let's use the whole XML element with the IIA (object, document) as such proof. Yes, as a string it is longer than a hash, but nowadays who cares (it is still much shorter than the PDF equivalent). You may compare it as a string, and you can also compare it as an object with an internal structure, element by element. It is also much more transparent than the hash of this XML element. In particular, if the strings do not match you can easily find out why, that is, which subelement has changed. Maybe only the contact details changed, and you do not care and will not ask for another round of approvals?

There is also an extra option. You can add a signature. In fact there are three options and the decision is more technical than business oriented: use a document with an embedded signature, or a signature with an embedded document, or a document and a signature as two separate elements. A signature is a much better court proof than a hash (easier to handle). You can show it to the end user in the interface, and it is quite possible that the IRO staff will prefer it over a hash.

Summary: now, in IIA get, we obtain the IIA in XML format and its hash. Let's eliminate the hash. In IIA approval, instead of sending the obtained hash we would send the obtained IIA in XML format. The option to consider would be to add a signature. Hash calculation would be eliminated from the process.

Tell us your opinion.

j-be commented 3 years ago

My two cents: As it is right now, we decided not to check the partner's hash, as falsifying it would need substantial criminal intent, which we trust our partners don't have. We interpret it as a nonce and candidate key for approval, nothing more.

georgschermann commented 3 years ago

Currently I don't have an opinion on this, just wanted to mention, that signatures are already present in all requests and responses due to httpsignature authentication and the fact that tls server auth has been dropped by most providers.

@j-be same here

umesh-qs commented 3 years ago

I am not sure what purpose the proposed signature method serves that the hash does not, except for the new legal angle that is being brought in. Even then I am not sure how it can be used in courts. If some more details can be provided on this from the EUF perspective, it will help.

The current problem with calculating the hash lies in its definition. If we can agree on a common format, like creating a single-line string without any spaces and newline characters between tags and removing all comments, namespaces and prefixes, then the hash will work fine.
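A rough sketch of such a common format, assuming Python's standard library: `ET.canonicalize` with `strip_text=True` drops whitespace-only text between tags (and also trims leading and trailing whitespace inside text content, which goes a little further than the proposal), and comments are omitted by default. It does not remove namespaces or prefixes, the sample is namespace-free, and SHA-256 and the function name `single_line_hash` are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET


def single_line_hash(xml_text: str) -> str:
    """Hash a serialization without inter-element whitespace or comments."""
    flat = ET.canonicalize(xml_text, strip_text=True, with_comments=False)
    return hashlib.sha256(flat.encode("utf-8")).hexdigest()


indented = """<cooperation-conditions>
  <!-- exported by system A -->
  <student-studies-mobility-spec>
    <sending-hei-id>tuwien.ac.at</sending-hei-id>
  </student-studies-mobility-spec>
</cooperation-conditions>"""
compact = ("<cooperation-conditions><student-studies-mobility-spec>"
           "<sending-hei-id>tuwien.ac.at</sending-hei-id>"
           "</student-studies-mobility-spec></cooperation-conditions>")

# Indentation and comments no longer influence the digest.
print(single_line_hash(indented) == single_line_hash(compact))
```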

Also, most of the developers here can live (and are living) without the hash functionality. So it is better to drop the hash comparison in case we cannot come up with a common format.

BavoNootaert commented 3 years ago

I agree with @georgschermann and @umesh-qs: we already have signatures in the http headers, and hashes should work fine if we all just follow clear specifications, e.g. compute the hash using the canonicalization on the data that is actually sent. Using XML Signatures involves canonicalization, so it could lead to similar problems. Moreover, the XML Signature specification is more complex, contains 'recommendations' and 'should's, and warnings like this:

Alternatives to the REQUIRED canonicalization algorithms (section 6.5), such as Canonical XML with Comments (section 6.5.1) or a minimal canonicalization (such as CRLF and charset normalization) , may be explicitly specified but are NOT REQUIRED. Consequently, their use may not interoperate with other applications that do not support the specified algorithm

So the problems may be even worse.

mkurzydlowski commented 3 years ago

@BavoNootaert, the way the XML signature is prepared doesn't have to be strict, as it won't be recalculated by the partner (as is currently the case with hashes).

The XML signature is not intended to "replace" hash computation; it is solely meant to be a proof, and a good one.

What was suggested as a remedy for hash computation is just storing the whole XML, rather than a hash.

BavoNootaert commented 3 years ago

How can you be sure it's a proof if you don't check it is correct first?

As for storing the whole XML, I think one of the problems with computing the hash is that some frameworks make it hard to access the original request or do something with the response after hashing. So it cannot be assumed that the XML included in the approval is the one that was sent by the partner. It may even be restructured to match the database structure of the approving partner (fields truncated or removed, reordered, whitespace added...). In that case it will not be an easy comparison at all. Checking it would require user intervention whereas checking hashes is done automatically. That is a huge change.

mkurzydlowski commented 3 years ago

What do you mean by "correct"?

The verification of a signature will be handled by the library.
The XML should correspond to the partner response.

The XML doesn't need to correspond byte by byte to a server response. It needs to represent the same data. Comparing two XML representations should also be handled by libraries (XMLunit for example but this was just the first search result).
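To make "represent the same data" concrete, here is a minimal sketch of a structural comparison in Python, using only the standard library rather than XMLUnit. The choice of ignored elements and the function name `differences` are hypothetical, and mixed content (text after child elements) is not compared.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of elements whose changes are not significant.
IGNORED_TAGS = {"sending-contact", "receiving-contact"}


def differences(a, b, path=""):
    """Yield descriptions of significant differences between two elements."""
    if a.tag != b.tag:
        yield f"{path}: element {a.tag} != {b.tag}"
        return
    here = f"{path}/{a.tag}"
    if a.attrib != b.attrib:
        yield f"{here}: attributes differ"
    if (a.text or "").strip() != (b.text or "").strip():
        yield f"{here}: text differs"
    kids_a = [c for c in a if c.tag not in IGNORED_TAGS]
    kids_b = [c for c in b if c.tag not in IGNORED_TAGS]
    if len(kids_a) != len(kids_b):
        yield f"{here}: {len(kids_a)} vs {len(kids_b)} child elements"
        return
    for child_a, child_b in zip(kids_a, kids_b):
        yield from differences(child_a, child_b, here)


stored = ET.fromstring("""<cooperation-conditions>
  <student-studies-mobility-spec>
    <sending-hei-id>tuwien.ac.at</sending-hei-id>
    <sending-contact><contact-name>X</contact-name></sending-contact>
  </student-studies-mobility-spec>
</cooperation-conditions>""")
echoed = ET.fromstring(
    "<cooperation-conditions><student-studies-mobility-spec>"
    "<sending-hei-id>uw.edu.pl</sending-hei-id>"
    "</student-studies-mobility-spec></cooperation-conditions>")

for line in differences(stored, echoed):
    print(line)
```

Unlike a hash mismatch, the output names the subelement that changed, but someone still has to decide which elements belong in the ignore list.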

BavoNootaert commented 3 years ago

Of course one would use libraries as much as possible. The point is: the hash is effectively a reference to a response received earlier (in some of the comments above it is treated as a nonce.) That is relatively straightforward.

The entire XML is harder:

Extracting the response as it was received by the partner is awkward with many frameworks, as it is parsed to some object before it is handled by the application itself. So if you include it in the approval response, you have the same awkwardness as with hashes. That part isn't solved.
Upon receiving the approval response, you have to compare that XML with your own data. Instead of a simple string comparison, you now have to resort to libraries to help you, for which you still have to specify what is a significant difference and what is not. (If that is at all possible, given that partners may store the same IIA differently.) So you have made things more complex.

I think the legal argument (that it is a better proof) should be moved to a different issue, as it solves a different problem, and leads to other questions.

demilatof commented 2 years ago

If someone is interested, I tried here https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/72#issue-1069661511 to explain a different approach to the problem; this approach can produce two possible solutions:

1) sharing the same algorithm or exposing the exact string on a single line to compute for hash, between tags <toHash> and </toHash>
2) indicating that there is no need to compute again the hash received, just saving it together with IIA_id, and writing it in the approval. Every developer will choose how to bind iia_id, hash and data: storing XML or single fields.
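The second solution could look roughly like this. It is only a sketch in Python: the in-memory store, the function names and the dictionary shape of the approval are all hypothetical, and no hash is ever recomputed.

```python
# Hypothetical in-memory store: partner IIA id -> hash as received in IIA get.
received_hashes = {}


def remember(iia_id: str, conditions_hash: str) -> None:
    """Save the hash exactly as the partner sent it, without recomputing it."""
    received_hashes[iia_id] = conditions_hash


def build_approval(iia_id: str) -> dict:
    """Echo the stored hash back in the approval (hypothetical shape)."""
    return {"iia-id": iia_id, "conditions-hash": received_hashes[iia_id]}


def approval_matches(approval: dict, current_hash: str) -> bool:
    """Owner's side: does the approval refer to the currently served version?"""
    return approval["conditions-hash"] == current_hash


remember("iia-1", "abc123")
approval = build_approval("iia-1")
print(approval_matches(approval, "abc123"))
```

Here the hash acts as an opaque version token: the owner only needs to compare the echoed value with the one it currently serves.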

mkurzydlowski commented 2 years ago

Returning to the issue of emulating signature with a hash.

For HEI A to be able to acquire a "signature" on A's IIA it currently needs to:

Store B's IIA approval of A's IIA. Otherwise if B loses, changes or temporarily can't serve the IIA approval, then A loses its signature.

What's more, A has to store B's IIA approval in a way that can be used as a proof of B's will. Currently B would need to implement HTTP signature and A would need to store the data being transmitted. Also B's server key needs to be kept. This key needs to emulate the signing party even if an actual person is a better fit in this scenario.

Obtain and store the exact format of its own IIA being transmitted to B. This can be cumbersome for many implementers and strictly depends on the data not being changed between B's request for A's IIA and B's approval.

For a "signature" to be emulated we need both IIA XML and "signed" IIA approval XML, and also be sure that hashes are equal.

It should be noted that such "signature" would not be subject to IIA data and version changes, as both IIA and approval would be stored based on the moment of signing.

It should also be noted that what has been described above is in fact a simulation of XML signature but a very cumbersome and error prone one. For an XML signature to take place:

B has to take A's IIA XML and sign it (all of it - no XML operations needed) by an XML signature library.
A has to store this signed XML.

Such signed XML is in fact a well defined proof and it additionally keeps the exact information about the signing party.

janinamincer-daszkiewicz commented 2 years ago

Some changes in the IIA and IIA Approval APIs will take place by the end of 2022. It makes sense to group them to make the change in the major number of the APIs once.

Have a look at arguments listed in https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/48#issuecomment-1004723182.

During the Infrastructure Forum meeting on 2022-11-16 the providers will vote if we want to replace hash with XML signature as proposed by UWarsaw.

demilatof commented 2 years ago

My bad, I don't completely understand the advantage of the XML signature over the hash code, since it could raise issues similar to those of hash computation. Since the specifications give importance to the cooperation conditions hash code, I thought that there was a reason: "signing" the essential part of an IIA and allowing changes elsewhere.

If the XML signature is introduced to save a copy of the XML signed by the partner (as a proof), the same could be achieved by saving our XML and the Approval Response that contains the hash code. Note that when B takes A's IIA XML, B could even modify the XML before signing it (intentionally or because of a library or an intermediary). Therefore, when A downloads the signed copy, it has to check that B has signed what A previously sent. We have to take care of all of these issues, keeping in mind that the two documents could contain the same information, but an empty space or a carriage return would make them different, breaking the XML signature.
