Unable to compute identical "conditions-hash"

ctu-developers commented 4 years ago

We suspect that the procedure for computing "conditions-hash" is not specified precisely enough.

If you name the namespace prefixes in the XML differently (not c, ewp, trd, etc.) than in the XSD (get-response.xsd), which you can and nothing prevents, "conditions-hash" will not be the same for both partners.

The data is exactly the same, but the "conditions-hash" does not fit!

The XML-C14N function (in exclusive form) performs XML normalization, but does not remove prefixes from individual elements. As a result, the hash function does not return the same fingerprint.
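
To make the effect concrete, here is a minimal Java sketch. It assumes, for illustration only, that the two strings below are what exclusive canonicalization would emit for the same data serialized with two different prefixes; the `urn:example:iia` namespace and the class name `PrefixSensitivityDemo` are made up and do not come from the EWP specs. Because the prefix is part of the canonical bytes, SHA-256 produces two unrelated digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PrefixSensitivityDemo {

    // Hypothetical canonical outputs: same namespace URI and content, different prefix.
    static final String WITH_PREFIX_A =
        "<a:cooperation-conditions xmlns:a=\"urn:example:iia\"></a:cooperation-conditions>";
    static final String WITH_PREFIX_B =
        "<b:cooperation-conditions xmlns:b=\"urn:example:iia\"></b:cooperation-conditions>";

    // Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of the text.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex(WITH_PREFIX_A));
        System.out.println(sha256Hex(WITH_PREFIX_B));
    }
}
```

Running `main` prints two different 64-character digests, even though both strings describe the same element in the same namespace.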

This seems to be a problem in architecture design.

EvelienRenders commented 4 years ago

Perhaps @janinamincer-daszkiewicz would also know more about this one? I'm not sure if you're notified of these issues by Github, so I hope you don't mind me tagging you in them. If someone else from the team is better suited to answer, do let us know 🙂

janinamincer-daszkiewicz commented 4 years ago

Yes, I am notified about all issues in GitHub. This issue has to wait. The highest priority now is the new format of the LA and the new specification.

ctu-developers commented 4 years ago

Thanks for your response. I look forward to new information and changes in this ticket. [ivoš@ctu-developers]

mkurzydlowski commented 3 years ago

@ctu-developers, sorry for the late response!

The only requirement for the procedure of calculating the hash is that the partners are able to verify the hash by calculating it on the original data. So the choice of namespace aliases shouldn't be an issue.

j-be commented 3 years ago

@mkurzydlowski are you saying we need to take the XML we receive as a plain string, extract the parts that are needed to calculate the hash, remove the parts within them that are not needed for the hash calculation, "normalize" what is left and pipe that through SHA-256?

That seems like a really weird way to do things, given that web frameworks usually don't pass through raw XML.

umesh-qs commented 3 years ago

We are going into the same discussion again. Please see https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/48#issuecomment-812861609

ctu-developers commented 3 years ago

I am sorry, I was ill.

@mkurzydlowski: Yes, it is. XML-C14N includes the namespace prefixes! Try it. If you use the same raw data with a different namespace prefix, the conditions hash doesn't match!

mkurzydlowski commented 3 years ago

@ctu-developers, you might have misunderstood me. You wrote:

If you name the namespace prefixes in the XML differently (not c, ewp, trd, etc.) than in the XSD (get-response.xsd), which you can and nothing prevents [...]

I responded with:

The only requirement for the procedure of calculating the hash is that the partners are able to verify the hash by calculating it on the original data.

That's why I then added:

So the choice of namespace aliases shouldn't be an issue.

I could have worded it better. You don't choose the namespace aliases when calculating the hash to check what you have received from the partner. You use the partner's namespace aliases.

demilatof commented 2 years ago

I apologize, but I'm still stuck on the hash computation. This should be quite easy, but a single character is enough to break the hash.

I use Java and what I do now is:
1) read the IIA get-response and pass it to XPath
2) extract the cooperation-conditions node
3) remove the sending-contact and receiving-contact (if present)
4) canonicalize the node (Exclusive canonical form)
5) compute the SHA-256 hash of the canonicalized node

But they never match.

The namespace should not matter, because what I've to do is get the exact piece of xml (cooperation-conditions) sent by partner and compute the hash on it (whatever it contains, attributes or name aliases). The question is: do all the partners create the XML for the get-response, then perform the above computation (points 1-5) and finally inject the "conditions-hash" into the XML? Because if a partner assembles the mobility elements to compute the hash and only afterwards builds the final XML for the get-response, my computation will always start from a different base.

For example, I don't know Go, but I'm not sure that the following procedure computes the hash in a way that can be repeated by the receiving partner. The "func calculateHash" creates a piece of XML, but it seems to me that after the hash computation that XML is lost and will be created again somewhere else. There is no guarantee that it will be identical in every single character. The procedure is published as a "resource" at https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/resources/GoLangHashCalculation.go

Could someone publish an IIA get-response containing a hash computed with the "golden rule", together with the string on which the SHA-256 has been computed, so that I can run some tests? It would be even more useful if the same XML could be downloaded from the EWP Network by calling the IIA get-response API.

Many thanks in advance

mkurzydlowski commented 2 years ago

I use Java and what I do now is:

read the IIA get-response and pass it to XPath

extract the cooperation-conditions node

remove the sending-contact and receiving-contact (if present)

canonicalize the node (Exclusive canonical form)

It is crucial in this step to canonicalize an element that has been extracted from the root node (get response) in a way that doesn't lose the context (namespace definitions and aliases defined on the parent elements).

XML canonicalization has to preserve the namespaces and aliases.

The namespace should not matter, because what I've to do is get the exact piece of xml (cooperation-conditions) sent by partner and compute the hash on it (whatever it contains, attributes or name aliases).

That seems to be not in line with XML canonicalization as I stated above. Please look at the first sentence of the specification:

https://www.w3.org/TR/xml-exc-c14n/

sascoms commented 2 years ago

We have prepared https://erasmusjet.com/ewp-iia-hash-verify/ for iia hash calculation and verification for our own/public use.

@demilatof
You can paste your full IIA content XML in the box and submit it to see the resulting cooperation conditions and the hash computed.

Also if your system is ready, we can exchange an IIA for test purposes. This way we can check and examine the content and the hash to help you testing the hash and the IIA content. (our development env HEI ID: demo.erasmusport.com)

demilatof commented 2 years ago

@sascoms

We have prepared https://erasmusjet.com/ewp-iia-hash-verify/ for iia hash calculation and verification for our own/public use.

@demilatof You can paste your full IIA content XML in the box and submit it to see the resulting cooperation conditions and the hash computed.

Thanks a lot! I tried to paste a full IIA content XML downloaded from the EWP network (hei_id=unicineca.it iia_id=6826e0c1-5ab1-4b59-8150-2b2af07574ac) and it is different from the hash computed online.

If I try with hei_id=unibo.it and iia_id=UNIBO-IIA-UNICO-ACCORDI-128656 the hash is identical.

Therefore it seems to me that I have to add the namespace to the cooperation-conditions tag, even if it is not present. I'm not sure that this is the right way, because doing so alters the real string received (e.g., the above IIA get-response from unibo.it doesn't have any alias or namespace declared or used in cooperation-conditions).

I understand what @mkurzydlowski says, but in my opinion the second sentence states something different, because we are interested in signing only the payload (cooperation-conditions) minus the sending/receiving-contacts:

However, some applications require a method which, to the extent practical, excludes ancestor context from a canonicalized subdocument. For example, one might require a digital signature over an XML payload (subdocument) in an XML message that will not break when that subdocument is removed from its original message and/or inserted into a different context. This requirement is satisfied by Exclusive XML Canonicalization.

https://www.w3.org/TR/xml-exc-c14n/

Also if your system is ready, we can exchange an IIA for test purposes. This way we can check and examine the content and the hash to help you testing the hash and the IIA content. (our development env HEI ID: demo.erasmusport.com)

Thank you again, I'll do that as soon as my system is working. I'm not ready yet, because I'm trying to resolve the hash computation first. I think you have put me on the right track now, even if I still have doubts about inserting namespaces in the cooperation-conditions tag. Doing so, we work on a different string; moreover, the XML of the IIA from unibo.it declares a namespace and three aliases. Do we have to analyze the content of the cooperation-conditions element to know whether we need to add only the standard namespace or also the aliases declared in the first XML line? Wouldn't it be easier (and less prone to misunderstanding) to take the exact cooperation-conditions element and work on it?

@janinamincer-daszkiewicz @mkurzydlowski is there an official interpretation on this question? The specification at https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd says nothing on this point.

Many thanks to all for your contributions and happy new year.

janinamincer-daszkiewicz commented 2 years ago

We will prepare something "official" soon. This discussion is very helpful as it shows where extra explanations are needed.

demilatof commented 2 years ago

@janinamincer-daszkiewicz

We will prepare something "official" soon. This discussion is very helpful as it shows where extra explanations are needed.

I hope really soon; I'm sure you understand that either all partners compute the same hash, or all IIAs will be rejected, voiding the whole system. Now I'm able to compute the same hash as a few HEIs (really few), e.g. unibo.it, digitalis.pt, ual.es, uca.es. And it is the same hash computed by the @sascoms online tool (thanks again ;-)), even if I still have doubts about adding the namespace to the cooperation-conditions tag. If we say that this is the right method, I'm afraid that half of the systems in the dev registry will have to change their algorithm. Moreover, I see that a lot of HEIs answer iia-index by listing their IIA IDs, but then don't answer iia-get-response for one of those IDs. OK, I'm just exploring other IIAs that don't involve my HEI, but I thought that if the IIA IDs are listed, then they must be shown. But this is another story and we are in a dev environment...

umesh-qs commented 2 years ago

@demilatof I would just put a word of caution. The tool provided at "https://erasmusjet.com/ewp-iia-hash-verify/" is not an official interpretation and I would be careful building my hash logic on it. Also I can see you have already mentioned that it does not work with xml from all the service providers

demilatof commented 2 years ago

@umesh-qs Yes, I know. But at least we are able to compute the same hash once we agree on some common rules (e.g. adding the namespace to the cooperation-conditions tag). At present I don't know what assumptions other developers have made. But what is most important, and (IMHO) a weak point, is that there is no clear and unambiguous position in the official EWP specifications. The tool you provide should have been shared by the official EWP coordinators.

If you wish, you can test other Hei by yourself, for example:

hei_id=unicineca.it iia_id=6826e0c1-5ab1-4b59-8150-2b2af07574ac

This HEI seems to have a wrong hash, but if you query its partner for the linked IIA:

hei_id=usal.es iia_id=IAA-ID-694256586

you find that now the hash matches the one computed by your online tool. That is, unicineca.it has used the counterpart's hash, but has probably modified the response a bit, therefore the hash computation fails.

Another Hei that has a "wrong" hash: hei_id=hei.demo.usos.edu.pl iia_id=64883C22A7EE375BE0530B501E0A96D6

Finally I cannot even query all Heis that are in dev registry as dashboard users; I don't know if it is the same for you.

mkurzydlowski commented 2 years ago

@demilatof, I'm sorry for pointing you only to the first sentence of the specification. What the first paragraph is about is the reason behind the exclusive canonicalization. It describes why it is better to ignore namespaces that are not relevant to the subdocument.

I should have linked you to the detailed specification of the canonicalization, where you can check how namespaces should be handled:

https://www.w3.org/TR/xml-exc-c14n/#sec-Specification

But I'm certain it is easier to have a look at the provided examples:

https://www.w3.org/TR/xml-exc-c14n/#sec-Simple

demilatof commented 2 years ago

@mkurzydlowski

@demilatof, I'm sorry for pointing you only to the first sentence of the specification. What the first paragraph is about is the reason behind the exclusive canonicalization. It describes why it is better to ignore namespaces that are not relevant to the subdocument.

Many thanks, but my doubt is not upon adding the namespaces not relevant to the subdocument (my question was on the complexity to evaluate what is relevant or not).

My main concern is that I'm not sure that adding the default namespace to the cooperation-conditions (if not present) is compatible with your statement posted here https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/53#issuecomment-851894706

This doesn't seem to be compliant with the documentation but you are right that it might be explicitly stated that using Exclusive XML Canonicalization means that the object being hashed must correspond to the response being sent and it shouldn't be altered after the hash is calculated.

As soon as you add the default namespace to the cooperation-conditions tag, you alter the response received. Or, to see it from a different point of view: if the sender includes the namespace in the hash calculation of the cooperation-conditions, that declaration must be present in the response being sent and not removed. I think this should be a basic rule for signing something.

umesh-qs commented 2 years ago

@demilatof If you are applying the rules of exclusive canonicalization manually then you are inviting problems for yourself. Please use a standard library or function provided with the language you are working on.

Also regarding your earlier comment where you mentioned "wrong hash" from some of the providers, I am not sure on how you are claiming that. I can help validate the hash. Please send me the xmls via email.

demilatof commented 2 years ago

@umesh-qs

@demilatof If you are applying the rules of exclusive canonicalization manually then you are inviting problems for yourself. Please use a standard library or function provided with the language you are working on.

You're right, but not completely; the configuration makes the difference. I use XPath, and if I ignore namespaces I can extract the cooperation-conditions element from the response. I print it to the console before canonicalization and it is the same string I see in the response received. But the hash computation fails. If I tell XPath to be namespace aware, it includes the namespace, but the cooperation-conditions element is different from what I received. Yet the hash, after canonicalization, is the same as the one I can compute with your online tool.

Also regarding your earlier comment where you mentioned "wrong hash" from some of the providers, I am not sure on how you are claiming that. I can help validate the hash. Please send me the xmls via email.

Well, I claim it by pasting the response into your online tool. As you know, you can call iia-index against some HEI and obtain the list of IIA IDs. You take an ID and call an iia-get-response against that hei_id, passing one of the IDs you received.

I'll try to send you one of them.

demilatof commented 2 years ago

@umesh-qs I tried, but... How can I send you the XMLs via mail? It doesn't seem to me that GitHub allows me to send you an email. Anyway, you can download the XML with the HEI ID and IIA ID I provided above.

As for your online tool, the "Hash calculation principles" section says: Apply the XML namespace to the cooperation-conditions XML as below: xmlns="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"

I think that this declaration is right for generating an IIA get-response, not for checking one received. If someone uses a different get-response version it could produce a different hash; or am I wrong? Could it be useful to point out that for hash verification it would be better to use the xmlns value declared in the get-response received?

Thanks again, you have put me on the right track.

umesh-qs commented 2 years ago

@demilatof I am sorry, I have not understood what you are asking. Below are the steps that we do if that helps.

1. Get raw xml response from the API call
2. Extract cooperation-conditions part from the raw xml in step 1
3. Remove sending-contact and receiving-contact from the partial xml in step 2 (make sure there is no blank line after the removal)
4. Use standard function to do exclusive xml canonicalization on xml in step 3
5. Calculate hash on xml from step 4
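
Steps 2 and 3 above could be sketched in Java with the JDK's DOM API as follows. This is only an illustration, not the official procedure: the class and method names are hypothetical, elements are matched by local name in any namespace, and whitespace left around the removed contacts is not touched (whether it should be is exactly the kind of detail partners need to agree on). Step 4 would be delegated to a canonicalization library and step 5 to a SHA-256 implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ContactStripper {

    // Step 2: parse namespace-aware and return the first cooperation-conditions
    // element in document order, or null if the response contains none.
    static Element firstCooperationConditions(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList found = doc.getElementsByTagNameNS("*", "cooperation-conditions");
            return found.getLength() > 0 ? (Element) found.item(0) : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Response could not be parsed", e);
        }
    }

    // Step 3: detach every sending-contact and receiving-contact element below
    // the given element and return how many were removed.
    static int removeContacts(Element conditions) {
        // Collect first: the NodeList is live and must not be modified while iterating.
        List<Node> toRemove = new ArrayList<>();
        for (String name : new String[] {"sending-contact", "receiving-contact"}) {
            NodeList hits = conditions.getElementsByTagNameNS("*", name);
            for (int i = 0; i < hits.getLength(); i++) {
                toRemove.add(hits.item(i));
            }
        }
        for (Node node : toRemove) {
            node.getParentNode().removeChild(node);
        }
        return toRemove.size();
    }
}
```

The element returned by `firstCooperationConditions` stays attached to the parsed document, so namespace declarations made on its ancestors remain visible to whatever canonicalizer is applied afterwards.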

demilatof commented 2 years ago

@umesh-qs I apologize, my fault. I meant, how can I send you an email, since I don't know your email address and github doesn't have a messaging tool... I think

As for the second point, at page https://erasmusjet.com/ewp-iia-hash-verify/ under "Hash Calculation Principles" you say

Apply the XML namespace to the cooperation-conditions XML as below: xmlns="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"

This is true if someone is computing the hash to add to a get-response, because the namespace refers to the latest version. But if you want to check the hash that you receive from someone else, you cannot be sure that he/she uses the v6 endpoint specs.

This is just a consideration, anyway. Please tell me if I'm still not clear enough

mkurzydlowski commented 2 years ago

My main concern is that I'm not sure that adding the default namespace to the cooperation-conditions (if not present) is compatible with your statement posted here #53 (comment)

This doesn't seem to be compliant with the documentation but you are right that it might be explicitly stated that using Exclusive XML Canonicalization means that the object being hashed must correspond to the response being sent and it shouldn't be altered after the hash is calculated.

You are right that that statement might be understood too broadly. I meant not altering whitespace, namespace aliases, etc. I didn't mean not adding namespace declarations if they were not present on this element. But this should be handled by the library!

I was writing that sentence believing that the step of applying exclusive canonicalization is something people will do uniformly (with help of the libraries).

You are right that you have to use namespace aware parsing in order to keep the namespace information. Again, please look into the example given in the specs. It does alter the subelement by adding a namespace declaration that is needed.
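
A small Java sketch of what namespace-aware parsing changes, using only the JDK parser. The document, the `urn:example:iia` namespace and the class name are invented for illustration. With namespace awareness switched on, the extracted child still reports the namespace that was declared on its parent, which is the information a canonicalizer needs in order to re-declare it on the subelement; with it switched off, `getNamespaceURI()` returns null for the same element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceAwareDemo {

    // Hypothetical response: the prefix "i" is declared on the root, not on the child.
    static final String XML =
        "<i:iia xmlns:i=\"urn:example:iia\"><i:cooperation-conditions/></i:iia>";

    // Parses XML with or without namespace awareness and returns the
    // first child of the root element.
    static Element firstChildOfRoot(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean aware : new boolean[] {true, false}) {
            Element child = firstChildOfRoot(aware);
            System.out.println("namespaceAware=" + aware
                    + " nodeName=" + child.getNodeName()
                    + " localName=" + child.getLocalName()
                    + " namespaceURI=" + child.getNamespaceURI());
        }
    }
}
```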

demilatof commented 2 years ago

You're right, I remember that you were talking about white space, but since the hash is used as a kind of pseudo-signature, your sentence could apply to the cooperation-conditions element as a raw string. And yes, as you remind me, the library takes the namespace into account if I configure it to do so.

As for the example in the specs, the one I've seen is not really clear; it doesn't show, for example, the expected output.

If I consider the "Hash calculation" paragraph at https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/resources/IIA_Signing_Behaviour.pdf I read

There is an important consideration to the Exclusive XML Canonicalization. The cooperation-conditions element has to contain the same namespace aliases as the XML response to the IIA GET method. If a namespace alias is autogenerated when marshalling to XML, then **it might be a good idea to set the namespace alias to a predefined value.**

Here again it seems that the hash calculation is described from the point of view of the server, the one that generates the XML. That is, in my opinion the specifications say that it is the server that has to take the namespace into account if used ("the cooperation-conditions element has to contain the same namespace aliases as the XML response") or set "the namespace alias to a predefined value". In my opinion, starting from the above considerations, the server has to take the namespace into account and insert the declaration in the cooperation-conditions tag. As a matter of fact, when the XPath library considers namespaces it produces a Node that I can output to a string, and that string explicitly contains the namespace.

If the server emitted the namespace declaration on the cooperation-conditions element in the XML, just as the library produces it, the client would not need to configure its library.

Anyway, now I'm able to compute the same hash as other HEIs do; not all of them, but no longer zero as before. Therefore, if this is the right way, I'm satisfied, because it seems to be the official interpretation of the algorithm in case we and another partner don't agree on the hash value.

pmarinelli commented 2 years ago

In case it might help anyone, here is the XPath expression we generate to build the node set to canonicalize:

(/[local-name()='iias-get-response']/[local-name()='iia'][position()=1]/descendant::node() | /[local-name()='iias-get-response']/[local-name()='iia'][position()=1]/descendant::/attribute::| /[local-name()='iias-get-response']/[local-name()='iia'][position()=1]/descendant::/namespace::)[ancestor-or-self::*[local-name()='cooperation-conditions' and not(ancestor-or-self::node()[local-name()='sending-contact' or local-name()='receiving-contact'])]]

It is inspired by the example provided in Section 2.1 of xml-exc-c14n (https://www.w3.org/TR/xml-exc-c14n/).

[the position predicate is there just to deal with multiple IIAs within the same response, but in the context of this discussion it can be ignored]

The XPath expression itself removes the sending-contact and receiving-contact elements (and preserves any whitespace characters possibly surrounding them).

The resulting node set is then used as input for the http://www.w3.org/2001/10/xml-exc-c14n# algorithm.

As xml-exc-c14n explicitly makes reference to XPath node sets, extracting the node set to canonicalize through XPath without any further manipulation sounds to me as the safest choice.

If those who are in charge of maintaining the specs agree with the above observation and approach, I think an "official" XPath expression might be helpful.

demilatof commented 2 years ago

@pmarinelli

Thanks for sharing your code. If I use your expression I receive an error; the same happens when testing it with this tool: https://www.freeformatter.com/xpath-tester.html

Since we handle just one IIA per request, we use a simpler expression, even if we need more steps to obtain the element to be canonicalized. I don't know whether we miss some cases this way; anyway, this is what we are doing:

1) First step, identifying the cooperation-conditions element; because we consider only one IIA for XML, the cooperation-conditions element is unique too: /*[local-name()='iias-get-response']/*[local-name()='iia'][1]/*[local-name()='cooperation-conditions'][1]

The expression could be even shorter, although I don't know how much it affects performance: //*[local-name()='cooperation-conditions'][1]

2) on the selected element (Node "condition"), we loop over the child nodes to remove sending/receiving-contacts from mobilities, if they exist:

NodeList mobilities = condition.getChildNodes();  // list the child nodes of cooperation-conditions
for (int m = 0; m < mobilities.getLength(); m++)  // Inspect every node to find a mobility element
{
   Node mob = mobilities.item(m);
   // getLocalName() ignores the prefix; it is null when the parser is not namespace aware
   String mobName = mob.getLocalName() != null ? mob.getLocalName() : mob.getNodeName();
   if (mobName.endsWith("-mobility-spec"))  // if the node is a mobility element
   {
      NodeList contacts = mob.getChildNodes();  // live list of all child nodes of the mobility element

      // Iterate backwards: the list is live, so removing a child while counting
      // upwards would skip the node that follows the removed one.
      for (int c = contacts.getLength() - 1; c >= 0; c--)
      {
          Node child = contacts.item(c);
          String name = child.getLocalName() != null ? child.getLocalName() : child.getNodeName();
          if (name.equals("sending-contact") ||    // if the node is "sending-contact"
              name.equals("receiving-contact"))    //        OR "receiving-contact"
          {
              mob.removeChild(child);              // then remove it from the mobility node
          }
      } // End c for loop
   }  // End if test
} // End m for loop

At this point the condition node is ready to be canonicalized. Your expression is by far more compact and beautiful to see; I'm more familiar with step by step code to follow every single task.

Anyway my problem was not the above code, but this piece of code that I've had to add:

XPath xPath = XPathFactory.newInstance().newXPath();
NamespaceResolver nsResolver = new NamespaceResolver(document);      // NamespaceResolver is a class that implements NamespaceContext
xPath.setNamespaceContext(nsResolver);
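
For comparison, the local-name() expression from the first step above can be evaluated without any NamespaceContext, because it never mentions a prefix. The sketch below uses only the JDK; the class name `ConditionsLocator` and the sample documents used to try it are hypothetical, and it assumes a namespace-aware parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ConditionsLocator {

    // Prefix-agnostic path to the first cooperation-conditions of the first IIA.
    static final String EXPRESSION =
        "/*[local-name()='iias-get-response']/*[local-name()='iia'][1]"
        + "/*[local-name()='cooperation-conditions'][1]";

    // Returns the matching element, or null if the response has none.
    static Node locate(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(EXPRESSION, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalArgumentException("Response could not be processed", e);
        }
    }
}
```

The same call works whether the partner declares a prefix or a default namespace, and the returned node keeps whatever prefix the partner chose.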

pmarinelli commented 2 years ago

@demilatof, sorry for the wrong XPath expression: it is missing some wildcard characters (I don't know what went wrong during the copy & paste operation). Here is the correct expression:

(/*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::node() | /*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::*/attribute::*| /*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::*/namespace::*)[ancestor-or-self::*[local-name()='cooperation-conditions'] and not(ancestor-or-self::node()[local-name()='sending-contact' or local-name()='receiving-contact'])]

We apply our XPath expression to a DOM document built using a namespace aware parser. Maybe it is the reason why we don't need to set a namespace resolver:

XPath xPath = XPathFactory.newInstance().newXPath();
NodeList cooperationConditionSubtree = (NodeList) xPath.compile(cooperationConditionsSubtreeXPathExpressionBuilder.build(iiaNodePosition)).evaluate(document, XPathConstants.NODESET);

(the cooperationConditionsSubtreeXPathExpressionBuilder component builds the above XPath expression)

My contribution to the discussion is that xml-exc-c14n itself makes use of XPath expressions of the form (//. | //@* | //namespace::*)[ancestor-or-self::n1:elem1] to extract the subtree to canonicalize. So it seems quite "natural" to me to use an XPath expression of that kind to extract the cooperation-conditions subtree. Moreover, XPath being a standard, I think it would be reasonable (and maybe useful both in terms of spec unambiguity and ease of implementation) to have an "official" XPath expression provided by the EWP specs themselves.

demilatof commented 2 years ago

@pmarinelli about the namespace resolver: I realize now that I used it in some previous test, but as a matter of fact it is useless if I use *[local-name()=...] in the XPath expression. I removed it and the code works as before.

Thanks a lot for the expression, now it is accepted. But it seems to me that it doesn't remove the sending/receiving-contact; I tested it in my code and also with the XPath tester at https://www.freeformatter.com/xpath-tester.html

Using your expression my full code is a lot shorter:

      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);      
      DocumentBuilder db = dbf.newDocumentBuilder();

      InputStream is = new ByteArrayInputStream(xmlBodyString.getBytes());
      Document document = db.parse(is); 

      XPath xPath = XPathFactory.newInstance().newXPath();     
      String expression = "(/*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::node() | /*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::*/attribute::*| /*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::*/namespace::*)[ancestor-or-self::*[local-name()='cooperation-conditions' and not(ancestor-or-self::node()[local-name()='sending-contact' or local-name()='receiving-contact'])]]";
      Node condition = (Node) xPath.compile(expression).evaluate(document, XPathConstants.NODE);

      if (condition!=null) 
      {              
        org.apache.xml.security.Init.init();
        Canonicalizer canonicalizer = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);

        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        canonicalizer.canonicalizeSubtree(condition, byteArrayOutputStream);

        String hash = DigestUtils.sha256Hex(byteArrayOutputStream.toByteArray());
      }

But, as I said, doing so it doesn't remove the contacts, if I'm not wrong. I tested with HEI_ID=hei.demo.usos.edu.pl IIA_ID=64883C22A7EE375BE0530B501E0A96D6 dumping the cooperation-conditions after this:

Node condition = (Node) xPath.compile(expression).evaluate(document, XPathConstants.NODE);

(anyway even with my code, the hash is different from the one present in the XML)

pmarinelli commented 2 years ago

@demilatof I edited the expression some moments ago because of a bug I fixed months ago but that somehow (re)entered the code base. It should now work.

demilatof commented 2 years ago

@pmarinelli

it seems to me that it still keeps contacts

pmarinelli commented 2 years ago

@demilatof I tested it using xmlstarlet, a command-line tool based on libxml2: it can canonicalize XML node sets extracted by XPath expressions.

sascoms commented 2 years ago

For anybody who would like to use a simple API for hash calculation, we can happily provide a token and a test environment API endpoint URL for test purposes. (email => sascoms@gmail.com)

Here is a sample and simple code (as a screenshot)

The tool is available at https://erasmusjet.com/ewp-iia-hash-verify/ and we, as ErasmusJET and Erasmus Port, use this service.

Our test results so far: the rules/principles noted on the tool/website seem to be consistent with the dashboard's hash computation. We will continue to test with other providers as well to see if there is a difference in hash computation.

Hope to see, soon, an official and clear documentation for hash computation on GitHub.

pmarinelli commented 2 years ago

@demilatof and I continued to discuss the conditions hash calculation via email. For any Java developer who is interested in canonicalizing, as per the xml-exc-c14n specs, the result of the following XPath expression:

(/*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::node() | /*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::*/attribute::*| /*[local-name()='iias-get-response']/*[local-name()='iia'][position()=1]/descendant::*/namespace::*)[ancestor-or-self::*[local-name()='cooperation-conditions'] and not(ancestor-or-self::node()[local-name()='sending-contact' or local-name()='receiving-contact'])]

it is required to evaluate the expression as a node-set (use XPathConstants.NODESET) and then to pass the obtained result to the method canonicalizeXPathNodeSet of the Canonicalizer class.

One point to pay attention to, is that when a result of type XPathConstants.NODESET is requested, the XPath evaluation returns an instance of NodeList, while the canonicalizeXPathNodeSet method expects a Set<Node> as its first argument: the conversion is achieved adding each item of NodeList into a Set<Node>.
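
The evaluation and the NodeList-to-Set conversion described above can be sketched with the JDK alone. The class name `NodeSetSelector` and its methods are hypothetical; the resulting set is what would then be handed to canonicalizeXPathNodeSet, which is not called here because this sketch does not depend on Apache Santuario.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodeSetSelector {

    // Copies every item of the NodeList into a Set, keeping the list order.
    static Set<Node> toSet(NodeList list) {
        Set<Node> set = new LinkedHashSet<>();
        for (int i = 0; i < list.getLength(); i++) {
            set.add(list.item(i));
        }
        return set;
    }

    // Parses the XML namespace-aware, evaluates the expression as a node-set
    // and returns the selected nodes as a Set<Node>.
    static Set<Node> select(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return toSet(nodes);
        } catch (Exception e) {
            throw new IllegalArgumentException("Selection failed", e);
        }
    }
}
```

A `LinkedHashSet` is used only so that iteration follows the order in which the nodes were returned; the canonicalizer itself should not depend on that order.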

As you may notice, I support the idea of building the XML subset to canonicalize using XPath. This is because xml-exc-c14n is designed to serialize XPath node-sets ("The goal of this specification is to establish a method for serializing the XPath node-set representation of an XML document or subset such that: [...]" Section 1).

demilatof commented 2 years ago

I think I have nothing more to add to what @pmarinelli wrote; his explanation is perfect. I can only point out that, as he showed me, there is not much room for variation. The XPath expression must be exactly the one he wrote, otherwise you could miss some important node. The same applies to the Java methods called.

This means, in my opinion, that even if the XPath expression has a certain level of complexity, it could be a candidate to be "THE" EWP expression to canonicalize the cooperation-conditions.

Nevertheless, I successfully canonicalized the cooperation-conditions without having to use XPath explicitly, just using DOM and the Apache Santuario canonicalizer. I think that, internally, Apache Santuario uses XPath to canonicalize an element as described by @pmarinelli and the W3C recommendation for Exclusive XML Canonicalization. In this case I have to use canonicalizeSubtree on the single node, because if I use canonicalizeXPathNodeSet the canonicalized element is wrong. This suggests to me that pure XPath requires canonicalizeXPathNodeSet, whilst pure DOM requires canonicalizeSubtree. If anyone is interested, here is my code:

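      // Imports needed by this snippet (Apache Santuario and Apache Commons Codec on the classpath)
      import java.io.ByteArrayInputStream;
      import java.io.ByteArrayOutputStream;
      import java.nio.charset.StandardCharsets;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import org.apache.commons.codec.digest.DigestUtils;
      import org.apache.xml.security.c14n.Canonicalizer;
      import org.w3c.dom.Document;
      import org.w3c.dom.Node;
      import org.w3c.dom.NodeList;

      // Build a namespace-aware DOM Document (xmlBodyString holds the raw IIA get response)
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);

      DocumentBuilder db = dbf.newDocumentBuilder();
      // Explicit charset instead of the platform default
      Document document = db.parse(new ByteArrayInputStream(xmlBodyString.getBytes(StandardCharsets.UTF_8)));

      // Look for the first cooperation-conditions element, whatever its namespace prefix
      Node condition = null;
      NodeList nodeList = document.getElementsByTagNameNS("*", "cooperation-conditions");
      if (nodeList.getLength() > 0) condition = nodeList.item(0);

      if (condition != null) // if found, process the node
      {
        NodeList conditionsChild = condition.getChildNodes();
        for (int m = 0; m < conditionsChild.getLength(); m++)  // Loop over the condition child nodes
        {
          Node mobility = conditionsChild.item(m);
          String mobilityName = mobility.getLocalName();  // null for text nodes
          if (mobilityName != null && mobilityName.endsWith("-mobility-spec"))  // if the node is a mobility node
          {
            NodeList mobilityChild = mobility.getChildNodes();

            // Loop backwards: the NodeList is live, so a forward loop would skip the node after each removal
            for (int c = mobilityChild.getLength() - 1; c >= 0; c--)
            {
              Node contact = mobilityChild.item(c);
              String contactName = contact.getLocalName();
              if ("sending-contact".equals(contactName) ||
                  "receiving-contact".equals(contactName))  // if the node is the one to be removed
              {
                mobility.removeChild(contact); // remove it
              }
            }
          }
        }

        // Canonicalization (exclusive, comments omitted)
        org.apache.xml.security.Init.init();
        Canonicalizer canonicalizer = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_EXCL_OMIT_COMMENTS);

        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        canonicalizer.canonicalizeSubtree(condition, byteArrayOutputStream);

        // SHA-256 of the canonical bytes, as a lowercase hex string
        String hash = DigestUtils.sha256Hex(byteArrayOutputStream.toByteArray());
      }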

madiken21 commented 2 years ago

https://erasmusjet.com/ewp-iia-hash-verify/ and https://registry.erasmuswithoutpaper.eu/iiaHashValidator do not compute the same hash.

Using the IIA get response sample (https://raw.githubusercontent.com/erasmus-without-paper/ewp-specs-api-iias/stable-v6/endpoints/get-response-example.xml) :

ErasmusJET tool seems to respect the rules (root namespace=get-response, no whitespace, no line breaks) => compute hash = 431f752809e152e4c835d0d82262c4aaffbf7cbc9e337187a930367e02986865
registry tool validates the IIA get response sample with computed hash = 7c045bc4ca23b3b9953adb27374aa27dcd41cfdda74fff9d2240a813a80443ae

So which one is right?

NB : using C# XmlDsigC14NTransform class, I obtain the same computed hash as ErasmusJET

umesh-qs commented 2 years ago

@madiken21 Difference is with and without whitespace. https://erasmusjet.com/ewp-iia-hash-verify/ is removing whitespace while https://registry.erasmuswithoutpaper.eu/iiaHashValidator keeps the whitespace
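To illustrate why whitespace alone is enough to produce two different hashes, here is a minimal sketch using only the JDK. The two XML strings are made-up fragments, not real canonicalized IIA data, and `WhitespaceHashDemo` and `sha256Hex` are names invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class WhitespaceHashDemo {

    /** Returns the SHA-256 digest of the UTF-8 bytes of the text as lowercase hex. */
    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Same elements and text, once with indentation and once without
        String withWhitespace = "<cooperation-conditions>\n  <a>1</a>\n</cooperation-conditions>";
        String withoutWhitespace = "<cooperation-conditions><a>1</a></cooperation-conditions>";

        System.out.println(sha256Hex(withWhitespace));
        System.out.println(sha256Hex(withoutWhitespace));
        System.out.println("Equal: " + sha256Hex(withWhitespace).equals(sha256Hex(withoutWhitespace)));
    }
}
```

Canonicalization keeps whitespace-only text nodes between elements, so a tool that strips them before hashing digests different bytes than one that does not.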

janinamincer-daszkiewicz commented 2 years ago

https://registry.erasmuswithoutpaper.eu/iiaHashValidator

This one is official and we take full responsibility for it. We will add a README with the needed explanations, and the source code of this solution is also on GitHub. Michał will explain more tomorrow, during the day.

It has already been tested positively with some providers. We are open to discussing the algorithm with you and any other provider. We invited ErasmusJET to test IIAs together and are waiting for the answer. We will upgrade this validator if needed.

demilatof commented 2 years ago

@madiken21

I tried to compute that hash both with pure XPath method (as suggested by @pmarinelli) and with my pure DOM code. They both produce the same hash code as https://registry.erasmuswithoutpaper.eu/iiaHashValidator

I think that @umesh-qs is right, the https://erasmusjet.com/ewp-iia-hash-verify/ is removing whitespace (I've just tried). But this should not be the correct methods, as @mkurzydlowski said here https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/53#issuecomment-851894706 and here https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/47#issuecomment-1004646757 "I meant not altering whitespaces, namespace aliases, etc."

Emkas commented 2 years ago

I think that @umesh-qs is right, the https://erasmusjet.com/ewp-iia-hash-verify/ is removing whitespace (I've just tried). But this should not be the correct methods, as @mkurzydlowski said here #53 (comment) and here #47 (comment) "I meant not altering whitespaces, namespace aliases, etc."

That must be the case. In "Conditions data used to compute hash" field whitespaces are removed. Here is a comment with direct link to Canonical XML Version 2.0 spec.

demilatof commented 2 years ago

@Emkas I'm not sure I fully understand your position. I don't think the link you posted states that whitespace must be removed.

madiken21 commented 2 years ago

@demilatof , @umesh-qs

You're right. It's all about whitespace. I successfully computed the hash with both methods using .NET. But personally, I would prefer the variant without whitespace, because it prevents ambiguity about indentation characters and line break characters ('\n' with Java, '\r\n' with .NET by default).
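On the line break part of this concern, a hedged note: as far as I understand XML 1.0 (section 2.11, End-of-Line Handling), a conforming parser turns literal '\r\n' and lone '\r' into '\n' before the document reaches the application, so the difference should disappear once the partner's response has been parsed. Indentation characters (tabs versus spaces) are not normalized, so that ambiguity remains. The sketch below checks the line-ending behaviour with the JDK parser; `LineEndingDemo` and `rootText` are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LineEndingDemo {

    /** Parses the XML and returns the concatenated text content of the root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Same made-up document, serialized once with Windows and once with Unix line endings
        String windowsStyle = "<a>\r\n  <b>x</b>\r\n</a>";
        String unixStyle = "<a>\n  <b>x</b>\n</a>";

        System.out.println("Same text after parsing: "
                + rootText(windowsStyle).equals(rootText(unixStyle)));
    }
}
```

This only covers documents that go through a parser; hashing the raw response string would still see the original bytes.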

pmarinelli commented 2 years ago

@madiken21

NB : using C# XmlDsigC14NTransform class, I obtain the same computed hash as ErasmusJET

I am not a C# developer, but wouldn't XmlDsigExcC14NTransform be more compliant to the EWP specs rather than XmlDsigC14NTransform?

Giving a look at the .NET docs, XmlDsigC14NTransform implements the algorithm defined in https://www.w3.org/TR/xml-c14n/, while XmlDsigExcC14NTransform implements the algorithm defined in https://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/.

The EWP specs talk about Exclusive XML Canonicalization, and I have not many doubts that the algorithm to use is the one defined in https://www.w3.org/TR/xml-exc-c14n/.

However, I think that the EWP specs could be clearer on this, because the current get-response.xsd has a link to https://www.w3.org/TR/xml-c14n2/ (see the documentation of the conditions-hash element definition), while IIA_Signing_Behaviour.pdf has a link to https://www.w3.org/TR/xml-c14n/.

madiken21 commented 2 years ago

@pmarinelli

You're right. I finally used XmlDsigExcC14NTransform instead of XmlDsigC14NTransform. I didn't update my post

janinamincer-daszkiewicz commented 2 years ago

Thank you all for suggestions concerning specification and how to make it more clear. We will take it into consideration.

j-be commented 2 years ago

I'm still highly confused about this. I thought I might just use HashService.java. But then I noticed, that that thingy only handles documents with stable-v6 correctly.

So, if I change stable-v6 in my XML to stable-v5 the validator drops the namespace as well, which should be exactly the issue predicted by @demilatof here.

So that thingy only works as long as the whole EWP network only allows stable-v6.

I also don't understand what the IIA_Signing_Behaviour.pdf means when it says "has to contain the same namespace aliases as the response..." - we have 22 aliases in our GET response (not sure what JAX-RS does there, but it is valid XML), validator drops all but xmlns=", and even that - as mentioned - is only preserved if it points to the stable-v6 one.

TL;DR: Would someone here be so kind as to share a Java reference implementation which takes a String containing the whole XML response and calculates a hash from it, which meets the spec and is compatible with an arbitrary root xmlns? I am beyond confused...

mkurzydlowski commented 2 years ago

I noticed, that that thingy only handles documents with stable-v6 correctly.

You are right, the Hash Validator works only with the latest IIA version.

To calculate a hash for a deprecated IIA version one would need to modify the namespace used in the code appropriately.

j-be commented 2 years ago

@mkurzydlowski yes, but that would mean I have to extract the namespace from a partner's response (using some kind of regex I guess?) and then use that?

Also, can you clarify what the PDF means by:

The cooperation-conditions element has to contain the same namespace aliases as the XML response to the IIA GET method

Our response XML starts with:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><iias-get-response xmlns="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd" xmlns:ns2="https://github.com/erasmus-without-paper/ewp-specs-types-contact/tree/stable-v1" xmlns:ns3="https://github.com/erasmus-without-paper/ewp-specs-types-phonenumber/tree/stable-v1" xmlns:ns4="https://github.com/erasmus-without-paper/ewp-specs-types-address/tree/stable-v1" xmlns:ns5="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/index-response.xsd" xmlns:ns6="https://github.com/erasmus-without-paper/ewp-specs-architecture/blob/stable-v1/common-types.xsd" xmlns:ns7="https://github.com/erasmus-without-paper/ewp-specs-sec-intro/tree/stable-v2" xmlns:ns8="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/manifest-entry.xsd" xmlns:ns9="https://github.com/erasmus-without-paper/ewp-specs-sec-cliauth-httpsig/tree/stable-v1" xmlns:ns10="https://github.com/erasmus-without-paper/ewp-specs-sec-srvauth-httpsig/tree/stable-v1" xmlns:ns11="https://github.com/erasmus-without-paper/ewp-specs-sec-cliauth-tlscert/tree/stable-v1" xmlns:ns12="https://github.com/erasmus-without-paper/ewp-specs-sec-srvauth-tlscert/tree/stable-v1" xmlns:ns13="https://github.com/erasmus-without-paper/ewp-specs-api-ounits/blob/stable-v2/manifest-entry.xsd" xmlns:ns14="https://github.com/erasmus-without-paper/ewp-specs-api-omobility-las/blob/stable-v1/manifest-entry.xsd" xmlns:ns15="https://github.com/erasmus-without-paper/ewp-specs-api-omobility-la-cnr/blob/stable-v1/manifest-entry.xsd" xmlns:ns16="https://github.com/erasmus-without-paper/ewp-specs-api-iia-cnr/blob/stable-v2/manifest-entry.xsd" xmlns:ns17="https://github.com/erasmus-without-paper/ewp-specs-api-institutions/blob/stable-v2/manifest-entry.xsd" xmlns:ns18="https://github.com/erasmus-without-paper/ewp-specs-api-iias-approval/blob/stable-v1/manifest-entry.xsd" 
xmlns:ns19="https://github.com/erasmus-without-paper/ewp-specs-api-iia-approval-cnr/blob/stable-v1/manifest-entry.xsd" xmlns:ns20="https://github.com/erasmus-without-paper/ewp-specs-api-factsheet/blob/stable-v1/manifest-entry.xsd" xmlns:ns21="https://github.com/erasmus-without-paper/ewp-specs-api-echo/blob/stable-v2/manifest-entry.xsd" xmlns:ns22="https://github.com/erasmus-without-paper/ewp-specs-api-discovery/blob/stable-v5/manifest-entry.xsd">

Why does the validator only expect /stable-v6/endpoints/get-response.xsd and drop the other 21?

mkurzydlowski commented 2 years ago

But that would mean I have to extract the namespace from a partner's response (using some kind of regex I guess?) and then use that?

You already ask for a specific API version when calling the partner, don't you?

Can you clarify what the PDF means?

I'm not the author of this document but excluding namespaces that are not used in the subdocument is what the "exclusive" stands for in exclusive canonicalization we are using in EWP.

j-be commented 2 years ago

You already ask for a specific API version when calling the partner, don't you?

Well, yeah, but no... My code has no way to know the XML namespace before getting the response, has it? All it sees is version="6.0.1" in the manifest, which then via a GitHub release by that name specifies the XML namespace to use, or am I missing something?

excluding namespaces that are not used in the subdocument is what the "exclusive" stands for in exclusive canonicalization we are using in EWP.

That's what I thought, hence my confusion about the wording "contain the same namespace aliases".
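On the question of how to learn the partner's XML namespace without a regex: one option, sketched here as an assumption rather than anything the EWP specs prescribe, is to parse the response with a namespace-aware DOM parser and read the namespace URI from the root element. `RootNamespaceDemo` and `rootNamespace` are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class RootNamespaceDemo {

    /** Returns the namespace URI of the root element, or null if it has none. */
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, getNamespaceURI() returns null
            Document doc = dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String ns = "https://github.com/erasmus-without-paper/ewp-specs-api-iias"
                + "/blob/stable-v6/endpoints/get-response.xsd";

        // Default namespace and an arbitrary prefix resolve to the same URI
        System.out.println(rootNamespace("<iias-get-response xmlns=\"" + ns + "\"/>"));
        System.out.println(rootNamespace("<p:iias-get-response xmlns:p=\"" + ns + "\"/>"));
    }
}
```

The returned URI could then be used in place of a hard-coded stable-v6 namespace when selecting the cooperation-conditions elements.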

erasmus-without-paper / ewp-specs-api-iias

Unable to compute identical "conditions-hash" #47