erasmus-without-paper / ewp-specs-api-iias

Specifications of EWP's Interinstitutional Agreements API.
MIT License

When transforming XML, should empty tags be considered or not? #170

Open anant6767 opened 7 months ago

anant6767 commented 7 months ago

When transforming XML, an empty tag, if present, is included in the generation of the text-to-hash value. If one partner (Partner A) includes an empty tag in their transformation process and another partner (Partner B) does not, this discrepancy can lead to different text-to-hash values being generated, potentially causing synchronization issues within the network. It is essential to standardize the transformation approach to ensure consistency in the text-to-hash values generated by all partners.

For example, Partner A's cooperation-conditions XML:

<ia6:cooperation-conditions xmlns:ia6="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd" xmlns:ewp="https://github.com/erasmus-without-paper/ewp-specs-architecture/blob/stable-v1/common-types.xsd" xmlns:c="https://github.com/erasmus-without-paper/ewp-specs-types-contact/tree/stable-v1" xmlns:trm="https://github.com/erasmus-without-paper/ewp-specs-types-academic-term/tree/stable-v1" xmlns:p="https://github.com/erasmus-without-paper/ewp-specs-types-phonenumber/tree/stable-v1" xmlns:a="https://github.com/erasmus-without-paper/ewp-specs-types-address/tree/stable-v1">
    <ia6:student-studies-mobility-spec>
        <ia6:sending-hei-id>ewp100-bo.staging.moveon4.com</ia6:sending-hei-id>
        <ia6:receiving-academic-year-id>2027/2028</ia6:receiving-academic-year-id>
        <ia6:receiving-academic-year-id>2028/2029</ia6:receiving-academic-year-id>
        <ia6:mobilities-per-year>12</ia6:mobilities-per-year>
        <ia6:recommended-language-skill>
            <ia6:language>ay</ia6:language>
            <ia6:cefr-level>A1</ia6:cefr-level>
        </ia6:recommended-language-skill>
        <ia6:recommended-language-skill>
            <ia6:language>eu</ia6:language>
            <ia6:cefr-level>A2</ia6:cefr-level>
        </ia6:recommended-language-skill>
        <ia6:subject-area>
            <ia6:isced-f-code>0110</ia6:isced-f-code>
            <ia6:isced-clarification>isced 001</ia6:isced-clarification>
        </ia6:subject-area>
        <ia6:subject-area>
            <ia6:isced-f-code>0112</ia6:isced-f-code>
            <ia6:isced-clarification/>
        </ia6:subject-area>
        <ia6:other-info-terms>new additonal values to be checkecd</ia6:other-info-terms>
        <ia6:total-months-per-year>33</ia6:total-months-per-year>
        <ia6:blended>true</ia6:blended>
        <ia6:eqf-level>4</ia6:eqf-level>
        <ia6:eqf-level>5</ia6:eqf-level>
        <ia6:eqf-level>6</ia6:eqf-level>
        <ia6:eqf-level>8</ia6:eqf-level>
    </ia6:student-studies-mobility-spec>
</ia6:cooperation-conditions>

Partner B's cooperation-conditions XML:

<ia6:cooperation-conditions xmlns:ia6="https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd" xmlns:ewp="https://github.com/erasmus-without-paper/ewp-specs-architecture/blob/stable-v1/common-types.xsd" xmlns:c="https://github.com/erasmus-without-paper/ewp-specs-types-contact/tree/stable-v1" xmlns:trm="https://github.com/erasmus-without-paper/ewp-specs-types-academic-term/tree/stable-v1" xmlns:p="https://github.com/erasmus-without-paper/ewp-specs-types-phonenumber/tree/stable-v1" xmlns:a="https://github.com/erasmus-without-paper/ewp-specs-types-address/tree/stable-v1">
    <ia6:student-studies-mobility-spec>
        <ia6:sending-hei-id>ewp100-bo.staging.moveon4.com</ia6:sending-hei-id>
        <ia6:receiving-academic-year-id>2027/2028</ia6:receiving-academic-year-id>
        <ia6:receiving-academic-year-id>2028/2029</ia6:receiving-academic-year-id>
        <ia6:mobilities-per-year>12</ia6:mobilities-per-year>
        <ia6:recommended-language-skill>
            <ia6:language>ay</ia6:language>
            <ia6:cefr-level>A1</ia6:cefr-level>
        </ia6:recommended-language-skill>
        <ia6:recommended-language-skill>
            <ia6:language>eu</ia6:language>
            <ia6:cefr-level>A2</ia6:cefr-level>
        </ia6:recommended-language-skill>
        <ia6:subject-area>
            <ia6:isced-f-code>0110</ia6:isced-f-code>
            <ia6:isced-clarification>isced 001</ia6:isced-clarification>
        </ia6:subject-area>
        <ia6:subject-area>
            <ia6:isced-f-code>0112</ia6:isced-f-code>
        </ia6:subject-area>
        <ia6:other-info-terms>new additonal values to be checkecd</ia6:other-info-terms>
        <ia6:total-months-per-year>33</ia6:total-months-per-year>
        <ia6:blended>true</ia6:blended>
        <ia6:eqf-level>4</ia6:eqf-level>
        <ia6:eqf-level>5</ia6:eqf-level>
        <ia6:eqf-level>6</ia6:eqf-level>
        <ia6:eqf-level>8</ia6:eqf-level>
    </ia6:student-studies-mobility-spec>
</ia6:cooperation-conditions>
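The effect can be reproduced in a few lines of Python. This is a simplified sketch, not the real transform_version_6.xsl or transform_version_7.xsl: it assumes the text-to-hash can be approximated by a Canonical XML 2.0 serialization (ElementTree.canonicalize, Python 3.8+) with whitespace-only text stripped, and that the digest is SHA-256. The snippets are cut down to the second subject-area of each partner, and naive_text_to_hash is a hypothetical helper.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace copied from the issue; the snippets are trimmed to the
# second <ia6:subject-area> of each partner.
IA6 = "https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"

PARTNER_A = f"""<ia6:subject-area xmlns:ia6="{IA6}">
    <ia6:isced-f-code>0112</ia6:isced-f-code>
    <ia6:isced-clarification/>
</ia6:subject-area>"""

PARTNER_B = f"""<ia6:subject-area xmlns:ia6="{IA6}">
    <ia6:isced-f-code>0112</ia6:isced-f-code>
</ia6:subject-area>"""


def naive_text_to_hash(xml_text: str) -> str:
    # Hypothetical stand-in for the XSLT output: canonical XML with
    # whitespace-only text removed. Empty elements are kept as-is.
    return ET.canonicalize(xml_text, strip_text=True)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


text_a = naive_text_to_hash(PARTNER_A)
text_b = naive_text_to_hash(PARTNER_B)
# Partner A's text still contains the empty isced-clarification element,
# so the two digests differ even though the content is the same.
print(sha256_hex(text_a) == sha256_hex(text_b))  # False
```

Canonicalization already treats <x/> and <x></x> the same, so it is only the presence of the element that changes the digest.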

I suggest that in both the transform_version_6.xsl and transform_version_7.xsl stylesheets, a conditional check should be implemented to prevent empty tags from being factored into the text-to-hash generation process. This will help ensure consistent text-to-hash values across different partners, regardless of the inclusion of empty tags.
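The suggested conditional check could look roughly like this, sketched in Python rather than XSLT: before the text-to-hash is built, every element with no attributes, no child elements and no non-whitespace text is removed. Canonical XML 2.0 (ElementTree.canonicalize) stands in for the stylesheet output, SHA-256 is assumed as the digest, and the helper names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

IA6 = "https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"

# Trimmed versions of the two partners' second subject-area.
PARTNER_A = f"""<ia6:subject-area xmlns:ia6="{IA6}">
    <ia6:isced-f-code>0112</ia6:isced-f-code>
    <ia6:isced-clarification/>
</ia6:subject-area>"""
PARTNER_B = f"""<ia6:subject-area xmlns:ia6="{IA6}">
    <ia6:isced-f-code>0112</ia6:isced-f-code>
</ia6:subject-area>"""


def is_empty(elem: ET.Element) -> bool:
    # No attributes, no child elements, and no non-whitespace text.
    return not elem.attrib and len(elem) == 0 and not (elem.text or "").strip()


def drop_empty_elements(parent: ET.Element) -> None:
    # Depth-first: children are cleaned before their parent is tested,
    # so an element that only contained empty children is dropped too.
    for child in list(parent):
        drop_empty_elements(child)
        if is_empty(child):
            parent.remove(child)


def text_to_hash(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    drop_empty_elements(root)
    # Hypothetical stand-in for the XSLT output (C14N 2.0, whitespace stripped).
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)


def iia_digest(xml_text: str) -> str:
    return hashlib.sha256(text_to_hash(xml_text).encode("utf-8")).hexdigest()


print(iia_digest(PARTNER_A) == iia_digest(PARTNER_B))  # True
```

Cleaning children before testing their parent means an element that contained only empty children disappears as well; whether that is the desired rule is a choice for the spec, not something this sketch settles.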

janinamincer-daszkiewicz commented 4 months ago

@mkurzydlowski, @kkaraogl, please do some testing.

mkurzydlowski commented 4 months ago

There are a few issues that haven't been reported yet:

Also, we still don't know how to handle:

janinamincer-daszkiewicz commented 4 months ago

Hash changes after we introduce new XSLT. Maybe we won't introduce a new one but use this only for comparison?

Yes. The XSLT we have for calculating the iia-hash will not change; the one being elaborated here can help in comparing copies of an IIA.

janinamincer-daszkiewicz commented 4 months ago

Thanks to the AUTh team for joining the tests.

demilatof commented 4 months ago

Hash changes after we introduce new XSLT. Maybe we won't introduce a new one but use this only for comparison?

Yes. The XSLT we have for calculating the iia-hash will not change; the one being elaborated here can help in comparing copies of an IIA.

Since the specifications (and the MBR) don't require the hash codes to be identical, and the new XSLT cannot guarantee identical hashes either, I think it could be used only on a voluntary basis, and only if identical hashes are useful in a provider's implementation.

Therefore, I think that this XSLT should be only applied as a second step, internally and outside the network perspective.

mkurzydlowski commented 4 months ago

We will need to be very wary not to introduce confusion when another XSLT appears in the repo, as it is crucial that everyone uses the same XSLT to calculate the hash!

umesh-qs commented 4 months ago

There are a few issues that haven't been reported yet:

  • Sending/receiving contacts inside the cooperation conditions cause differences in hash calculation.
  • Capitalization of both CEFR-Level and Language causes differences in hash calculation.
  • The use of the xml:lang attribute causes differences in hash calculation.

Also, we still don't know how to handle:

  • Differences in free text fields (elements: other-info-terms, isced-clarification).
  • Language element codes. This XSLT version assumes that we only need to ignore subtags.
  • Hash changes after we introduce new XSLT. Maybe we won't introduce a new one but use this only for comparison?

@mkurzydlowski please provide examples. I am sure most of these can be fixed.
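A rough Python sketch of how the listed differences could be neutralized in a comparison-only transform before canonicalizing: drop sending/receiving contacts, remove xml:lang attributes, reduce language codes to their primary subtag in lower case, and upper-case CEFR levels. The element names sending-contact and receiving-contact, the chosen letter case, the placement of xml:lang in the sample data, and the use of Canonical XML 2.0 as a stand-in for the XSLT output are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

IA6 = "https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"
CONTACT = "https://github.com/erasmus-without-paper/ewp-specs-types-contact/tree/stable-v1"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical comparison rules, using ElementTree's {namespace}local tag names.
DROP = {f"{{{IA6}}}sending-contact", f"{{{IA6}}}receiving-contact"}
LANGUAGE = f"{{{IA6}}}language"
CEFR_LEVEL = f"{{{IA6}}}cefr-level"


def primary_subtag(code: str) -> str:
    # "EN-GB" -> "en": keep only the primary language subtag, lower-cased.
    return code.strip().split("-", 1)[0].lower()


def normalise(elem: ET.Element) -> None:
    for child in list(elem):
        if child.tag in DROP:
            elem.remove(child)  # contacts are ignored entirely
        else:
            normalise(child)
    elem.attrib.pop(XML_LANG, None)  # xml:lang no longer affects the result
    if elem.tag == LANGUAGE:
        elem.text = primary_subtag(elem.text or "")
    elif elem.tag == CEFR_LEVEL:
        elem.text = (elem.text or "").strip().upper()


def comparison_text(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    normalise(root)
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)


# Illustrative input only; not taken from a real IIA.
PARTNER_A = f"""<ia6:student-studies-mobility-spec xmlns:ia6="{IA6}" xmlns:c="{CONTACT}">
    <ia6:sending-contact><c:contact-name xml:lang="en">Jane Doe</c:contact-name></ia6:sending-contact>
    <ia6:recommended-language-skill>
        <ia6:language>EN-GB</ia6:language>
        <ia6:cefr-level>b2</ia6:cefr-level>
    </ia6:recommended-language-skill>
    <ia6:other-info-terms xml:lang="en">Same text</ia6:other-info-terms>
</ia6:student-studies-mobility-spec>"""

PARTNER_B = f"""<ia6:student-studies-mobility-spec xmlns:ia6="{IA6}">
    <ia6:recommended-language-skill>
        <ia6:language>en</ia6:language>
        <ia6:cefr-level>B2</ia6:cefr-level>
    </ia6:recommended-language-skill>
    <ia6:other-info-terms>Same text</ia6:other-info-terms>
</ia6:student-studies-mobility-spec>"""

print(comparison_text(PARTNER_A) == comparison_text(PARTNER_B))  # True
```

Keeping only the primary subtag is lossy: zh-Hant and zh-Hans would both become zh, which is one reason the handling of language codes is still listed as an open question.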

umesh-qs commented 4 months ago

There are a few issues that haven't been reported yet:

  • Sending/receiving contacts inside the cooperation conditions cause differences in hash calculation.
  • Capitalization of language causes differences in hash calculation.

Also, we still don't know how to handle:

  • Differences in free text fields (elements: other-info-terms, isced-clarification).
  • Hash changes after we introduce new XSLT. Maybe we won't introduce a new one but use this only for comparison?

At some point the hash will change anyway, when a new version of the IIA API adds more fields to the hash calculation, as happened when moving from v6 to v7. How do you plan to handle that?

mkurzydlowski commented 3 months ago

@umesh-qs, I have noticed that we are discussing this in the wrong place. I will send you examples by email, but if we want to discuss it further in GitHub, then we should do it in a proper issue.

mkurzydlowski commented 3 months ago

@umesh-qs has updated the XSLT, and now the only thing remaining is, of course, differences in free text fields. I presume users will handle such cases manually.

10-Jul-2024.txt

janinamincer-daszkiewicz commented 3 months ago

First, we will be grateful for more testing. Second, where do you think is the best place for such an XSLT? Also, let's remember that this comment is important:

Therefore, I think that this XSLT should be only applied as a second step, internally and outside the network perspective.

We could put this XSLT somewhere with a proper description.

demilatof commented 3 months ago

First, we will be grateful for more testing. Second, where do you think is the best place for such an XSLT? Also, let's remember that this comment is important:

Therefore, I think that this XSLT should be only applied as a second step, internally and outside the network perspective.

We could put this XSLT somewhere with a proper description.

The important question is: could the partner not approve my IIA because it has a different hash code? In my opinion, if someone wants identical hashes, it's up to his/her IROs to modify their IIA to achieve that, since it's not required by the specifications. The network should clarify who has to act when someone needs something optional.

janinamincer-daszkiewicz commented 3 months ago

The important question is: could the partner not approve my IIA because it has a different hash code?

It should not be done automatically. It is up to the users to decide, not to the system.

fioravanti-unibo commented 3 months ago

First, we will be grateful for more testing. Second, where do you think is the best place for such an XSLT? Also, let's remember that this comment is important:

Therefore, I think that this XSLT should be only applied as a second step, internally and outside the network perspective.

We could put this XSLT somewhere with a proper description.

The important question is: could the partner not approve my IIA because it has a different hash code? In my opinion, if someone wants identical hashes, it's up to his/her IROs to modify their IIA to achieve that, since it's not required by the specifications. The network should clarify who has to act when someone needs something optional.

In my (humble) opinion, the partner cannot refuse approval because of a hash code difference between her agreement and the partner's. Binary-level identity between agreements is not mandatory, so I think the agreements can remain slightly different but semantically equal, and can be mutually approved if the partners agree.

demilatof commented 3 months ago

The important question is: could the partner not approve my IIA because it has a different hash code?

It should not be done automatically. It is up to the users to decide, not to the system.

But the specifications should state under which technical circumstances an IIA is not valid for approval. Approval is a business process: if identical hash codes are not mandatory, a system cannot forbid its IROs to approve. Yet it seems to me that some providers do block their IROs.

skishk commented 3 months ago

It should not be done automatically. It is up to the users to decide, not to the system.

(It would be nice if this statement were also valid for the LAs 😁)

demilatof commented 3 months ago

It should not be done automatically. It is up to the users to decide, not to the system.

(It would be nice if this statement were also valid for the LAs 😁)

You have more production experience than I do: do all providers allow their IROs to approve an IIA if they detect a different hash code?

fioravanti-unibo commented 3 months ago

It should not be done automatically. It is up to the users to decide, not to the system.

(It would be nice if this statement were also valid for the LAs 😁)

You have more production experience than I do: do all providers allow their IROs to approve an IIA if they detect a different hash code?

In my experience, identical hash codes for the two agreements are critical only for some providers; generally, what matters is the content of the agreements, as it should be.

skishk commented 3 months ago

You have more production experience than I do: do all providers allow their IROs to approve an IIA if they detect a different hash code?

I confirm that some providers (they are already known) don't allow IROs to approve if the IIAs are not identical. Usually the partners' IROs can't understand why, so they ask us what is wrong, and you can imagine the extra work this creates for each of us (from IROs to IT teams).

kkaraogl commented 3 months ago

We can confirm that there are cases of IIAs in production that are vastly different content-wise. We already have a way of detecting these content discrepancies and informing our IROs.

What (slightly or vastly) different means must be detailed in the specs. In my opinion, the XSLT transformation proposal can also be flexible in accommodating the level of difference we wish to cover.

demilatof commented 3 months ago

What (slightly or vastly) different means must be detailed in the specs. In my opinion, the XSLT transformation proposal can also be flexible in accommodating the level of difference we wish to cover.

The XSLT produces a text to be hashed. I'm really interested in learning how a hash can reflect the level of difference we wish to cover. As far as I know, two hashes, by definition, can only be equal or different, not almost equal. But maybe I missed something.
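The point can be checked directly: a digest only answers "equal or not equal". If the specs ever define a level of difference, it would have to be measured on the data before hashing. The Python sketch below hashes two subject areas that differ in one free-text value and then lists the differing leaf elements; the (path, text) representation and the helper names are hypothetical, and Canonical XML 2.0 with SHA-256 stands in for the real text-to-hash and iia-hash.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter

IA6 = "https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"

A_XML = (
    f'<ia6:subject-area xmlns:ia6="{IA6}">'
    "<ia6:isced-f-code>0110</ia6:isced-f-code>"
    "<ia6:isced-clarification>isced 001</ia6:isced-clarification>"
    "</ia6:subject-area>"
)
B_XML = A_XML.replace("isced 001", "isced 002")


def digest(xml_text: str) -> str:
    # Stand-in for the iia-hash: SHA-256 over canonical XML.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def leaf_items(xml_text: str) -> Counter:
    # Multiset of (path, text) pairs, one per element without children.
    items: Counter = Counter()

    def walk(elem: ET.Element, path: str) -> None:
        local = elem.tag.rsplit("}", 1)[-1]  # drop the {namespace} part
        path = f"{path}/{local}"
        if len(elem):
            for child in elem:
                walk(child, path)
        else:
            items[(path, (elem.text or "").strip())] += 1

    walk(ET.fromstring(xml_text), "")
    return items


def leaf_differences(a: str, b: str):
    # Leaves present only in a, and leaves present only in b.
    items_a, items_b = leaf_items(a), leaf_items(b)
    return sorted((items_a - items_b).elements()), sorted((items_b - items_a).elements())


print(digest(A_XML) == digest(B_XML))  # False: no hint of how much differs
print(leaf_differences(A_XML, B_XML))
# ([('/subject-area/isced-clarification', 'isced 001')], [('/subject-area/isced-clarification', 'isced 002')])
```

Paths built from local names cannot tell which of several repeated elements (for example, two subject-area entries) differs; a real comparison would need some key per repeated element.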

fioravanti-unibo commented 3 months ago

What (slightly or vastly) different means must be detailed in the specs. In my opinion, the XSLT transformation proposal can also be flexible in accommodating the level of difference we wish to cover.

The XSLT produces a text to be hashed. I'm really interested in learning how a hash can reflect the level of difference we wish to cover. As far as I know, two hashes, by definition, can only be equal or different, not almost equal. But maybe I missed something.

The only way that XSLT can be useful is to exclude all free-form text from the calculation.

In this way, the parts involved in the calculation and transformation can be truly the same or different. This comparison obviously will not prove the agreements identical, but it could form the basis for a subsequent comparison, to be carried out manually on the free-form text parts.
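A minimal Python sketch of that two-step idea: free-text elements are removed before hashing and returned separately for manual review. It assumes the free-text elements are exactly the two named earlier in the thread (other-info-terms and isced-clarification), uses Canonical XML 2.0 as a stand-in for the comparison XSLT and SHA-256 as the digest; the helper names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

IA6 = "https://github.com/erasmus-without-paper/ewp-specs-api-iias/blob/stable-v6/endpoints/get-response.xsd"

# Assumed set of free-form elements (local names), from the discussion above.
FREE_TEXT = {"other-info-terms", "isced-clarification"}

TEMPLATE = """<ia6:student-studies-mobility-spec xmlns:ia6="{ns}">
    <ia6:subject-area><ia6:isced-f-code>{code}</ia6:isced-f-code></ia6:subject-area>
    <ia6:other-info-terms>{terms}</ia6:other-info-terms>
</ia6:student-studies-mobility-spec>"""


def split_free_text(xml_text: str):
    """Return (SHA-256 of the structured part, removed (name, text) pairs)."""
    root = ET.fromstring(xml_text)
    free = []

    def remove_free_text(parent: ET.Element) -> None:
        for child in list(parent):
            name = child.tag.rsplit("}", 1)[-1]
            if name in FREE_TEXT:
                free.append((name, (child.text or "").strip()))
                parent.remove(child)
            else:
                remove_free_text(child)

    remove_free_text(root)
    structured = ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)
    return hashlib.sha256(structured.encode("utf-8")).hexdigest(), free


def compare_for_review(a: str, b: str):
    """(structured parts identical?, free-text values that differ, sorted)."""
    digest_a, free_a = split_free_text(a)
    digest_b, free_b = split_free_text(b)
    return digest_a == digest_b, sorted(set(free_a) ^ set(free_b))


A_XML = TEMPLATE.format(ns=IA6, code="0112", terms="new additonal values to be checkecd")
B_XML = TEMPLATE.format(ns=IA6, code="0112", terms="New additional values to be checked.")

same_structure, to_review = compare_for_review(A_XML, B_XML)
print(same_structure)  # True
print(to_review)       # both other-info-terms values, for an IRO to compare
```

Two agreements can therefore match on the structured part while still disagreeing on wording; the returned list is what the IROs would still need to read.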

kkaraogl commented 3 months ago

What (slightly or vastly) different means must be detailed in the specs. In my opinion, the XSLT transformation proposal can also be flexible in accommodating the level of difference we wish to cover.

The XSLT produces a text to be hashed. I'm really interested in learning how a hash can reflect the level of difference we wish to cover. As far as I know, two hashes, by definition, can only be equal or different, not almost equal. But maybe I missed something.

The only way that XSLT can be useful is to exclude all free-form text from the calculation.

In this way, the parts involved in the calculation and transformation can be truly the same or different. This comparison obviously will not prove the agreements identical, but it could form the basis for a subsequent comparison, to be carried out manually on the free-form text parts.

Or other fields not commonly agreed to be significant, as detailed in the specs, of course.

skishk commented 3 months ago

And what if one partner inserts optional data and the other partner doesn't?

janinamincer-daszkiewicz commented 3 months ago

And what if one partner inserts optional data and the other partner doesn't?

End users have to decide, as they would have in the old days.

skishk commented 3 months ago

End users have to decide, as they would have in the old days.

Does that mean the system must not block IROs from approving the IIA? I hope so...

janinamincer-daszkiewicz commented 3 months ago

Blocking is acceptable only when the resulting XML violates the specification. In that case it is also required; for example, a system should not allow users to enter incorrect emails or URLs (as we discussed during the last IF meeting).

mkurzydlowski commented 2 weeks ago

@umesh-qs has updated the XSLT, and now the only thing remaining is, of course, differences in free text fields. I presume users will handle such cases manually.

10-Jul-2024.txt

I have prepared a change for review:
https://github.com/erasmus-without-paper/ewp-specs-api-iias/compare/iia-comparison
https://github.com/erasmus-without-paper/ewp-specs-api-iias/tree/iia-comparison/resources/xsltKit#iia-comparison