erasmus-without-paper / ewp-specs-api-iias

Specifications of EWP's Interinstitutional Agreements API.
MIT License

IIA version change impact on cooperation conditions hash #109

Closed mkurzydlowski closed 1 year ago

mkurzydlowski commented 1 year ago

As we already know, an IIA major version change impacts the cooperation conditions hash, because it changes the XML namespace that is part of the cooperation conditions XML element being hashed. So even if an IIA version change doesn't affect the fields in this element, the hash changes.

Let's discuss how this case should be handled and what impact this has.

Should the cooperation conditions hash change after a version change, or should it be stored and remain stable until the IIA changes? If we choose to recalculate hashes, should we send CNRs? If the partner chooses to recalculate hashes, should we recalculate IIA approvals?
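
To make the namespace effect concrete, here is a minimal sketch. It assumes the hash is a SHA-256 digest over a canonical form of the element; Python's C14N 2.0 stands in for whichever canonicalization the specification prescribes, and both namespace URIs are hypothetical placeholders.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for illustration.
NS_V6 = "https://example.org/iias/v6"
NS_V7 = "https://example.org/iias/v7"

# The same cooperation conditions, differing only in the namespace.
TEMPLATE = (
    '<cooperation-conditions xmlns="{ns}">'
    "<student-studies-mobility-spec>"
    "<sending-hei-id>uw.edu.pl</sending-hei-id>"
    "</student-studies-mobility-spec>"
    "</cooperation-conditions>"
)


def canonical_hash(xml_text: str) -> str:
    """SHA-256 over the canonical (C14N 2.0) form of the XML text."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


hash_v6 = canonical_hash(TEMPLATE.format(ns=NS_V6))
hash_v7 = canonical_hash(TEMPLATE.format(ns=NS_V7))
print(hash_v6 == hash_v7)  # prints False: only the namespace differs
```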

georgschermann commented 1 year ago

What about changes to the IIA structure (new API version)? That would change the hash even when using custom XSLT transformation.

If the changes are relevant, the hash should probably change; if not, the added/changed fields could be added to the XSLT to be ignored.

It could even be used to solve (partially) this issue. If we can compute the hash code of every cooperation condition, from one time to another we can easily identify the added, removed or changed cooperation conditions (of course, we can do this as an internal solution too)

We do this internally, but this solution is limited: when 2 CCs are changed, there are no matching hashes, so we cannot tell which one is which or whether they are completely new. These cases are currently manually reviewed and mapped.

demilatof commented 1 year ago

What about changes to the IIA structure (new API version)? That would change the hash even when using custom XSLT transformation.

If with every new API version you release its own XSLT, you should be able to make it version-proof; it depends on how much and what we change. For example, we will now have a starting and an ending academic year instead of a list. An XSLT can make them uniform one way or another.

Anyway, this time we will have issues. Our approvals are saved according to the current specification; none of them will be compatible with the new methods.

It could even be used to solve (partially) this issue. If we can compute the hash code of every cooperation condition, from one time to another we can easily identify the added, removed or changed cooperation conditions (of course, we can do this as an internal solution too)

We do this internally, but this solution is limited: when 2 CCs are changed, there are no matching hashes, so we cannot tell which one is which or whether they are completely new. These cases are currently manually reviewed and mapped.

Yes, I know, it could only be used to highlight the changed CCs, not the mapping CC to CC or what is changed inside.

umesh-qs commented 1 year ago

Using an XSLT to generate version-agnostic text on which to calculate a hash seems a good idea to us: we were talking about this option last week.

Can someone explain this with an example?

umesh-qs commented 1 year ago

If the changes are relevant, the hash should probably change; if not, the added/changed fields could be added to the XSLT to be ignored.

Not always. For example, the upcoming change that introduces a start and an end academic year, replacing the current list of academic years.

skishk commented 1 year ago

Language skills for student and teacher mobility will become mandatory for all flows. That means that if we don't specify them for the outgoing flow, the hash will change and all the IIAs will need a new approval.

LDeprez commented 1 year ago

Since the IF today showed that switching away from the hash and trying to keep the provability introduces a cascade of new issues for some providers, we propose the following:

  • Stay with the hash, don't change the XSD.
  • Define a namespace-unaware hash calculation (e.g. using an XSLT template), stripping out namespaces, white space, irrelevant elements, probably non-ASCII characters to prevent encoding issues, etc. Probably with a shared reference implementation in Java or other languages.

Alternatively, remove the hash and use the HTTP sig digest in the approval instead, but I think this would be inferior to the above.

Adding the full XML to the approval is not a feasible solution, I think.

Ghent University is in favour of this proposal.
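
A namespace-unaware hash calculation of the kind proposed in this comment could look roughly like the sketch below. It uses plain Python instead of an XSLT template, assumes SHA-256, and only strips namespaces and surrounding white space; removing irrelevant elements or non-ASCII characters is left out. The function names are hypothetical and not part of any reference implementation.

```python
import hashlib
import xml.etree.ElementTree as ET


def strip_namespaces(elem: ET.Element) -> ET.Element:
    """Return a copy of elem without namespaces and with trimmed text.

    Tail text (white space between elements) is dropped, so this sketch
    is not suitable for mixed content.
    """
    clean = ET.Element(elem.tag.split("}", 1)[-1])  # "{uri}name" -> "name"
    for name, value in sorted(elem.attrib.items()):
        clean.set(name.split("}", 1)[-1], value)
    clean.text = (elem.text or "").strip() or None
    for child in elem:
        clean.append(strip_namespaces(child))
    return clean


def namespace_unaware_hash(xml_text: str) -> str:
    """SHA-256 over the namespace-free, whitespace-trimmed serialization."""
    root = strip_namespaces(ET.fromstring(xml_text))
    return hashlib.sha256(ET.tostring(root, encoding="utf-8")).hexdigest()
```

With this, two documents that differ only in namespace URI, prefix, or indentation produce the same hash, while a changed field value still produces a different one.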

janinamincer-daszkiewicz commented 1 year ago

Georg, please elaborate on your proposal.

mkurzydlowski commented 1 year ago

If the changes are relevant, the hash should probably change; if not, the added/changed fields could be added to the XSLT to be ignored.

But we wanted the version id not to change after an IIA API version change - never. Otherwise we wouldn't solve the issue described in this thread. I'm not sure how you would like to solve this.

demilatof commented 1 year ago

But we wanted the version id not to change after an IIA API version change - never. Otherwise we wouldn't solve the issue described in this thread. I'm not sure how you would like to solve this.

If we have an XSLT that frees the XML from the namespace, we no longer have the problem that the version changes, and if we want to offer the transformed XML with the approval, we can offer the data and not the whole file that we need to manage as a byte sequence.

mkurzydlowski commented 1 year ago

If we have an XSLT that frees the XML from the namespace, we no longer have the problem that the version changes

We only solve the namespace change but not the XML structure change (e.g. fields added/changed/removed).

mkurzydlowski commented 1 year ago

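If the IIA GET XML response is deemed problematic, then I suggest sticking to what has already been implemented by all of us - the CC hash calculated using XML canonicalization. That is a good enough proof.

To solve the issue of an IIA API version change after mutual approval is given, we could use the same "technique" as proposed for version ID. These would be the rules for hash calculation:

  • As soon as mutual approval is reached both parties store their hashes (the ones that were approved) and serve them in the IIA GET response instead of calculating them dynamically. This is the last moment the hash needs to be verified by the partner.
  • Now if IIA API version changes the hash for all approved IIAs stays the same.
  • Only if a partner decides to modify an approved IIA, then the hash is appropriately recalculated and the normal change-approve workflow starts again.

What do you think about such a solution? It seems adequate and very simple.

Please note that the IIA mapping would not take part in the hash calculation (as it is not involved currently). A change to the IIA mapping would need to be detected like any other change to an IIA field.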
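
The stored-hash idea (store the approved hash at mutual approval, serve it unchanged across API versions, recalculate only when the IIA is really modified) can be sketched as follows. The class, its method names, and the two stand-in hash functions are hypothetical; they only illustrate the rules and are not part of any existing implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LocalIia:
    """Hypothetical local record of our copy of an IIA."""
    conditions_xml: str
    approved_hash: Optional[str] = None  # set once mutual approval is reached

    def mark_mutually_approved(self, approved_hash: str) -> None:
        # Store the hash that was actually approved.
        self.approved_hash = approved_hash

    def modify(self, new_conditions_xml: str) -> None:
        # A real modification discards the stored hash, so the normal
        # change-approve workflow starts again.
        self.conditions_xml = new_conditions_xml
        self.approved_hash = None

    def hash_to_serve(self, compute_hash: Callable[[str], str]) -> str:
        # While approved, serve the stored hash no matter which API
        # version (and thus which compute_hash) is currently in use.
        if self.approved_hash is not None:
            return self.approved_hash
        return compute_hash(self.conditions_xml)


# Stand-ins for "hash as calculated under v6" and "under v7".
def hash_v6(xml: str) -> str:
    return hashlib.sha256(("v6:" + xml).encode("utf-8")).hexdigest()


def hash_v7(xml: str) -> str:
    return hashlib.sha256(("v7:" + xml).encode("utf-8")).hexdigest()


iia = LocalIia("<cooperation-conditions/>")
iia.mark_mutually_approved(iia.hash_to_serve(hash_v6))
print(iia.hash_to_serve(hash_v7) == hash_v6("<cooperation-conditions/>"))  # True
```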

demilatof commented 1 year ago
  • As soon as mutual approval is reached both parties store their hashes (the ones that were approved) and serve them in the IIA GET response instead of calculating them dynamically. This is the last moment the hash needs to be verified by the partner.

  • Now if IIA API version changes the hash for all approved IIAs stays the same.

  • Only if a partner decides to modify an approved IIA, then the hash is appropriately recalculated and the normal change-approve workflow starts again.

I think that it cannot work, because when you produce an IIA with the new API version you may have to add elements to it that you would otherwise not need to add, since your IIA is in fact unchanged. Therefore, you must change the hash code anyway, simply because you have generated the same IIA under the new version.

If we are looking for an intuitive way to signal that an approval is needed, I suggest relying on the partner's intent and recording it in the IIA. A new approval is needed only when the partner sends an IIA CNR to say that something relevant has changed. We can miss that CNR, but if we add a new element <last-CNR-sent-on-date> to the IIA, whoever receives the IIA can compare this date with the date on which he sent his last Approval CNR to know whether he has to approve the IIA again.
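
The date comparison suggested here is small enough to sketch directly. The function name is hypothetical, and the timestamps are assumed to be available as timezone-aware values: one parsed from the proposed <last-CNR-sent-on-date> element, the other taken from the receiver's own records.

```python
from datetime import datetime
from typing import Optional


def needs_reapproval(last_cnr_sent_on: datetime,
                     last_approval_sent_on: Optional[datetime]) -> bool:
    """True if the partner signalled a relevant change after our last approval."""
    if last_approval_sent_on is None:
        return True  # we have never approved this IIA
    return last_cnr_sent_on > last_approval_sent_on


partner_cnr = datetime.fromisoformat("2023-04-03T10:00:00+00:00")
our_approval = datetime.fromisoformat("2023-03-01T09:30:00+00:00")
print(needs_reapproval(partner_cnr, our_approval))  # True
```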

janinamincer-daszkiewicz commented 1 year ago

Therefore, you must change the hash code anyway, simply because you have generated the same IIA under the new version.

The idea is to resend the stored hash when unimportant changes happen.

mkurzydlowski commented 1 year ago

I think that it cannot work, because when you produce an IIA with the new API version you may have to add elements to it that you would otherwise not need to add, since your IIA is in fact unchanged. Therefore, you must change the hash code anyway, simply because you have generated the same IIA under the new version.

As pointed out in the rule above:

As soon as mutual approval is reached both parties store their hashes (the ones that were approved) and serve them in the IIA GET response instead of calculating them dynamically.

So it is not important if the IIA changed because of the IIA API version change - you still serve the hash that you stored.

demilatof commented 1 year ago

So it is not important if the IIA changed because of the IIA API version change - you still serve the hash that you stored.

The idea is to resend the stored hash when unimportant changes happen.

Who decides whether a required change is unimportant? E.g.: mobilities-per-year will be mandatory in the next version, whilst it may be absent now. If you approve my IIA without mobilities-per-year now and tomorrow you call my IIA Get API ver. 7, I MUST provide mobilities-per-year, and since it is a positiveInteger, I cannot put zero. How can we handle this?

mkurzydlowski commented 1 year ago

If now you approve my IIA without mobilities-per-year and tomorrow you call my IIA Get API ver. 7, I MUST provide the mobilities-per-year, and since it is a positiveInteger, I cannot put zero. How can we handle this?

If it is approved, then you use the stored hash. It is not important if the IIA changed due to this IIA API version change.

demilatof commented 1 year ago

If it is approved, then you use the stored hash. It is not important if the IIA changed due to this IIA API version change.

Are you sure? I receive an IIA, I see that it has changed, but I have to assume that it is the same because you haven't changed the hash code whilst you should have? We lose the function of the hash code.

mkurzydlowski commented 1 year ago

If the IIA is approved and I haven't changed the hash code then yes, you should assume that I didn't propose a modification of an approved IIA.

umesh-qs commented 1 year ago

If the IIA is approved and I haven't changed the hash code then yes, you should assume that I didn't propose a modification of an approved IIA.

What if the partner has changed some important data that needs hash change?

mkurzydlowski commented 1 year ago

If the partner wants to propose a modification of an approved IIA then he needs to recalculate the hash.

Did I understand your question?

umesh-qs commented 1 year ago

If the partner wants to propose a modification of an approved IIA then he needs to recalculate the hash.

Did I understand your question?

My question was: what if the partner does not recalculate the hash?

demilatof commented 1 year ago

If the partner wants to propose a modification of an approved IIA then he needs to recalculate the hash.

Did I understand your question?

I think not.

Old IIA:

            <student-studies-mobility-spec>
                <sending-hei-id>uw.edu.pl</sending-hei-id>
                <sending-ounit-id>140</sending-ounit-id>
                ....
                <!-- omitted mobilities-per-year -->
                ...
            </student-studies-mobility-spec>

New IIA:

            <student-studies-mobility-spec>
                <sending-hei-id>uw.edu.pl</sending-hei-id>
                <sending-ounit-id>140</sending-ounit-id>
                ....
                <mobilities-per-year>1</mobilities-per-year> <!-- I MUST put this element, even if I don't want to change the IIA -->
                ...
            </student-studies-mobility-spec>

How can you distinguish if the mobilities-per-year is a real change or it is a change required to be compliant with version 7? You don't change the hash code, but you should. And if you make an error and you really want to change the IIA, but you don't change the hash code? How can I know?

janinamincer-daszkiewicz commented 1 year ago

<mobilities-per-year>1</mobilities-per-year> <!-- I MUST put this element, even if I don't want to change the IIA -->

This is an important change, hash should be recalculated.

mkurzydlowski commented 1 year ago

How can you distinguish if the mobilities-per-year is a real change or it is a change required to be compliant with version 7?

If the partner wants to modify an approved IIA, he must recalculate (change) the hash - that is the distinction.

You don't change the hash code, but you should.

If you don't modify the IIA but only change the IIA version then you shouldn't change the hash - as per proposal.

And if you make an error and you really want to change the IIA, but you don't change the hash code? How can I know?

As with any other error where you intend to do something but don't change the IIA accordingly. Such "errors" are not covered by any solution. The partner might notice that there is a change not related to the IIA version change. If he doesn't notice it, then eventually the client that made the error will notice a lack of response from the other side.

demilatof commented 1 year ago

<mobilities-per-year>1</mobilities-per-year> <!-- I MUST put this element, even if I don't want to change the IIA -->

This is an important change, hash should be recalculated.

No, I think you don't catch the point. This is fake data that I have to add to be able to answer your IIA Get call with schema-valid XML (the schema requires it). I would never have sent you an IIA CNR otherwise; my IIA is approved and good as it is, without that element.

janinamincer-daszkiewicz commented 1 year ago

No, I think you don't catch the point.

I did, I just think that in such a case the hash should change. Do we know how common the situation with a missing <mobilities-per-year> was?

demilatof commented 1 year ago

I did, I just think that in such a case the hash should change. Do we know how common the situation with a missing <mobilities-per-year> was?

It doesn't matter, it could be another element. We are implementing something that should always be valid. You're compelling me to specify something that wasn't required before and to require a new approval.

The parties are the owners of their copies; they, and nobody else, have to decide whether an approved IIA has to be re-approved, and certainly not for technical reasons.

janinamincer-daszkiewicz commented 1 year ago

What we suggest is to keep and make use of the previous hash functionality (for comparison), but on the other hand to relax its limitations (in the previous proposal it was replaced by version-id).

janinamincer-daszkiewicz commented 1 year ago

It doesn't matter, it could be another element.

Yes, and for some elements re-approval will be needed, for others it won't. If 0 were an acceptable value for the <mobilities-per-year> element, we would use it and would not recalculate the hash.

demilatof commented 1 year ago

If 0 were an acceptable value for the <mobilities-per-year> element, we would use it and would not recalculate the hash.

Currently, 0 is not an acceptable value

janinamincer-daszkiewicz commented 1 year ago

Yeah, I know, this is why we would need re-calculation. There IS a difference between 0 and 1 mobilities.

demilatof commented 1 year ago

Yeah, I know, this is why we would need re-calculation. There IS a difference between 0 and 1 mobilities.

Exactly, but if I said no mobilities and you approved it, now I don't want to tell you that there is 1.

The main problem is: do we have a set of elements, mandatory or not, that make up an IIA? If yes, we can have an XSLT that considers them all and fills in an empty value for any element that was not previously considered or is no longer considered. If not, I don't see any easy solution: we can always have something that is not compatible with the previous version.

georgschermann commented 1 year ago

Using an XSLT to generate version-agnostic text on which to calculate a hash seems a good idea to us: we were talking about this option last week.

Can someone explain this with an example?

https://www.ibm.com/docs/en/i/7.1?topic=functions-example-using-xslt-remove-namespaces

https://stackoverflow.com/questions/5268182/how-to-remove-namespaces-from-xml-using-xslt

If we have an XSLT that frees the XML from the namespace, we no longer have the problem that the version changes

We only solve the namespace change but not the XML structure change (e.g. fields added/changed/removed).

Most of these changes, like start/end year, can easily be handled by XSLT: https://stackoverflow.com/questions/52468538/xslt-current-group-select-first-and-last-element

Even if XSLT is not used for this, any reference algorithm can specify how to transform elements for hash calculation, specify defaults for missing elements, or the like.

<mobilities-per-year>1</mobilities-per-year> <!-- I MUST put this element, even if I don't want to change the IIA -->

This is an important change, hash should be recalculated.

If the XSLT is updated with the version, it can be used to distinguish between important and unimportant changes; it can add / change / remove elements, and if the change is important (even if only done because of a version update) I WANT to have a new hash and a new approval. If I have an exchange for 6 months and no mobilities-per-year, and now my partner adds 1000 mobilities per year because he has to put something there, I don't want this to be automatically approved.

If there are no relevant changes, it is completely irrelevant whether the hash changes, because I still have the old hash, the old IIA GET XML, the old approval and everything I will ever need to validate/prove it. If I ever want to change anything or have the v7 IIA approved, then the hash is changed anyway.

demilatof commented 1 year ago

if the xslt is updated with the version

I think this is the most important point: we need an XSLT per version

mkurzydlowski commented 1 year ago

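Changing the current hash to XSLT means that:

  • everyone has to change their implementation,
  • every hash has to change and all approvals are made invalid?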

What exactly will be better handled when compared to my last proposal?

if the change is important (even if only done because of a version update) I WANT to have a new hash and new approval. If I have an exchange for 6 months and no mobilities-per-year and now my partner adds 1000 mobilities per year because he has to put something there, I don't want this to be automatically approved.

Are you saying that an approved IIA might sometimes need a re-approval after an IIA API version change? Is that really what we expect from the business perspective? That seems weird.

skishk commented 1 year ago
  • every hash has to change and all approvals are made invalid?

If language skills become mandatory as decided (or is it still a proposal? I don't remember), anyone who hasn't specified them in their IIAs will need a new approval for all of them anyway.

jiripetrzelka commented 1 year ago

I would prefer a solution in which the hash would keep its primary function, i.e. data integrity check, and in which I would not have to trust the counterparty whether it changed the hash when it was supposed to and not when it was not supposed to. I don't agree with the underlying assumption that we need to keep the hash unchanged when API version changes. I suggest a solution in which the approving party would have to issue a new approval as soon as it detects that the owning party has switched to a new version of the IIA API. Most of the approvals could be updated automatically without the intervention of the approving IRO if the software of the approving party inspects the IIA GET response of the owning party and finds no business level changes in it.

As to the XSLT proposition, I do agree that something in this direction would be beneficial and would reduce the probability of unnecessary changes in the conditions-hash in the future (when switching from version 7 to version 8), but a similar effect could possibly be attained if we decided to specify that the namespace should not be part of the hash calculation, and how many decimal digits should be present in number of months, and the number of digits in ISCEDs and possibly something else as well.

Also, if we aim to require new approval when partner IIA ID changes, this element should be moved inside the cooperation-conditions element.
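
Fixing the number format before hashing, as suggested above for the number of months, could look like this minimal sketch. The choice of two decimal places is purely an assumption for illustration (the thread does not settle on a number), and the function name is hypothetical.

```python
from decimal import Decimal

# Assumption for illustration only: two decimal places.
MONTHS_QUANTUM = Decimal("0.01")


def normalize_months(value: str) -> str:
    """Render a number of months with a fixed number of decimal digits."""
    return str(Decimal(value.strip()).quantize(MONTHS_QUANTUM))


# Different spellings of the same value collapse to one representation.
print({normalize_months(v) for v in ("5", "5.0", "5.00")})  # {'5.00'}
```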

demilatof commented 1 year ago

Changing current hash to XSLT means that:

  • everyone has to change their implementation,
  • every hash has to change and all approvals are made invalid?

Not necessarily; anyway what we should try to avoid is to implement something that works only under optimal assumptions.

I still think that XSLT could be the best solution, but we have to investigate what we have and what XSLT we need.

What we should already have: the snapshot of the IIA XML we have approved, currently it should be compliant with v6.

What we need:

  1. An intermediate XML format that defines CCs (plus partner's IIA-ID, if we wish); this format is namespace independent, we can call it u.v. (universal version) and it can give us a universal hash code
  2. An XSLT that can transform from v6 to uv (XSLT v6=>uv)
  3. An XSLT that can transform from v7 to uv (XSLT v7=>uv)

Starting with the v7 API, the following steps are required to compute the hash code (only point 2 is a really new implementation for the provider):

  1. generate the IIA XML v7
  2. apply the transformation: XSLT v7 => uv
  3. compute the universal hash code for the transformed IIA
  4. insert the universal hash code (3) into the IIA generated at point 1

And these are the steps to compare the above universal hash code with the one we are currently storing with the approval:

  1. read our snapshot of the partner's approved IIA XML
  2. apply the transformation: XSLT v6 => uv
  3. compute the universal hash code for the transformed IIA snapshot
  4. compare this universal hash code with the one you received calling the partner's IIA Get API

Doing so, we can compare two hash codes that are independent from the namespace and from the elements that are in a version and not in the other.

Optimization suggested:

umesh-qs commented 1 year ago

if the xslt is updated with the version

I think this is the most important point: we need an XSLT per version

So I need to keep a log of which IIA was approved under which XSLT version and based on that do the hash calculation?

demilatof commented 1 year ago

if the xslt is updated with the version

I think this is the most important point: we need an XSLT per version

So I need to keep a log of which IIA was approved under which XSLT version and based on that do the hash calculation?

You should already be keeping the snapshot of the partner's IIA you approved; it is becoming mandatory anyway. From the snapshot, you can know the IIA version and therefore which XSLT version you have to use for the hash calculation. If you log the version explicitly, you may reduce the load on your system.

The XSLT per version should be provided by the maintainer of the xsd/xml.

umesh-qs commented 1 year ago

if the xslt is updated with the version

I think this is the most important point: we need an XSLT per version

So I need to keep a log of which IIA was approved under which XSLT version and based on that do the hash calculation?

You should already be keeping the snapshot of the partner's IIA you approved; it is becoming mandatory anyway. From the snapshot, you can know the IIA version and therefore which XSLT version you have to use for the hash calculation. If you log the version explicitly, you may reduce the load on your system.

The XSLT per version should be provided by the maintainer of the xsd/xml.

So every time I pull an IIA that was approved in v6 (or an older version), I need to calculate the hash in v7, and if it is not the same I need to go for re-approval?

umesh-qs commented 1 year ago

if the xslt is updated with the version

I think this is the most important point: we need an XSLT per version

So I need to keep a log of which IIA was approved under which XSLT version and based on that do the hash calculation?

You should already be keeping the snapshot of the partner's IIA you approved; it is becoming mandatory anyway. From the snapshot, you can know the IIA version and therefore which XSLT version you have to use for the hash calculation. If you log the version explicitly, you may reduce the load on your system.

The XSLT per version should be provided by the maintainer of the xsd/xml.

How is the XSLT solution different from what I proposed https://github.com/erasmus-without-paper/ewp-specs-api-iias/issues/109#issuecomment-1493348802

demilatof commented 1 year ago

So every time I pull an IIA that was approved in v6 (or an older version), I need to calculate the hash in v7, and if it is not the same I need to go for re-approval?

You need to calculate the hash in the "universal version", a version that takes care of everything and is neither v7 nor v6. If it is not the same, you have to go for re-approval. Anyway, you don't need to calculate it every time; you could even implement a loop over all your approved snapshots, calculate the "universal hash code" after applying the XSLT and then save it in case you need it later.

How is the XSLT solution different from what I proposed #109 (comment)

This solution is different because the XSLT takes care of the differences due to the new fields. The final version could be very similar to v7, but this is not necessary. The XSLT v6=>uv can:

  1. read IIA v6, if <mobilities-per-year> is not present, set it to 0 in the transformed version
  2. read IIA v6, get the first and the last <receiving-academic-year-id> and write them, in the transformed version, as <receiving-academic-year-start-id> and <receiving-academic-year-end-id>

Now we have only the problem of IIA v7 that has <mobilities-per-year> mandatory and "positiveInteger", that means that it cannot be zero. We have two possibilities:

  1. declare it nonNegativeInteger, so we accept zero
  2. specify that if it wasn't specified before and it is needed only for compatibility issues, we have to set it to a very large standard value such as "99999" (we could use INF but I think it could cause trouble)

In the first case, the XSLT has nothing to transform: we have zero mobilities per year. In the second case, the XSLT knows that if it finds 99999 in the element <mobilities-per-year> it has to rewrite that value to zero (or we can decide that the XSLT v6=>UV sets it directly to 99999 if the element was not present in the XML).
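
The two per-version transformations and the resulting "universal hash code" can be sketched as follows. Plain Python functions stand in for "XSLT v6=>uv" and "XSLT v7=>uv"; the input is assumed to be already namespace-free, only the three fields discussed here are considered, and SHA-256, the function names and the uv element name `mobility-spec` are hypothetical choices made for this illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

SENTINEL = "99999"  # placeholder value suggested above for "not specified"


def _text(spec: ET.Element, tag: str, default: str = "") -> str:
    """Trimmed text of the first child named tag, or default if absent."""
    node = spec.find(tag)
    return (node.text or "").strip() if node is not None else default


def _build_uv(start: str, end: str, mobilities: str) -> ET.Element:
    uv = ET.Element("mobility-spec")
    ET.SubElement(uv, "receiving-academic-year-start-id").text = start
    ET.SubElement(uv, "receiving-academic-year-end-id").text = end
    ET.SubElement(uv, "mobilities-per-year").text = mobilities
    return uv


def v6_to_uv(spec: ET.Element) -> ET.Element:
    """Stand-in for 'XSLT v6=>uv'; assumes at least one academic year."""
    years = [(y.text or "").strip()
             for y in spec.findall("receiving-academic-year-id")]
    # A missing <mobilities-per-year> becomes 0; first/last year are kept.
    return _build_uv(years[0], years[-1], _text(spec, "mobilities-per-year", "0"))


def v7_to_uv(spec: ET.Element) -> ET.Element:
    """Stand-in for 'XSLT v7=>uv'; rewrites the sentinel back to 0."""
    mobilities = _text(spec, "mobilities-per-year")
    if mobilities == SENTINEL:
        mobilities = "0"
    return _build_uv(_text(spec, "receiving-academic-year-start-id"),
                     _text(spec, "receiving-academic-year-end-id"),
                     mobilities)


def universal_hash(uv: ET.Element) -> str:
    return hashlib.sha256(ET.tostring(uv, encoding="utf-8")).hexdigest()


old = ET.fromstring(
    "<student-studies-mobility-spec>"
    "<receiving-academic-year-id>2021/2022</receiving-academic-year-id>"
    "<receiving-academic-year-id>2022/2023</receiving-academic-year-id>"
    "</student-studies-mobility-spec>")
new = ET.fromstring(
    "<student-studies-mobility-spec>"
    "<receiving-academic-year-start-id>2021/2022</receiving-academic-year-start-id>"
    "<receiving-academic-year-end-id>2022/2023</receiving-academic-year-end-id>"
    "<mobilities-per-year>99999</mobilities-per-year>"
    "</student-studies-mobility-spec>")
print(universal_hash(v6_to_uv(old)) == universal_hash(v7_to_uv(new)))  # True
```

The same `v7_to_uv` also covers the first possibility (declaring the element nonNegativeInteger), since a literal 0 passes through unchanged.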

umesh-qs commented 1 year ago

This solution is different because the XSLT takes care of the differences due to the new fields.

Yes, but the concept remains the same, which is to be able to calculate the hash without the namespace and excluding cosmetic changes. If this can be taken care of by my internal logic, rather than XSLT, it is much more flexible. I can easily handle the "mobilities-per-year" example or the academic year sequence or any other special case that does not need re-approval.

demilatof commented 1 year ago

This solution is different because the XSLT takes care of the differences due to the new fields.

Yes, but the concept remains the same, which is to be able to calculate the hash without the namespace and excluding cosmetic changes. If this can be taken care of by my internal logic, rather than XSLT, it is much more flexible. I can easily handle the "mobilities-per-year" example or the academic year sequence or any other special case that does not need re-approval.

Not exactly the same: the XSLT is a standard way to ensure interoperability; all the providers receive the same XSLT and perform the same logic. If you use your internal logic, it could be better but not compatible with another internal logic.

umesh-qs commented 1 year ago

Not exactly the same: the XSLT is a standard way to ensure interoperability; all the providers receive the same XSLT and perform the same logic. If you use your internal logic, it could be better but not compatible with another internal logic.

I am not sure how far XSLT can go in covering all the cases. If it cannot cover all the cases, then individual systems will still have to add custom logic. Also, this only works when both partners are on the same version. How can the hash change be handled when partners are on different versions?

janinamincer-daszkiewicz commented 1 year ago

How can the hash change be handled when partners are on different versions?

All partners will have to switch to the new version at approximately the same time.

demilatof commented 1 year ago

I am not sure how far XSLT can go in covering all the cases.

Very far, if we pay attention to list all the cases

How can the hash change be handled when partners are on different versions?

As I said, every version has its XSLT that transforms an XML tied to a version into an XML that is independent from namespaces, that contains only the elements we need to compute the hash. Therefore, the hash is version independent.

But if two partners are on different versions, the problem is not the hash code, but the fact that they cannot exchange data with each other. We could even go further and decide to use an XSLT to allow IIA exchange between systems on different versions, but I think it could be easier if whoever has switched to version 7 doesn't remove version 6 and keeps using it when necessary.

All partners will have to switch to the new version at approximately the same time.

Do you think it is really possible? Does this mean we have to wait until the last one switches to the new version? And will we make a full switch to version 7 without testing it in a development environment?

janinamincer-daszkiewicz commented 1 year ago

I suggest a solution in which the approving party would have to issue a new approval as soon as it detects that the owning party has switched to a new version of the IIA API. Most of the approvals could be updated automatically without the intervention of the approving IRO if the software of the approving party inspects the IIA GET response of the owning party and finds no business level changes in it.

How many providers would agree to such a solution? We have all put a lot of effort into looking for a solution that would allow us to avoid re-approval. Do we all agree that re-approval does not hurt?

P.S. The truth is that if there is no change, and partners do not send CNRs to each other, and partners do not do automatic IIA gets not triggered by CNRs, then the change in hash will not be discovered and re-approval will not be needed. If partners start modifying mutually approved IIA, the re-approval will be needed anyway or may be done (if they decide to revert back to the previous version).

demilatof commented 1 year ago

Do we all agree that re-approval does not hurt?

If we don't find a way to re-approve everything automatically, I don't think that our IROs would agree to manually re-approve all the IIAs, since they would have to check anyway whether there is a change before sending the approval.

and partners do not do automatic IIA gets not triggered by CNRs

I think that most of the providers do automatic IIA gets; every one with his own timing, but it is the only way to not miss any IIA.