erasmus-without-paper / ewp-specs-api-registry

Specifications of EWP's Registry API.
MIT License

Same Erasmus Code, different Hei id #10

Closed: demilatof closed this issue 1 year ago

demilatof commented 2 years ago

I'm not sure this is the right place to open this issue, but I'll try anyway. Our old system uses Erasmus codes to identify partners. To associate them with HEIs, I read the registry https://dev-registry.erasmuswithoutpaper.eu/catalogue-v1.xml and find the right HEI by its other-id type="erasmus".
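A minimal Python sketch of that lookup, using only the standard library. It assumes each institution appears in the catalogue as a hei element with an id attribute and other-id children carrying a type attribute; namespaces are ignored by comparing local tag names. The sample XML and the codes in it are invented for illustration. Collecting a list of hei ids per code, instead of a single value, keeps duplicates visible rather than silently overwriting them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def erasmus_index(catalogue_xml: str) -> dict[str, list[str]]:
    """Map each Erasmus code to every hei id that declares it.

    Assumes <hei id="..."> elements with <other-id type="erasmus"> children;
    namespaces are ignored by comparing local tag names only.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for hei in ET.fromstring(catalogue_xml).iter():
        hei_id = hei.get("id")
        if local_name(hei.tag) != "hei" or hei_id is None:
            continue
        for child in hei:
            if local_name(child.tag) == "other-id" and child.get("type") == "erasmus":
                code = (child.text or "").strip()
                if code and hei_id not in index[code]:
                    index[code].append(hei_id)
    return dict(index)


# Invented sample: "Y TOWN02" is declared by two different HEIs.
SAMPLE = """<catalogue xmlns="urn:example">
  <institutions>
    <hei id="a.example.org"><other-id type="erasmus">X CITY01</other-id></hei>
    <hei id="b1.example.org"><other-id type="erasmus">Y TOWN02</other-id></hei>
    <hei id="b2.example.org"><other-id type="erasmus">Y TOWN02</other-id></hei>
  </institutions>
</catalogue>"""

if __name__ == "__main__":
    for code, heis in erasmus_index(SAMPLE).items():
        print(code, heis)
```

In practice the string would come from the downloaded catalogue-v1.xml rather than an inline sample.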

Today the same Erasmus code popped up associated with multiple HEIs.

I'm aware that it's a development registry, but is this scenario possible? Should I handle this eventuality (contacting the partner and manually choosing the right hei_id), or should the maintainers reject a new, different HEI with the same Erasmus code that does not have the "previous-schac" element filled in?
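If such duplicates have to be handled on the client side, one possible rule, sketched here under the assumption that previous-schac values are available as a set of hei ids per record, is to accept a shared Erasmus code only when the newer HEI declares the older one as its previous SCHAC, and to flag every other case for manual resolution. HeiRecord and unexplained_duplicates are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeiRecord:
    """Hypothetical, already-parsed view of one HEI from the registry."""
    hei_id: str
    erasmus: set[str] = field(default_factory=set)
    previous_schac: set[str] = field(default_factory=set)


def unexplained_duplicates(heis: list[HeiRecord]) -> dict[str, list[str]]:
    """Return Erasmus codes shared by HEIs that are not linked via previous-schac."""
    by_code: dict[str, list[HeiRecord]] = defaultdict(list)
    for hei in heis:
        for code in hei.erasmus:
            by_code[code].append(hei)

    conflicts: dict[str, list[str]] = {}
    for code, holders in by_code.items():
        if len(holders) < 2:
            continue
        # A holder named as some other holder's previous SCHAC is treated as
        # superseded, not as a competing institution.
        superseded = {p for h in holders for p in h.previous_schac}
        current = [h.hei_id for h in holders if h.hei_id not in superseded]
        if len(current) > 1:
            conflicts[code] = sorted(current)
    return conflicts
```

Everything returned by unexplained_duplicates would still need a human decision, for example after contacting the partner.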

Or would you suggest a different way to find a partner's hei-id from its Erasmus code?

janinamincer-daszkiewicz commented 2 years ago

The only global identifier in the EWP Network is the SCHAC code. We advise you to add SCHAC codes for all your partners to your database; we do exactly that in our system. SCHAC codes are mandatory; all other identifiers are optional. Do not base your solution on Erasmus codes.

In theory each HEI should have one Erasmus code. However, there are some cases where one institution has two different installations of the mobility module from different providers; they have different SCHACs but the same Erasmus code. We do hope to resolve such issues in the future, but in general it is not easy.

There is a lot of bad data in the DEV Network. It would be great if HEIs cared about data quality not only in PROD but also in DEV; that would make testing easier.

demilatof commented 2 years ago

Many thanks for your answer. The main problem is that switching from Erasmus codes to SCHACs requires a full rewrite of the whole system. What I'm doing now is binding each Erasmus code to its SCHAC, but since we have more than 300 agreements, I would like to automate the process without involving a human operator.

Moreover, until now our agreements have always been signed between Erasmus code A and Erasmus code B. Migrating the current data is practically impossible if Erasmus code B can be bound to both HEI B1 and HEI B2. If that can happen, the migration will require contacting the other HEIs involved to identify the right one.
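The migration described here can be split mechanically into agreements that bind to exactly one HEI and agreements that need a person to decide. A sketch, assuming a hypothetical Agreement record keyed by the partner's Erasmus code and an index from Erasmus code to candidate hei ids (for example, built from the registry catalogue). Normalizing spacing and letter case before comparing codes is an assumption, not something the registry guarantees.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    """Hypothetical legacy record: an agreement keyed by the partner's Erasmus code."""
    agreement_id: str
    partner_erasmus: str


def normalize(code: str) -> str:
    # Assumption: codes that differ only in spacing or letter case are the same code.
    return " ".join(code.split()).upper()


def bind_agreements(
    agreements: list[Agreement], erasmus_to_heis: dict[str, list[str]]
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Split agreements into automatically bound ones and a manual-review queue."""
    index: dict[str, list[str]] = {}
    for code, heis in erasmus_to_heis.items():
        merged = index.setdefault(normalize(code), [])
        for hei_id in heis:
            if hei_id not in merged:
                merged.append(hei_id)

    bound: dict[str, str] = {}
    review: dict[str, list[str]] = {}
    for ag in agreements:
        candidates = index.get(normalize(ag.partner_erasmus), [])
        if len(candidates) == 1:
            bound[ag.agreement_id] = candidates[0]
        else:
            # No candidate (code not in the network) or several (B1 and B2):
            # a person has to decide, possibly after contacting the partner.
            review[ag.agreement_id] = list(candidates)
    return bound, review
```

Only the review queue then needs a human operator; its size depends entirely on how clean the registry data is.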

I didn't think there could be cases where one institution has two different installations of the mobility module from different providers. Even with different providers, they should use the same SCHAC (and the same Erasmus code), with different domains in the API URLs.

As a matter of fact, we have a second problem: we have agreements with different institutions that share the same Erasmus code, but this should be resolved by means of different OUNIT-IDs under the same HEI-ID, not with multiple SCHACs.

Since the global identifier in EWP is SCHAC code, do you mean that we could avoid any reference to Erasmus code?

I agree with you that HEIs should care about data quality in DEV as well; I also think it would be a great improvement to have a scheduled job that checks the consistency of some of the data.
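A sketch of what such a scheduled check could compute, assuming the registry data has already been reduced to a hypothetical mapping of hei id to its other-ids grouped by type. The type names erasmus and pic follow the identifiers discussed in this thread; the function only reports values declared by more than one HEI and does not decide which one is right.

```python
from collections import defaultdict


def non_unique_ids(
    heis: dict[str, dict[str, list[str]]],
    attrs: tuple[str, ...] = ("erasmus", "pic"),
) -> dict[str, dict[str, list[str]]]:
    """Report, per identifier type, values declared by more than one HEI.

    heis maps a hei id to its other-ids grouped by type, e.g.
    {"a.example.org": {"erasmus": ["X CITY01"], "pic": ["111"]}}.
    """
    report: dict[str, dict[str, list[str]]] = {}
    for attr in attrs:
        owners: dict[str, set[str]] = defaultdict(set)
        for hei_id, other_ids in heis.items():
            for value in other_ids.get(attr, []):
                owners[value].add(hei_id)
        dupes = {v: sorted(ids) for v, ids in owners.items() if len(ids) > 1}
        if dupes:
            report[attr] = dupes
    return report
```

Run daily, an empty report means no identifier of the checked types is shared; anything else is a candidate for follow-up with the HEIs involved.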

janinamincer-daszkiewicz commented 2 years ago

We hope that at some point we will get access to the official ECHE list with SCHAC codes included and will make it available via a public API, but again, this is not an easy task. Anyway, it is on our radar.

We will try to gather cases where duplicate Erasmus codes might show up in the network for a good reason. Let's hope there are not many and that they can be resolved somehow.

> Since the global identifier in EWP is SCHAC code, do you mean that we could avoid any reference to Erasmus code?

In our system we do not rely on Erasmus codes in EWP exchanges.

> I agree with you that HEIs should care about data quality in DEV as well; I also think it would be a great improvement to have a scheduled job that checks the consistency of some of the data.

Finding inconsistencies is one thing, getting them corrected is another ;)

demilatof commented 2 years ago

Thanks again for your answer.

> In our system we do not rely on Erasmus codes in EWP exchanges.

Our system is more than ten years old, and the Erasmus code was its core. We can "forget" the expired agreements, but those still in effect were stored under that scheme. At present the most urgent problem is connecting to the EWP network; in the future we could develop a new, SCHAC-based system, but not now. I think this could be a common problem for all HEIs that have their own in-house system. Maybe the Erasmus code itself is not the problem, but I don't think anyone has a legacy system that is SCHAC-based.

janinamincer-daszkiewicz commented 2 years ago

Our system is 20 years old; the mobility module was built maybe 10 years ago (I do not remember exactly). The Erasmus code is one of the attributes of the partner HEI, but not a primary key.

janinamincer-daszkiewicz commented 2 years ago

> the mobility module was built maybe 10 years ago (I do not remember exactly)

The first version was built in 2006.

demilatof commented 2 years ago

> Our system is 20 years old; the mobility module was built maybe 10 years ago (I do not remember exactly). The Erasmus code is one of the attributes of the partner HEI, but not a primary key.

Whatever the primary key is, I don't think it was the SCHAC; it seems to be a fairly new identifier. The main problem is the number of agreements that have to be bound to the right SCHAC.

janinamincer-daszkiewicz commented 2 years ago

Right, the primary key is different from the SCHAC; the SCHAC was added recently, when we started implementing the EWP connector. Agreements are connected to HEIs, HEIs have SCHACs.
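A minimal relational sketch of that arrangement, using SQLite from the Python standard library: agreements reference an internal partner key, while the Erasmus code and the SCHAC are plain attributes of the partner record, the SCHAC being attached when it becomes known. Table, column and function names are hypothetical. The UNIQUE constraint on schac reflects the SCHAC being the network-wide identifier; SQLite allows several NULLs in a UNIQUE column, so partners not yet mapped are fine.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: an internal primary key, with Erasmus code and SCHAC
# kept as ordinary attributes of the partner record.
SCHEMA = """
CREATE TABLE partner_hei (
    id      INTEGER PRIMARY KEY,  -- internal key, never an external code
    name    TEXT NOT NULL,
    erasmus TEXT,                 -- informative only, not assumed unique
    schac   TEXT UNIQUE           -- added when the EWP connector needs it
);
CREATE TABLE agreement (
    id      INTEGER PRIMARY KEY,
    hei_ref INTEGER NOT NULL REFERENCES partner_hei(id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two partners, one not yet mapped."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO partner_hei (id, name, erasmus, schac) VALUES (?, ?, ?, ?)",
        [(1, "Partner A", "X CITY01", "a.example.org"),
         (2, "Partner B", "Y TOWN02", None)],
    )
    conn.executemany("INSERT INTO agreement (id, hei_ref) VALUES (?, ?)",
                     [(10, 1), (11, 2)])
    return conn


def agreements_with_schac(conn: sqlite3.Connection) -> list[tuple[int, Optional[str]]]:
    """List each agreement with its partner's SCHAC (None if not mapped yet)."""
    return conn.execute(
        "SELECT a.id, h.schac FROM agreement a "
        "JOIN partner_hei h ON h.id = a.hei_ref ORDER BY a.id"
    ).fetchall()


def set_schac(conn: sqlite3.Connection, hei_ref: int, schac: str) -> None:
    """Attach a SCHAC to an existing partner record (the by-hand step)."""
    conn.execute("UPDATE partner_hei SET schac = ? WHERE id = ?", (schac, hei_ref))
```

With this layout, adding a SCHAC later touches one partner row and none of the agreements.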

demilatof commented 2 years ago

> Agreements are connected to HEIs, HEIs have SCHACs.

Indeed...

I think that your old system had its own HEI registry, with names and attributes that were somewhat different from the current HEI names and attributes. May I ask how your team mapped the old HEI identifier to the new one (and therefore to the SCHAC)?

Without a common, unique attribute that exists in both the new and the old registry, I suppose the only way is a human operator.

janinamincer-daszkiewicz commented 2 years ago

> I think that your old system had its own HEI registry, with names and attributes that were somewhat different from the current HEI names and attributes.

The old system and the current system are the same. We added new attributes, SCHAC among them.

> May I ask how your team mapped the old HEI identifier to the new one (and therefore to the SCHAC)?

By hand. Whenever a new HEI becomes a partner of the home HEI, it is added to the system with its SCHAC, or the SCHAC is added to the existing record.

> Without a common, unique attribute that exists in both the new and the old registry, I suppose the only way is a human operator.

Correct, but this is not a problem; data are added incrementally, as needed.

demilatof commented 2 years ago

Many thanks for your confirmation. We have more than 200 Erasmus codes to associate by hand with the right HEI. A unique Erasmus code would have been really useful.

janinamincer-daszkiewicz commented 2 years ago

I can send you data from our database, if that might be of use.

demilatof commented 2 years ago

Many thanks for your kindness, but I don't think it would be of use. I'll try to map as much as I can automatically; the remaining agreements will then be mapped by hand.

georgschermann commented 2 years ago

I think automatic mapping may be crucial, since SCHACs, Erasmus codes, etc. can change continuously in the registry. We automatically refresh HEI data on a daily basis with the priority SCHAC > Erasmus code > PIC. If the SCHAC of an HEI is found in the network, all is good; if no SCHAC is set for the HEI, or the SCHAC is no longer present in the network, the HEI is looked up by its Erasmus code; if multiple SCHACs are found, the HEI remains in a conflicting state that has to be resolved manually before the next data exchange. This works quite reliably for 100k+ partner HEIs in total.
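The refresh described above can be sketched as a small resolution function. It assumes a hypothetical LocalHei record and a snapshot of the network as a map from SCHAC to other-ids; it is one reading of the described priority, not the actual implementation. When the Erasmus code matches several SCHACs, the record is put into a conflict state and the PIC is not consulted; when it matches none, the PIC is tried next.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalHei:
    """Hypothetical local partner record; any identifier may be missing."""
    schac: Optional[str] = None
    erasmus: Optional[str] = None
    pic: Optional[str] = None
    conflict: Optional[list[str]] = None  # candidate SCHACs awaiting a human decision


def refresh(hei: LocalHei, network: dict[str, dict[str, str]]) -> LocalHei:
    """Re-resolve a local HEI against today's registry data.

    network maps each SCHAC present in the registry to its other-ids,
    e.g. {"a.example.org": {"erasmus": "X CITY01", "pic": "111"}}.
    Lookup priority: SCHAC, then Erasmus code, then PIC.
    """
    if hei.schac in network:
        hei.conflict = None
        return hei  # SCHAC still present in the network: nothing to do
    for attr in ("erasmus", "pic"):
        value = getattr(hei, attr)
        if not value:
            continue
        matches = sorted(s for s, ids in network.items() if ids.get(attr) == value)
        if len(matches) == 1:
            hei.schac, hei.conflict = matches[0], None
            return hei
        if len(matches) > 1:
            # Ambiguous: keep the old SCHAC and block exchanges until resolved.
            hei.conflict = matches
            return hei
    return hei  # no match at all: leave the record unchanged
```

A record whose conflict field is set would be excluded from data exchanges until someone picks the right SCHAC.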

umesh-qs commented 2 years ago

We do the same as described by @georgschermann. I am not sure why this is even a question. Erasmus and other codes were supposed to be unique. Has anything changed recently?

demilatof commented 2 years ago

> I am not sure why this is even a question. Erasmus and other codes were supposed to be unique. Has anything changed recently?

The question arises from my attempt to map our data (Erasmus code based) to SCHACs. My system logged multiple HEIs for the same Erasmus code, so I asked for clarification: should I assume that an Erasmus code is unique, with only one HEI ID associated with it, or should I handle multiple occurrences of different HEI IDs for the same Erasmus code? OK, I'm working with the development registry, but this is where I have to make up my mind. The attachment is an example of what I find in the dev registry.

[Attachment: madrid04]

janinamincer-daszkiewicz commented 1 year ago

I guess you solved the issue some time ago. A new feature is coming, first in DEV: https://ewp.demo.usos.edu.pl/stats/issues?type=non-unique-ids&attr=erasmus. This error comes from the ECHE list, not from the network; compare with: https://eche-list.erasmuswithoutpaper.eu/report/

umesh-qs commented 1 year ago

> I guess you solved the issue some time ago. A new feature is coming, first in DEV: https://ewp.demo.usos.edu.pl/stats/issues?type=non-unique-ids&attr=erasmus. This error comes from the ECHE list, not from the network; compare with: https://eche-list.erasmuswithoutpaper.eu/report/

@janinamincer-daszkiewicz so duplicates of Erasmus, PIC, ECHE etc. are not allowed?

janinamincer-daszkiewicz commented 1 year ago

Each and every case requires a detailed explanation; these identifiers are issued by the Commission.

umesh-qs commented 1 year ago

> Each and every case requires a detailed explanation; these identifiers are issued by the Commission.

Let me rephrase. Are duplicates in Erasmus, PIC etc. allowed in the registry after a detailed explanation?

janinamincer-daszkiewicz commented 1 year ago

When the situation is cleared there will not be a duplicate in the registry because institutions will correct errors.

umesh-qs commented 1 year ago

> When the situation is cleared there will not be a duplicate in the registry because institutions will correct errors.

So DG EAC has decided to allow duplicates until there is an explanation, and to remove them manually once an explanation is received. Interesting process. What is the timeframe for getting an explanation and then removing the duplicates?

janinamincer-daszkiewicz commented 1 year ago

> What is the timeframe for getting an explanation and then removing the duplicates?

As soon as possible.

umesh-qs commented 1 year ago

> What is the timeframe for getting an explanation and then removing the duplicates?

> As soon as possible.

One more interesting process. I hope DG EAC goes through these discussions, as they say they do. What does ASAP mean? What is the maximum time limit? What if the institution/provider that is asked for an explanation takes, let's say, a month? Will duplicates remain in the registry for months?

demilatof commented 1 year ago

> I guess you solved the issue some time ago. A new feature is coming, first in DEV: https://ewp.demo.usos.edu.pl/stats/issues?type=non-unique-ids&attr=erasmus. This error comes from the ECHE list, not from the network; compare with: https://eche-list.erasmuswithoutpaper.eu/report/

Thanks, I think this is a useful tool.

> @janinamincer-daszkiewicz so duplicates of Erasmus, PIC, ECHE etc. are not allowed?

I hope that no duplicates will be allowed; in any case, this implies that if an HEI changes its hei_id, it should switch off the old hei_id and switch on the new one within a very short time (minutes, not hours).

umesh-qs commented 1 year ago

> I hope that no duplicates will be allowed; in any case, this implies that if an HEI changes its hei_id, it should switch off the old hei_id and switch on the new one within a very short time (minutes, not hours).

Ideally this should be done at the time the manifest URL is added/activated in the registry portal, but for some reason DG EAC has chosen to handle it manually. According to Janina, duplicates will be removed ASAP (certainly not within a few hours), depending on what clarification/justification the provider has for adding them.

janinamincer-daszkiewicz commented 1 year ago

Duplicates in hei-ids, the identifiers in the EWP network, are resolved by the registry.

demilatof commented 1 year ago

> Duplicates in hei-ids, the identifiers in the EWP network, are resolved by the registry.

What I mean is that if an HEI changes only its hei_id from "oldHei.id" to "newHei.id" and keeps all the other identifiers (Erasmus code, PIC, OID, and so on), there are no duplicate hei_ids, but the HEI cannot be listed twice, for example to manage the older IIAs and the new IIAs with two different systems (same provider or not).
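A client could at least notice this situation by comparing two daily registry snapshots. The sketch below assumes each snapshot is a hypothetical map from hei_id to Erasmus code: a code that moves from a hei_id that disappeared to exactly one hei_id that appeared is reported as a probable hei_id change, while a code listed under several hei_ids in the same snapshot is reported as an overlap needing attention.

```python
def compare_snapshots(
    old: dict[str, str], new: dict[str, str]
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return (probable_renames, overlaps) between two hei_id -> Erasmus code maps."""
    vanished = {h: c for h, c in old.items() if h not in new}
    appeared = {h: c for h, c in new.items() if h not in old}

    # An Erasmus code that moved from a vanished hei_id to exactly one new hei_id.
    renames: dict[str, str] = {}
    for old_id, code in vanished.items():
        successors = [h for h, c in appeared.items() if c == code]
        if len(successors) == 1:
            renames[old_id] = successors[0]

    # Codes listed under several hei_ids at the same time in the new snapshot.
    by_code: dict[str, list[str]] = {}
    for hei_id, code in new.items():
        by_code.setdefault(code, []).append(hei_id)
    overlaps = {c: sorted(ids) for c, ids in by_code.items() if len(ids) > 1}
    return renames, overlaps
```

A probable rename could then be applied to local records automatically, while an overlap would be left for a person to review.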

janinamincer-daszkiewicz commented 1 year ago

> What I mean is that if an HEI changes only its hei_id from "oldHei.id" to "newHei.id" and keeps all the other identifiers (Erasmus code, PIC, OID, and so on), there are no duplicate hei_ids, but the HEI cannot be listed twice, for example to manage the older IIAs and the new IIAs with two different systems (same provider or not).

That's true.