Open saschaszott opened 10 months ago
Hi @saschaszott
thanks for opening the issue. This should already be resolved by https://github.com/4Science/DSpace/pull/276.
Feel free to re-open the issue if it is not working for you.
Hi @atarix83, the PR (#276) you mentioned does not fix the problem. We have integrated #276 into our code base and are still able to reproduce the bug.
@atarix83, the problem arises in the Pentaho transformation step named Select values 4. In this early transformation step, the important sorting information in nested_object_id and positiondef is removed. We have fixed the Pentaho transformation locally (this requires additional steps). Let me know if you are interested in a PR.
@saschaszott
yes please open a PR when you can, so we can verify. Thanks
@atarix83 , sorry for the long pause, but today I was able to reproduce the problem described above with the latest version of DSC (2023.02.02). To illustrate the problem, I'll give you an example of a nested affiliation object (with 3 entries):
In the migrated RP you'll find an incorrect state
As you can see in the metadata full view, there is an uneven number of affiliation.startDate and affiliation.endDate fields
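A quick way to spot this symptom in a migrated RP is to count how often each nested field occurs. The snippet below is only a sketch: the `(field, value)` list is made-up sample data, the helper names are hypothetical, and it assumes that a consistent migration yields the same number of values for every nested affiliation field.

```python
from collections import Counter

# Hypothetical flat metadata of one migrated RP as (field, value) pairs.
# Three affiliations were expected, but the date fields are incomplete.
metadata = [
    ("affiliation.role", "Professor"),
    ("affiliation.role", "Dean"),
    ("affiliation.role", "Lecturer"),
    ("affiliation.startDate", "2010"),
    ("affiliation.startDate", "2015"),
    ("affiliation.endDate", "2020"),
]

def nested_field_counts(metadata, prefix="affiliation."):
    """Count the values of every field that starts with the given prefix."""
    return Counter(field for field, _value in metadata if field.startswith(prefix))

def has_uneven_counts(counts):
    """True if the nested fields do not all occur the same number of times."""
    return len(set(counts.values())) > 1

counts = nested_field_counts(metadata)
uneven = has_uneven_counts(counts)
```

For the sample data, `counts` holds three roles, two start dates and one end date, so `uneven` is true.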
To better illustrate the change to the entity migration transformation, here is a before-after comparison of the change to entity-migration.ktr that we propose:
You can find our proposed bugfix in PR #425 .
In DSC5 we are using nested objects to model affiliations in researcher profiles. Each affiliation consists of 4 fields: org unit (ou pointer; mandatory), role (mandatory), start date (optional), end date (optional).
Currently, we have several affiliations without a start date and/or end date.
For example, in DSC5 we have one RP with 2 affiliations (screen shot)
Currently, the CRIS migration procedure (Pentaho transformation) inverts the order of affiliations. This is due to the step Sort position in entity_migration.ktr (ascending = N).
The expected migration result of the given example RP is:
Currently, DSC7 produces an invalid migration result in case of affiliations with optional fields (as in the given example):
In this example the assignment of end date is not correct.
This bug is caused by the Pentaho migration step Select values 4, which removes nested_object_id and positiondef from each row of the stream. This means that subsequent migration steps cannot determine the correct assignment of nested fields to a given affiliation (nested object).
This bug affects the migration of RPs that have at least 2 affiliations with missing nested fields.
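The effect can be reproduced outside Pentaho with a small sketch. This is not the transformation itself: the row layout, the sample values and both helper functions are assumptions made for illustration. It compares rebuilding affiliations while nested_object_id and positiondef are still present with rebuilding them after both columns have been dropped.

```python
from collections import defaultdict

# Hypothetical rows as (nested_object_id, positiondef, field, value).
# Affiliation 11 has no end date; affiliation 12 has one.
rows = [
    (11, 0, "role", "Professor"),
    (11, 0, "startDate", "2010"),
    (12, 1, "role", "Dean"),
    (12, 1, "startDate", "2015"),
    (12, 1, "endDate", "2020"),
]

def group_with_keys(rows):
    """Rebuild affiliations using nested_object_id and positiondef."""
    grouped = defaultdict(dict)
    for nested_id, position, field, value in rows:
        grouped[(position, nested_id)][field] = value
    return [grouped[key] for key in sorted(grouped)]

def group_without_keys(rows):
    """Rebuild affiliations when only the per-field value order is left."""
    per_field = defaultdict(list)
    for _nested_id, _position, field, value in rows:
        per_field[field].append(value)
    size = max(len(values) for values in per_field.values())
    return [
        {field: values[i] for field, values in per_field.items() if i < len(values)}
        for i in range(size)
    ]

with_keys = group_with_keys(rows)
without_keys = group_without_keys(rows)
```

With the keys, the end date stays on the second affiliation. Without them, the only end date in the stream ends up on the first affiliation, which is the kind of wrong assignment described above.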
We'll provide a bugfix (adaptation of the Pentaho migration).