4Science / DSpace

This repository contains the 4Science optimized DSpace & DSpace-CRIS distribution.
https://wiki.lyrasis.org/display/DSPACECRIS/
BSD 3-Clause "New" or "Revised" License

XML parsing error due to extraneous </sql> tag in update_dc_date_accessioned.ktr without corresponding opening tag #420

Open lehmbein opened 7 months ago

lehmbein commented 7 months ago

Describe the bug The second (final) step of the database migration fails because an opening <sql> tag is missing in update_dc_date_accessioned.ktr. At update_dc_date_accessioned.ktr#L574 there is a closing </sql> tag with no matching opening tag.
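
The parser error reported for this file names "step" rather than "sql" because, when a stray </sql> is reached, the innermost element still open is <step>. The following is a minimal Python sketch of that effect. It assumes a hypothetical, heavily simplified fragment (not the real contents of update_dc_date_accessioned.ktr) and uses Python's standard-library parser, so the error wording differs from the Java parser message that Pentaho prints.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one <step> of a .ktr file: the SQL text is
# followed by </sql>, but the opening <sql> tag is missing.
BROKEN = """<transformation>
  <step>
    <name>demo</name>
    UPDATE demo SET x = 1</sql>
  </step>
</transformation>"""

# Same fragment with the opening <sql> tag restored.
FIXED = BROKEN.replace("    UPDATE", "    <sql>UPDATE")


def parse_error(xml_text):
    """Return (line, column, message) for the first XML error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position
        return line, column, str(exc)
    return None


print(parse_error(BROKEN))  # error located on the line holding </sql>
print(parse_error(FIXED))   # no error once <sql> is opened
```

In the sketch the error is reported on the line containing the unmatched </sql>, which matches the log pointing at line 574 rather than at the place where the opening tag should have been.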

To Reproduce Steps to reproduce the behavior:

  1. Follow the migration steps described in the documentation in chapter Data migration from DSpace-CRIS 5.
  2. Start the second Pentaho migration job, dspace_cris_migration_post_import.kjb.
  3. This immediately produces the following errors (shortened):
[Fatal Error] :574:4: The element type "step" must be terminated by the matching end-tag "</step>".
[Fatal Error] :574:4: The element type "step" must be terminated by the matching end-tag "</step>".
[Fatal Error] :574:4: The element type "step" must be terminated by the matching end-tag "</step>".
[Fatal Error] :574:4: The element type "step" must be terminated by the matching end-tag "</step>".
[Fatal Error] :574:4: The element type "step" must be terminated by the matching end-tag "</step>".
[Fatal Error] :574:4: The element type "step" must be terminated by the matching end-tag "</step>".
2024/02/02 13:44:00 - UPDATE dc_date_accessioned - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Unable to run job dspace_cris_migration_post_import. The UPDATE dc_date_accessioned has an error. Unable to read file [file:///srv/dspace/dspace-cris-git-backend/dspace/etc/migration/update_dc_date_accessioned.ktr].
2024/02/02 13:44:00 - UPDATE dc_date_accessioned -
2024/02/02 13:44:00 - UPDATE dc_date_accessioned - Error reading information from input stream
2024/02/02 13:44:00 - UPDATE dc_date_accessioned - The element type "step" must be terminated by the matching end-tag "</step>".
2024/02/02 13:44:00 - UPDATE dc_date_accessioned - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : org.pentaho.di.core.exception.KettleXMLException:
2024/02/02 13:44:00 - UPDATE dc_date_accessioned - Unable to read file [file:///srv/dspace/dspace-cris-git-backend/dspace/etc/migration/update_dc_date_accessioned.ktr].
...

Expected behavior dspace_cris_migration_post_import.kjb should be able to use update_dc_date_accessioned.ktr without errors. Adding the missing opening <sql> tag at update_dc_date_accessioned.ktr#L516 should solve the problem.
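
To confirm that an edited .ktr file is well-formed before rerunning the Pentaho job, a quick check outside Pentaho can help. This is a minimal sketch using only Python's standard library; the function name find_xml_error is hypothetical, and the check covers XML well-formedness only, not whether Pentaho accepts the transformation.

```python
import xml.etree.ElementTree as ET


def find_xml_error(path):
    """Return None if the file at `path` is well-formed XML,
    otherwise (line, column, message) for the first parse error."""
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        line, column = exc.position
        return line, column, str(exc)
    return None
```

For example, calling find_xml_error("dspace/etc/migration/update_dc_date_accessioned.ktr") from the repository root should return None once the opening tag is in place, and a tuple with the error location otherwise.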

Related work Nothing I am aware of.