4Science / DSpace

This repository contains the 4Science optimized DSpace & DSpace-CRIS distribution.
https://wiki.lyrasis.org/display/DSPACECRIS/
BSD 3-Clause "New" or "Revised" License

Data migration from DSpace-CRIS 5 #404

Open alejandratenorio opened 9 months ago

alejandratenorio commented 9 months ago

Describe the bug

Dear 4Science Team,

We are working on upgrading our DSpace-CRIS instance from a version based on DSpace 5.10 to DSpace-CRIS 7, release 2023.01.01. Current CRIS version: 5.10.

We need to upgrade our database using the "Data migration from DSpace-CRIS 5" process described in your documentation, but the required tool (Pentaho Data Integration) is no longer available for download.


Please, could you help us? Where can we download it?

Thank you in advance.

To Reproduce

Steps to reproduce the behavior:

  1. Go to Pentaho Data Integration
  2. Then we got this message: file could not be found or is not available.

kskaiser commented 9 months ago

You can download the tool from https://www.hitachivantara.com/en-us/products/pentaho-platform/data-integration-analytics/pentaho-community-edition.html

If you want to run it on an ARM Mac, you have to follow this guide: https://stackoverflow.com/questions/67972804/pentaho-data-integration-not-starting-on-new-mac-m1

Windows and Linux should be fine.

alejandratenorio commented 9 months ago

@kskaiser Thank you so much. There are a lot of options, could you tell me which tool I have to download?

kskaiser commented 9 months ago

@alejandratenorio It's the "Pentaho Data Integration (Base Install)". https://privatefilesbucket-community-edition.s3.us-west-2.amazonaws.com/9.4.0.0-343/ce/client-tools/pdi-ce-9.4.0.0-343.zip

alejandratenorio commented 9 months ago

Hi @kskaiser Thank you very much again. I have run the kitchen script with the following parameters:

kitchen.sh -file:/home/dspace/DSpace-dspace-cris-2023.01.01/dspace/etc/migration/dspace_cris_migration.kjb -param:db_host_name=localhost -param:db_name=dspace -param:db_port_number=5432 -param:db_username=dspace -param:db_password=dspace. -param:eperson_email=mymails@...

but I got this error:

2023/11/30 04:13:27 - insert into imp_record.0 - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Because of an error, this step can't continue:
2023/11/30 04:13:27 - metadata visibility configuration.0 - Finished processing (I=2, O=0, R=0, W=2, U=0, E=0)
2023/11/30 04:13:27 - Join rows.0 - Finished processing (I=0, O=0, R=71, W=70, U=0, E=0)
2023/11/30 04:13:27 - Add insert operation and status.0 - Finished processing (I=0, O=0, R=70, W=70, U=0, E=0)
2023/11/30 04:13:27 - insert into imp_record.0 - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : org.pentaho.di.core.exception.KettleException:
2023/11/30 04:13:27 - insert into imp_record.0 - Error inserting row into table [imp_record] with values: [null], [Funding], [30], [pj00030], [null], [1], [insert], [z], [1], [1c5344cd-fa89-44d8-9c98-7aa20de1d750]
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 - Error inserting/updating row
2023/11/30 04:13:27 - insert into imp_record.0 - ERROR: null value in column "imp_collection_uuid" of relation "imp_record" violates not-null constraint
2023/11/30 04:13:27 - insert into imp_record.0 - Detail: Failing row contains (1, pj00030, 1c5344cd-fa89-44d8-9c98-7aa20de1d750, null, z, insert, null, null, null).
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:384)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:125)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2023/11/30 04:13:27 - insert into imp_record.0 - at java.base/java.lang.Thread.run(Thread.java:829)
2023/11/30 04:13:27 - insert into imp_record.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2023/11/30 04:13:27 - insert into imp_record.0 - Error inserting/updating row
2023/11/30 04:13:27 - insert into imp_record.0 - ERROR: null value in column "imp_collection_uuid" of relation "imp_record" violates not-null constraint
2023/11/30 04:13:27 - insert into imp_record.0 - Detail: Failing row contains (1, pj00030, 1c5344cd-fa89-44d8-9c98-7aa20de1d750, null, z, insert, null, null, null).
2023/11/30 04:13:27 - insert into imp_record.0 -
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1335)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:262)
2023/11/30 04:13:27 - insert into imp_record.0 - ... 3 more
2023/11/30 04:13:27 - insert into imp_record.0 - Caused by: org.postgresql.util.PSQLException: ERROR: null value in column "imp_collection_uuid" of relation "imp_record" violates not-null constraint
2023/11/30 04:13:27 - insert into imp_record.0 - Detail: Failing row contains (1, pj00030, 1c5344cd-fa89-44d8-9c98-7aa20de1d750, null, z, insert, null, null, null).
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2552)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2284)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:322)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130)
2023/11/30 04:13:27 - insert into imp_record.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1302)
2023/11/30 04:13:27 - insert into imp_record.0 - ... 4 more
2023/11/30 04:13:27 - Rename to metadata_visibility.0 - Finished processing (I=0, O=0, R=2, W=2, U=0, E=0)
2023/11/30 04:13:27 - Add imp_record_id.0 - Finished processing (I=0, O=0, R=13, W=13, U=0, E=0)
2023/11/30 04:13:27 - Get variables.0 - Finished processing (I=0, O=0, R=13, W=13, U=0, E=0)
2023/11/30 04:13:27 - insert into imp_record.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
2023/11/30 04:13:27 - entity_migration - Transformation detected one or more steps with errors.
2023/11/30 04:13:27 - entity_migration - Transformation is killing the other steps!
2023/11/30 04:13:27 - orcid authentication configuration.0 - Finished processing (I=1, O=0, R=0, W=0, U=0, E=0)
2023/11/30 04:13:27 - Placeholder Var.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=0)
2023/11/30 04:13:27 - entity_migration - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Errors detected!
2023/11/30 04:13:28 - orcid scopes configuration.0 - Finished processing (I=1, O=0, R=0, W=0, U=0, E=0)
2023/11/30 04:13:28 - Join rows 6.0 - Finished processing (I=0, O=0, R=27, W=0, U=0, E=0)
2023/11/30 04:13:28 - entities nested placeholder.0 - Finished reading query, closing connection
2023/11/30 04:13:30 - entity_migration - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Errors detected!
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [funding migration] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [funding migration setup] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [set funding variables] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [publications migration] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Finished job entry [set database variables] (result=[false])
2023/11/30 04:13:30 - dspace_cris_migration - Job execution finished
2023/11/30 04:13:30 - Kitchen - Finished!
2023/11/30 04:13:30 - Kitchen - ERROR (version 9.4.0.0-343, build 0.0 from 2022-11-08 07.50.27 by buildguy) : Finished with errors
2023/11/30 04:13:30 - Kitchen - Start=2023/11/30 04:13:22.422, Stop=2023/11/30 04:13:30.044
2023/11/30 04:13:30 - Kitchen - Processing ended after 7 seconds.


Has this ever happened to you?

kskaiser commented 9 months ago

Yes, indeed. That happened to me too ;) You have to edit the migration_configuration.xls Excel file. On the first tab ("collections"), you have to enter the UUIDs of the collections you generated in a previous step. You must also edit the other tabs in the file, but for the first run you can try leaving them as they are.

Read the PDF documentation carefully. Not everything is mentioned in the place you'd expect it to be :( For the Pentaho part: run the "spoon.sh" (or "spoon.bat") script to open the GUI. There you can load the "dspace_cris_migration.kjb" file and edit the parameters when you run the job. You can then see at which step something goes wrong ;)
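If you stay on the command line instead of the GUI, the root cause can also be pulled out of the Kitchen output. The following is only a minimal sketch: the function name `kitchen_root_causes` and the file name `kitchen.log` are made up for illustration, and it assumes the output of `kitchen.sh` was saved to a file with ordinary shell redirection, one log entry per line.

```shell
# Print each distinct database error message found in a saved Kitchen log.
# 'ERROR: ' (with a colon) matches the PostgreSQL message; Kitchen's own
# 'ERROR (version ...)' banner lines do not match and are skipped.
# The log file path is passed as the first argument (hypothetical helper).
kitchen_root_causes() {
  grep -o 'ERROR: .*' "$1" | sort -u
}

# Example usage, assuming the log was captured like this:
#   kitchen.sh -file:... -param:... > kitchen.log 2>&1
#   kitchen_root_causes kitchen.log
```

For a trace like the one above, where the same `null value in column "imp_collection_uuid"` message is repeated at several levels of the stack trace, this should collapse the repeats into a single line.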

Be aware that between the "dspace_cris_migration.kjb" script and the "dspace_cris_migration_post_import.kjb" you have to run the "Import Script": dspace dsrun org.dspace.app.batch.ItemImportMainOA -E <eperson-email>
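To make the order explicit, here is a dry-run sketch that only prints the three commands in sequence instead of running them. The function name `print_migration_steps` is hypothetical, the `-param:` values are the ones used earlier in this thread with a placeholder password, and it is an assumption that the post-import job lives in the same directory and accepts the same parameters as the main job.

```shell
# Dry run: print the migration commands in the order they have to be run.
# $1 = directory containing the .kjb files, $2 = e-mail of the eperson.
print_migration_steps() {
  migration_dir="$1"
  eperson_email="$2"
  # Parameters as used earlier in this thread; CHANGE_ME is a placeholder.
  # Reusing them for the post-import job is an assumption.
  params="-param:db_host_name=localhost -param:db_name=dspace -param:db_port_number=5432 -param:db_username=dspace -param:db_password=CHANGE_ME -param:eperson_email=${eperson_email}"

  # 1. Main migration job (writes to tables such as imp_record).
  echo "kitchen.sh -file:${migration_dir}/dspace_cris_migration.kjb ${params}"
  # 2. Import script that has to run between the two jobs.
  echo "dspace dsrun org.dspace.app.batch.ItemImportMainOA -E ${eperson_email}"
  # 3. Post-import job.
  echo "kitchen.sh -file:${migration_dir}/dspace_cris_migration_post_import.kjb ${params}"
}

# Example usage (both arguments are placeholders):
#   print_migration_steps /path/to/dspace/etc/migration admin@example.org
```

Printing rather than executing keeps the sketch safe to try; each printed line can then be run by hand once the previous step has finished without errors.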

Good luck!

alejandratenorio commented 9 months ago

Hi @kskaiser

Thank you so much. I have filled in the Excel file and run dspace_cris_migration.kjb, dsrun org.dspace.app.batch.ItemImportMainOA and dspace_cris_migration_post_import.kjb, and everything went well. However, when I ran dspace update-item-references, I got this message:

[screenshot of the message, not preserved]

I think I need to configure my relationships, don't I?

Another question: the entity data was migrated to the new collections, but the relationships between the entities were not. Am I skipping a step?


Thank you so much.