fcrepo-exts / fcrepo-import-export

Apache License 2.0

parallelized export; added resume file #162

Closed pwinckles closed 3 years ago

pwinckles commented 3 years ago

Jira: https://fedora-repository.atlassian.net/browse/FCREPO-3742

What this does

  1. Parallelizes the export code
  2. Creates a new file that captures resource URIs that either failed to be processed or were waiting to be processed when the application was killed
  3. Allows exports to use the file created in step 2 to resume a failed export
  4. Adds a periodic progress report, every 10,000 resources, like the following: INFO 11:42:44.067 (Exporter) Progress report: Exported 520 resources in PT19.126342S at 1600461 bytes/sec
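
For reviewers who want a mental model of items 1 and 2, here is a minimal sketch of the general pattern. It is not the code in this PR. The class name `ParallelExportSketch`, the `exportOne` callback, and the one-URI-per-line file layout are assumptions made for illustration. Each URI is recorded as pending before it is handed to a fixed-size thread pool, and only a successful export removes it, so anything that failed or never ran is left over to be written out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration; names do not come from fcrepo-import-export.
final class ParallelExportSketch {
    // URIs that were submitted but have not been exported successfully yet.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool;
    private final Consumer<String> exportOne; // assumed per-resource export step

    ParallelExportSketch(int threads, Consumer<String> exportOne) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.exportOne = exportOne;
    }

    void submit(String uri) {
        pending.add(uri); // record before queuing so a kill cannot lose it
        pool.execute(() -> {
            try {
                exportOne.accept(uri);
                pending.remove(uri); // only successful exports leave the set
            } catch (RuntimeException e) {
                // Failed URIs stay in `pending` and end up in the remaining file.
            }
        });
    }

    /** Waits for queued work, then writes whatever is still pending, one URI per line. */
    Set<String> finish(Path remainingFile) throws IOException, InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        Set<String> remaining = new TreeSet<>(pending); // sorted for stable output
        if (!remaining.isEmpty()) {
            Files.write(remainingFile, remaining);
        }
        return remaining;
    }
}
```

In a long-running tool, the write in `finish` would more plausibly be triggered from a JVM shutdown hook so that it also runs on ctrl+c; that wiring is left out here to keep the sketch short.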

How should this be tested

  1. Set up an F5 instance with data
  2. Run the exporter as follows: java -jar fcrepo-import-export-1.1.0-SNAPSHOT.jar -b --dir my-5.1.1-export --user fedoraAdmin:fedoraAdmin --mode export --repositoryRoot http://localhost:8080/rest --resource http://localhost:8080/rest --binaries --versions
  3. Kill (ctrl+c) the utility while it's still exporting
  4. Check that a remaining_TIMESTAMP.log file appears in the current directory, containing the URIs of the resources that were waiting to be exported when the export was killed
  5. Run the exporter again as follows: java -jar fcrepo-import-export-1.1.0-SNAPSHOT.jar -b --dir my-5.1.1-export --user fedoraAdmin:fedoraAdmin --mode export --repositoryRoot http://localhost:8080/rest --resourcesFile remaining_TIMESTAMP.log --binaries --versions
  6. See the exporter pick up where it left off
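
To make steps 4 to 6 concrete, the sketch below shows one way a resume file could be read back into a work list. It assumes the file holds one resource URI per line, which this description does not spell out, and `ResumeFileSketch.load` is a hypothetical helper, not the method used by the exporter.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; assumes one URI per line in the resume file.
final class ResumeFileSketch {
    /** Reads the file, skipping blank lines and duplicate URIs, preserving order. */
    static List<URI> load(Path file) throws IOException {
        return Files.readAllLines(file).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .distinct()
                .map(URI::create)
                .collect(Collectors.toList());
    }
}
```

Dropping duplicates keeps a resource that was queued twice from being exported twice on resume; whether the real file can contain duplicates is not stated here.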

Additionally, the number of threads the exporter uses can be adjusted with the -T option.