fcrepo-exts / fcrepo-import-export

Apache License 2.0

Import serialized BagIt bags #140

Closed mikejritter closed 4 years ago

mikejritter commented 4 years ago

Resolves: https://jira.lyrasis.org/browse/FCREPO-3205

Adds classes to deserialize zip, tar, and gzip BagIt bags


New Dependencies

Notes


Testing

  1. Start a fresh fedora repository
  2. Create a binary
  3. Create a bag-config.yml for export
    bag-info.txt:
    Source-Organization: fcrepo-import-export
    Organization-Address: localhost
    External-Description: Sample bag export for fcrepo-3205
  4. Run the exporter
    java -jar target/fcrepo-import-export-0.4.0-SNAPSHOT.jar --mode export --resource http://localhost:8080/fcrepo/rest --dir fcrepo-3205 --binaries --bag-profile default --bag-config bag-config.yml --user fedoraAdmin:secret3
  5. Serialize the exported bag in your format of choice
    zip -r fcrepo-3205.zip fcrepo-3205
    tar cf fcrepo-3205.tar fcrepo-3205
    tar czf fcrepo-3205.tar.gz fcrepo-3205
  6. Remove the deserialized bag
    rm -r fcrepo-3205
  7. Start a fresh fedora repository
  8. Run the importer on the serialized bag
    java -Dfcrepo.log.importexport=DEBUG -jar target/fcrepo-import-export-0.4.0-SNAPSHOT.jar --mode import -r http://localhost:8080/fcrepo/rest --dir fcrepo-3205.tar --binaries -g default -G bag-config.yml -u fedoraAdmin:secret3
dbernstein commented 4 years ago

@mikejritter : If it isn't too much trouble I think it is probably a good idea to do this before I test it:

Finally... when the filename in the archive does not match the filename of the archive (e.g. test-bag/ and sample-bag.tar), the BagDeserializers will return an incorrect path. This can be fixed before the PR is merged if desired.

mikejritter commented 4 years ago

@dbernstein I think a few additional updates are needed for all the tests to pass after the changes #139 brought in. I'll get those changes in tomorrow, and I'll also see if I can get in a fix for the issues you commented on.

mikejritter commented 4 years ago

@dbernstein I made adjustments according to your comments and also updated the BagDeserializers to prioritize the name of the path in the archive. I opted to have it fall back to the filename minus the extension if one is not found, which felt like a sane default.

The BagDeserializerTest was also updated to check for this mismatch, and while I was updating it I made it a Parameterized test since all the tests were the same.
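
To illustrate the fallback described above, here is a minimal sketch of the idea, not the code from this PR. The class `BagRootResolver`, its method names, and the list of recognized extensions are all hypothetical; it only assumes that a deserializer can list the entry names of an archive.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of fcrepo-import-export. */
final class BagRootResolver {

    // Assumed set of archive extensions, longest first so ".tar.gz" wins over ".gz"
    private static final String[] EXTENSIONS = {".tar.gz", ".tgz", ".tar", ".zip", ".gz"};

    private BagRootResolver() {
    }

    /** Returns the top-level directory named by the first entry that has one. */
    static Optional<String> rootFromEntries(final List<String> entryNames) {
        for (final String name : entryNames) {
            // Some tar tools prefix entries with "./"; ignore that prefix
            final String cleaned = name.startsWith("./") ? name.substring(2) : name;
            final int slash = cleaned.indexOf('/');
            if (slash > 0) {
                return Optional.of(cleaned.substring(0, slash));
            }
        }
        return Optional.empty();
    }

    /** Removes a known archive extension from the filename, if present. */
    static String stripExtension(final String archiveFilename) {
        final String lower = archiveFilename.toLowerCase(Locale.ROOT);
        for (final String ext : EXTENSIONS) {
            if (lower.endsWith(ext)) {
                return archiveFilename.substring(0, archiveFilename.length() - ext.length());
            }
        }
        return archiveFilename;
    }

    /** Prefers the directory name inside the archive, else the filename minus its extension. */
    static String resolve(final List<String> entryNames, final String archiveFilename) {
        return rootFromEntries(entryNames).orElseGet(() -> stripExtension(archiveFilename));
    }
}
```

With entries `test-bag/` and `test-bag/bagit.txt` in `sample-bag.tar`, this sketch resolves the bag root to `test-bag`; for an archive with no directory entries it falls back to `sample-bag`.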

dbernstein commented 4 years ago

@mikejritter : I ran the test, but was running into the following error (after modifying the tar commands: I changed them from tar xf ... to tar cf ... and from tar xzf ... to tar czf ...). Here's the tar import (the tar.gz file also failed for similar reasons):

    Daniels-MBP$ java -Dfcrepo.log.importexport=DEBUG -jar ~/code/fcrepo-import-export/target/fcrepo-import-export-0.4.0-SNAPSHOT.jar --mode import -r http://localhost:8080/rest --dir fcrepo-3205.tar.gz --binaries -g default -G bag-config.yml -u fedoraAdmin:fedoraAdmin
    DEBUG 21:56:43.704 (ArgParser) Command line argments weren't valid for specifying a config file.
    DEBUG 21:56:44.114 (SerializationSupport) /Users/danielbernstein/tmp/btr/fcrepo-3205.tar.gz: application/gzip
    ERROR 21:56:44.119 (ImportExportDriver) Error performing import/export: BagProfile does not allow application/gzip. Accepted serializations are: application/tar
    DEBUG 21:56:44.122 (ImportExportDriver) Stacktrace:
    java.lang.RuntimeException: BagProfile does not allow application/gzip. Accepted serializations are: application/tar
        at org.fcrepo.importexport.common.SerializationSupport.deserializerFor(SerializationSupport.java:132) ~[fcrepo-import-export-0.4.0-SNAPSHOT.jar:na]
        at org.fcrepo.importexport.importer.Importer.loadBagProfile(Importer.java:193) ~[fcrepo-import-export-0.4.0-SNAPSHOT.jar:na]
        at org.fcrepo.importexport.importer.Importer.<init>(Importer.java:170) ~[fcrepo-import-export-0.4.0-SNAPSHOT.jar:na]
        at org.fcrepo.importexport.ArgParser.parse(ArgParser.java:448) ~[fcrepo-import-export-0.4.0-SNAPSHOT.jar:na]
        at org.fcrepo.importexport.ImportExportDriver.run(ImportExportDriver.java:57) ~[fcrepo-import-export-0.4.0-SNAPSHOT.jar:na]
        at org.fcrepo.importexport.ImportExportDriver.main(ImportExportDriver.java:47) ~[fcrepo-import-export-0.4.0-SNAPSHOT.jar:na]
    Daniels-MBP$

mikejritter commented 4 years ago

@dbernstein there are two things I think we can do for this:

Otherwise I think I had been testing with the beyondtherepository profile, but I do think it would be good to at least update the default profile with more supported types.

dbernstein commented 4 years ago

@mikejritter : were you going to add all the supported types that you had in beyond the repository to the default? i.e.

"application/x-tar",
"application/x-gzip",
"application/x-7z-compressed"
dbernstein commented 4 years ago

Once that's in I'll merge it (as I tested successfully with the btr profile).

mikejritter commented 4 years ago

@dbernstein I updated the default profile to have all serialization formats, so it should be good to go. I have a note for myself about further coercion, potentially mapping x-tar -> tar, and I think that should stay as a note for now since the profile has the content types enumerated.
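
For reference, the coercion mentioned above could look roughly like the sketch below. It is not what this PR implements (the PR enumerates the content types in the profile instead). The class `SerializationCheck`, its methods, and the alias mapping (`application/x-tar` to `application/tar`, `application/x-gzip` to `application/gzip`) are assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of fcrepo-import-export. */
final class SerializationCheck {

    private SerializationCheck() {
    }

    /** Maps legacy "x-" content types onto one canonical name (assumed aliases). */
    static String normalize(final String contentType) {
        final String type = contentType.trim().toLowerCase(Locale.ROOT);
        switch (type) {
            case "application/x-tar":
                return "application/tar";
            case "application/x-gzip":
                return "application/gzip";
            default:
                return type;
        }
    }

    /** True if the detected type matches any accepted type after normalization. */
    static boolean isAccepted(final String detected, final Set<String> accepted) {
        final String wanted = normalize(detected);
        for (final String candidate : accepted) {
            if (normalize(candidate).equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a detected `application/gzip` is still rejected by a profile that only accepts `application/tar` (the failure reported earlier in this thread), but it is accepted by a profile that lists `application/x-gzip`.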