fcrepo-exts / fcrepo-import-export

Apache License 2.0

fcrepo-import-export-0.2.0.jar import function not importing all records #113

Open straccers opened 6 years ago

straccers commented 6 years ago

I'm trying to use the standalone import-export utility (fcrepo-import-export-0.2.0.jar) to migrate data between two machines. Both machines run the same version of Fedora (4.7.5). The export appears to complete correctly, without any reported errors, and all the expected data appears to be present in the .ttl files. However, when running the import, the output shows multiple errors. This is the first one in the output log for an attempt to import the entire repository:

ERROR 15:24:04.094 (Importer) Error while importing /home/dlib/thursday_22_03_2018_11-27am_oasis_export/fedora/rest/oasis/d9/4a/ec/2f/d94aec2f-ad6c-47a0-87bb-36fd88ff4f5f.ttl (412): Request failed due to unspecified failed precondition.

(As you can see, it's not very informative.)

There are also quite a few warnings of this type:

WARN 15:24:05.012 (Importer) Skipping Membership Resource: http://localhost:8080/fedora/rest/oasis/57/12/m6/52/5712m6524

The import runs to completion but does not populate the records with all of the expected data elements, although these are clearly present when we look in the exported Turtle files.

Import-export version: fcrepo-import-export-0.2.0.jar. Fedora version: 4.7.5 (on both machines). We have also set -Dfcrepo.properties.management=relaxed on the machine hosting the Fedora repository we are importing into.

awoods commented 6 years ago

@bbpennel : Are you in a position to investigate this?

bbpennel commented 6 years ago

@straccers I will check to see if I can replicate the issue later. Would you be able to share the parameters or config file you used for the import and/or the original export? And also, if possible, one or a few example .ttl files that triggered the 412 response on import?

A few other questions that may help diagnose the issue: Were all the objects present after the import, just not fully populated? Was there a pattern to which properties were not populated? And was the import into an empty repository?

bbpennel commented 6 years ago

After some testing, I am wondering if you are reimporting objects that are already in the destination fedora instance, or if this is going into an empty fedora instance?

When importing an object, the import tool sends the current time in the If-Unmodified-Since header so that Fedora will reject the request if another client modified the same object while the import was taking place. Normally this shouldn't block you from reimporting the same object, but if the object in the repository has a last-modified date more recent than the current time, the update will be rejected. This could happen if objects had previously been imported with timestamps in the future, OR potentially during the same import if the client and server system clocks disagree.
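To make the failure mode concrete, here is a minimal sketch of the If-Unmodified-Since rule as RFC 7232 (section 3.4) describes it, written with plain `java.time`. It is not Fedora's code; the class and method names are hypothetical. It shows why a resource whose last-modified time is later than the client's "now" (a future timestamp, or a server clock running ahead of the client) fails the precondition.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class PreconditionSketch {
    // Format an instant as an RFC 1123 HTTP-date, the format If-Unmodified-Since uses.
    static String httpDate(Instant t) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(t.atZone(ZoneOffset.UTC));
    }

    // RFC 7232, section 3.4: the precondition fails (HTTP 412) when the resource
    // was last modified after the date supplied by the client.
    static boolean preconditionPasses(Instant lastModified, String ifUnmodifiedSince) {
        Instant limit = ZonedDateTime
            .parse(ifUnmodifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
            .toInstant();
        // HTTP-dates only carry whole seconds, so compare at that resolution.
        Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return !modified.isAfter(limit);
    }

    public static void main(String[] args) {
        Instant clientNow = Instant.now();
        String header = httpDate(clientNow); // the importer sends "now" from the client clock

        // Object last modified an hour ago: the update is allowed.
        System.out.println(preconditionPasses(clientNow.minusSeconds(3600), header)); // true

        // Object stamped in the future, or by a server clock 30 s ahead of the
        // client: the same update is rejected with a 412.
        System.out.println(preconditionPasses(clientNow.plusSeconds(30), header)); // false
    }
}
```

Note that the clock-skew case and the "future timestamp" case are the same comparison from the server's point of view.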

I was able to replicate the 412 response by importing an object into fedora with a future last modified date, and then importing it again. My test involved fcrepo-4.7.5 for both servers (using the embedded jetty distribution) and the -Dfcrepo.properties.management=relaxed property set.

Also, the "Skipping Membership Resource" messages are normal (despite the WARN level); you will see them for any container that uses an ldp:DirectContainer or ldp:IndirectContainer to populate membership relations. We should most likely change that to INFO or DEBUG.
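For background: under the LDP 1.0 specification, Direct and Indirect containers cause the server to generate membership triples on their ldp:membershipResource, while Basic containers do not, so those triples are recreated by the destination server rather than written directly. The sketch below only classifies a resource by its rdf:type values; the class and method names are hypothetical, and the tool's actual check may work differently.

```java
import java.util.Set;

public class MembershipSketch {
    static final String LDP = "http://www.w3.org/ns/ldp#";

    // Direct and Indirect containers generate membership triples server-side;
    // Basic containers do not (LDP 1.0).
    static boolean generatesMembership(Set<String> rdfTypes) {
        return rdfTypes.contains(LDP + "DirectContainer")
            || rdfTypes.contains(LDP + "IndirectContainer");
    }

    public static void main(String[] args) {
        System.out.println(generatesMembership(Set.of(LDP + "Container", LDP + "DirectContainer"))); // true
        System.out.println(generatesMembership(Set.of(LDP + "Container", LDP + "BasicContainer")));  // false
    }
}
```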

straccers commented 6 years ago

Hi Ben, thank you for looking into this. We have tried this import with various different switches; perhaps it would be best to wipe the repository we are importing into completely clean and make a fresh attempt. Is there a recommended way to do this? Is simply deleting the existing fcrepo-data directory and allowing Fedora to create a new one sufficient? We had assumed this was the case.

Regards,
Peri Stracchino
University of York Digital Library Technical Team

On 2 April 2018 at 20:45, Ben Pennell wrote: [comment quoted above]

straccers commented 6 years ago

Right, apologies for the huge gap here; we had a live system issue to deal with. Looking into this some more, I can confirm we ARE starting from a fresh repository each time. The predicate the import seems to report as an error each time is fedora:createdBy in the fcr:metadata file; an example of one such Turtle file is below (references to our own server names changed). In this case I am exporting and importing with the inbound, external and binaries options enabled, but this also happens with just the binaries option. It's a tad puzzling, as the same predicate appears in the other .ttl files too, but the error is only reported for the fcr:metadata files.

Export config:

```
mode: export
external: true
legacyMode: false
predicates: http://www.w3.org/ns/ldp#contains
overwriteTombstones: false
auditLog: false
resource: http://:8080/fedora/rest
inbound: true
versions: false
dir: /home/dlib/oasisbackup_ei_09042018_1500
binaries: true
rdfLang: text/turtle
```

Import config (I didn't use legacy mode this time as I wanted to get the error output):

```
legacyMode: false
overwriteTombstones: false
auditLog: false
resource: http://localhost:8080/fedora/rest
inbound: true
dir: /home/dlib/oasisbackup_ei_09042018_1500
rdfLang: text/turtle
mode: import
external: true
predicates: http://www.w3.org/ns/ldp#contains
versions: false
map: http://:8080/fedora/rest,http://localhost:8080/fedora/rest
binaries: true
```
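The `map` setting in the import config pairs the source repository's base URI with the destination's, separated by a comma. As a rough illustration of what that pairing implies, here is a minimal sketch of base-URI rewriting, assuming simple prefix replacement; the helper name and the `source-host` URI are hypothetical, and the tool's own mapping logic may handle more cases.

```java
public class UriMapSketch {
    // Hypothetical helper: rewrite a URI from the source base to the destination
    // base, assuming the map value has the form "sourceBase,destinationBase".
    static String mapUri(String uri, String mapSetting) {
        String[] parts = mapSetting.split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'sourceBase,destinationBase'");
        }
        String source = parts[0].trim();
        String destination = parts[1].trim();
        // Naive prefix match: only URIs starting with the source base are rewritten;
        // everything else (e.g. vocabulary URIs) is returned unchanged.
        return uri.startsWith(source) ? destination + uri.substring(source.length()) : uri;
    }

    public static void main(String[] args) {
        String map = "http://source-host:8080/fedora/rest,http://localhost:8080/fedora/rest";
        System.out.println(mapUri("http://source-host:8080/fedora/rest/oasis/d9/4a", map));
        System.out.println(mapUri("http://www.w3.org/ns/ldp#contains", map));
    }
}
```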

straccers commented 6 years ago

Whoops, here's the zipped folder of .ttl files: fcr%253Ametadata_REDACTED.zip

birkland commented 6 years ago

I ran into the same issue today, with random "Request failed due to unspecified failed precondition" errors on fcrepo 4.7.5 with the latest (0.2.0) import/export tool. We were loading only RDF resources, with no binaries or external content. Some observations:

Ultimately, under extreme time pressure, I had to remove the .ifUnmodifiedSince(currentTimestamp()) option from the PutBuilder in Importer in order to get a clean load.
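For reference, the workaround amounts to sending the same PUT without the conditional header. The importer itself builds its request with the fcrepo Java client's PutBuilder; the sketch below swaps in the JDK's `java.net.http` API instead, and the `buildPut` helper and its `conditional` flag are hypothetical. It only builds the requests (nothing is sent) to show the header being present or absent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class ImportPutSketch {
    // Hypothetical request builder: with conditional=false the If-Unmodified-Since
    // header is omitted, mirroring removal of .ifUnmodifiedSince(currentTimestamp()).
    static HttpRequest buildPut(String uri, String turtle, boolean conditional) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(uri))
            .header("Content-Type", "text/turtle")
            .PUT(HttpRequest.BodyPublishers.ofString(turtle));
        if (conditional) {
            String now = DateTimeFormatter.RFC_1123_DATE_TIME
                .format(Instant.now().atZone(ZoneOffset.UTC));
            builder.header("If-Unmodified-Since", now);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        String uri = "http://localhost:8080/fedora/rest/example";
        String body = "<> <http://purl.org/dc/elements/1.1/title> \"example\" .";
        System.out.println(buildPut(uri, body, true).headers().firstValue("If-Unmodified-Since").isPresent());  // true
        System.out.println(buildPut(uri, body, false).headers().firstValue("If-Unmodified-Since").isPresent()); // false
    }
}
```

Dropping the header removes the protection against concurrent edits during the import, which is the trade-off accepted here in exchange for a clean load.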