EMCECS / ecs-sync

ecs-sync is a bulk copy utility that can move data between various systems in parallel.
Apache License 2.0

Bulk sync with verify option generates errors #44

Closed: martva closed this issue 5 years ago

martva commented 5 years ago

We have set up an ecs-sync job to run from an NFS export to an EMC ECS hardware appliance. Access is configured correctly, because data is being written to the target. But right after the sync starts, it generates Java errors for every directory and subdirectory found in the NFS export.

Looking at the target, all directories and files from the source have been written.

Error example:

"nfsserver.local:/volname_1/directory1/directory2/directory3","directory2/directory3/","true","0","","0","[com.emc.ecs.sync.storage.ObjectNotFoundException: directory2/directory3/] com.emc.ecs.sync.storage.ObjectNotFoundException: directory2/directory3/
    at com.emc.ecs.sync.storage.s3.EcsS3Storage.loadObject(EcsS3Storage.java:254)
    at com.emc.ecs.sync.storage.s3.AbstractS3Storage.loadObject(AbstractS3Storage.java:63)
    at com.emc.ecs.sync.storage.s3.EcsS3Storage.loadObject(EcsS3Storage.java:239)
    at com.emc.ecs.sync.TargetFilter.reverseFilter(TargetFilter.java:110)
    at com.emc.ecs.sync.SyncTask.run(SyncTask.java:128)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)"

The above error is displayed for every single directory and subdirectory. The errors appear to be generated only when the Verify or the Verify_only option is selected.

To recreate the error, these are the settings used:

Source: NFS Export
Target: DELL/EMC ECS Hardware Appliance

The bucket on the ECS was created by hand before the sync.

[Screenshots attached: newsync_1, newsync_2, newsync_3]

twincitiesguy commented 5 years ago

S3 has no concept of directories. There is a notion of "common prefixes", but this is not the same as a directory. That is, a "common prefix" is not a real thing; it's just a prefix shared by multiple objects. As a consequence, if you wish to rename a "common prefix" in a bucket, you have to rename all objects under that prefix recursively.
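To make the "common prefix" idea concrete, here is a minimal sketch in plain Python. It does not call S3 at all; it only imitates, under my own assumptions, how a delimiter-based listing could derive prefixes from a flat set of keys. The function names and sample keys are made up for illustration.

```python
def common_prefixes(keys, prefix="", delimiter="/"):
    """Derive 'common prefixes' from flat keys, one level below `prefix`."""
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:  # the key continues past a delimiter, so it shares a prefix
            found.add(prefix + head + delimiter)
    return sorted(found)


def rename_prefix(keys, old, new):
    """'Renaming' a prefix means rewriting every key that starts with it."""
    return [new + k[len(old):] if k.startswith(old) else k for k in keys]


# Hypothetical bucket contents: only objects exist, no directory entries.
keys = [
    "directory2/directory3/a.txt",
    "directory2/directory3/b.txt",
    "directory2/c.txt",
]

top_level = common_prefixes(keys)                 # ["directory2/"]
second_level = common_prefixes(keys, "directory2/")  # ["directory2/directory3/"]
renamed = rename_prefix(keys, "directory2/", "renamed/")
```

Note that "directory2/directory3/" shows up in a listing even though no object with that exact key exists, which is why a lookup of that key during verification can fail.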

Because directories don't exist in S3, ecs-sync will skip any directories when writing to S3. This is why you see errors during verification. So one option you have is to ignore any verification errors for directories.
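If you go the route of ignoring directory errors, something like the following sketch could separate them from real failures. It is not part of ecs-sync. It assumes the error report is CSV shaped like the line quoted in the issue, that the second column is the target key, that the last column is the error text, and that directory keys end in "/". The sample rows, including the "checksum mismatch" message, are invented.

```python
import csv
import io


def non_directory_errors(report_text):
    """Return report rows that are NOT 'directory not found' verification misses."""
    # newline="" lets the csv module handle line breaks inside quoted stack traces.
    rows = csv.reader(io.StringIO(report_text, newline=""))
    kept = []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        target_key, message = row[1], row[-1]  # assumed column layout
        is_directory_miss = (
            target_key.endswith("/") and "ObjectNotFoundException" in message
        )
        if not is_directory_miss:
            kept.append(row)
    return kept


# Invented sample data in the assumed layout.
SAMPLE = "\n".join([
    '"nfs:/vol/d2/d3","d2/d3/","true","0","","0",'
    '"[com.emc.ecs.sync.storage.ObjectNotFoundException: d2/d3/] trace',
    '    at some.Frame(File.java:1)"',
    '"nfs:/vol/d2/f.txt","d2/f.txt","false","0","","0","checksum mismatch"',
]) + "\n"

remaining = non_directory_errors(SAMPLE)
```

Anything left in `remaining` would be an error that is not explained by skipped directories and deserves a closer look.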

However, ecs-sync does give you the option to "preserve" directories from a filesystem by storing them as empty objects, with their attributes stored as user metadata. They will not behave as directories in S3 (they will just be ordinary objects), but this allows you to restore them back to the filesystem in the future (including empty directories). This will also eliminate these verification errors, because the directory objects will exist in the target bucket.
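The idea of preserving directories as empty objects can be illustrated with a small, self-contained sketch. This is not how ecs-sync implements it. The "bucket" is a plain dict, the "mode" metadata name is hypothetical, and the trailing "/" on directory keys is an assumption borrowed from the keys in the error above.

```python
import os
import tempfile


def upload_tree(root, bucket):
    """Copy a directory tree into a dict 'bucket' of key -> (data, metadata)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel != ".":
            # Directory becomes a zero-byte object; attributes go into metadata.
            key = rel.replace(os.sep, "/") + "/"
            mode = format(os.stat(dirpath).st_mode & 0o777, "o")
            bucket[key] = (b"", {"mode": mode})
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                bucket[key] = (f.read(), {})


def restore_tree(bucket, dest):
    """Rebuild the tree, including empty directories, from the dict 'bucket'."""
    dir_keys = []
    for key in sorted(bucket):
        data, _meta = bucket[key]
        path = os.path.join(dest, *key.rstrip("/").split("/"))
        if key.endswith("/"):
            os.makedirs(path, exist_ok=True)
            dir_keys.append(key)
        else:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
    # Apply directory modes last, deepest first, so writes above are not blocked.
    for key in reversed(dir_keys):
        path = os.path.join(dest, *key.rstrip("/").split("/"))
        os.chmod(path, int(bucket[key][1]["mode"], 8))


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "directory2", "directory3"))  # empty directory
    with open(os.path.join(src, "directory2", "file.txt"), "wb") as f:
        f.write(b"hello")
    bucket = {}
    upload_tree(src, bucket)
    restore_tree(bucket, dst)
    restored_empty_dir = os.path.isdir(os.path.join(dst, "directory2", "directory3"))
```

With the directory keys present in the bucket, a verification pass that looks up "directory2/directory3/" would find an object rather than raising a not-found error.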