EMCECS / ecs-sync

ecs-sync is a bulk copy utility that can move data between various systems in parallel
Apache License 2.0

CAS Skipped files #52

Closed: DEO5294 closed this issue 5 years ago

DEO5294 commented 5 years ago

I am copying files from one CAS system to another. I have 32 million files in the source, and I have decided to break the job down into chunks of 1 million. During my first run, on the first million items, 717 were skipped. Is there an option or setting that lets me output those 717 skipped items to a CSV or text file?

twincitiesguy commented 5 years ago

All objects that have been migrated using the specified DB table will be present in that table in MySQL (MariaDB). If you are using the UI, you should be able to download a complete object report, which will include all objects migrated (over repeated runs using the same DB table). That is why it's important to use one DB table for each set of source/target pools.

If you're not using the UI, there is a process you can follow to extract the data from MariaDB.

The CSV will not show which objects were skipped, but it does show the transfer time of each object, so you could sort by that and work out which objects were migrated during which job run.

DEO5294 commented 5 years ago

I have learned to use the clip tool (java -jar cas-clip-tool-1.2.jar). I then compared the CSV file I used for the transfer with the one created by the tool, and I am getting somewhere. One of my co-workers discovered duplicate IDs that led to the skipped files. We have since changed how the CSV I use for copying data from one CAS system to another is created, so it no longer contains duplicates. Since then I have not had any skipped files.
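
For anyone hitting the same problem, the check described above (looking for duplicate IDs in the source list, and comparing it with a second list) could be sketched roughly as follows. This is only an illustration and not part of ecs-sync or the clip tool: the function names are hypothetical, and it assumes both files are CSVs with the ID in the first column, which may not match your files.

```python
import csv
from collections import Counter


def read_ids(path, id_column=0):
    """Read one ID per row from a CSV file (assumes the ID is in id_column)."""
    ids = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # Skip blank rows and rows too short to contain the ID column.
            if len(row) > id_column and row[id_column].strip():
                ids.append(row[id_column].strip())
    return ids


def find_duplicates(ids):
    """Return {id: count} for every ID that appears more than once."""
    counts = Counter(ids)
    return {i: n for i, n in counts.items() if n > 1}


def missing_ids(source_ids, transferred_ids):
    """Return source IDs absent from transferred_ids, in order, each once."""
    transferred = set(transferred_ids)
    seen = set()
    missing = []
    for i in source_ids:
        if i not in transferred and i not in seen:
            missing.append(i)
            seen.add(i)
    return missing


# Example usage (file names are placeholders):
#   source = read_ids("source_list.csv")
#   report = read_ids("tool_output.csv")
#   print(find_duplicates(source))
#   print(missing_ids(source, report))
```

If `find_duplicates` returns anything for the source list, the number of extra occurrences is a reasonable first thing to compare against the skipped count, since in this case the duplicates were what led to the skips.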