14% failure rate with -10014 errors on ECS but ECS-Sync shows no failures at all

EMCECS / ecs-sync

ecs-sync is a bulk copy utility that can move data between various systems in parallel

Apache License 2.0

Issue #30, closed by holgerjakob 6 years ago

holgerjakob commented 6 years ago

Hi all, we've got a CAS-to-CAS job running for close to 7 million clips from a clip list. The job also validates on the ECS target. It runs at a steady state of about 20 objects a second without any errors, yet the ECS UI shows a 14% failure rate, all with -10014 (FP_FILE_NOT_STORED_ERR) errors. As the validation also shows no errors, I wonder if there is a good explanation for these errors and why ecs-sync does not show any objects as erroring out.

Thanks, Holger

twincitiesguy commented 6 years ago

ecs-sync has a retry mechanism that will retry most errors up to twice (3 total attempts). You can look at the retry queue (objects awaiting retry) to see if there are any retries occurring.

So it’s quite possible that both statistics are correct. Some objects could be failing on the initial attempt, but are copied successfully on retry.
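A minimal sketch of why both statistics can be correct, assuming a simple retry loop. This is not ecs-sync's actual code; `copy_with_retries`, `flaky_copy`, and `MAX_RETRIES` are hypothetical names. Every failed attempt is an error the target sees, but the copy tool only reports the final outcome of each object.

```python
from typing import Callable, List, Tuple

# Hypothetical: 2 retries after the first attempt, i.e. 3 attempts total.
MAX_RETRIES = 2


def copy_with_retries(copy_once: Callable[[int], None]) -> Tuple[bool, List[str]]:
    """Call copy_once up to 1 + MAX_RETRIES times.

    Returns (succeeded, errors_seen_by_target). Each failed attempt is an
    error the target records, even if a later attempt succeeds.
    """
    errors: List[str] = []
    for attempt in range(1 + MAX_RETRIES):
        try:
            copy_once(attempt)
            return True, errors  # the tool reports success for this object
        except RuntimeError as exc:
            errors.append(str(exc))  # the target still saw this failure
    return False, errors


def flaky_copy(attempt: int) -> None:
    # Hypothetical copy that fails only on the first attempt.
    if attempt == 0:
        raise RuntimeError("-10014 FP_FILE_NOT_STORED_ERR")


ok, target_errors = copy_with_retries(flaky_copy)
print(ok, len(target_errors))  # True 1
```

In this sketch the object counts as a success for the tool while the target has logged one error, which is the pattern the retry explanation would produce. A non-empty retry queue would be the visible sign of it.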

holgerjakob commented 6 years ago

Hi Stu, the retry queue is always at 0 objects, and the retry list that can be generated contains no entries at all. Holger

twincitiesguy commented 6 years ago

Are there any other applications connecting over CAS? If you change the log level to Verbose temporarily, you should see confirmations of each object attempt in the log (/var/log/ecs-sync/ecs-sync.log). That will also show whether there are any retries. If ecs-sync doesn’t see any errors, it’s just a matter of tracking down where they are coming from.

holgerjakob commented 6 years ago

Currently only ECS Sync is connected to and accessing ECS. I will raise the log level and have a look. Will keep you posted.

holgerjakob commented 6 years ago

I see a lot of "blob exists on target, skipping write" messages. These probably result in the clip not being fully written and in the CAS SDK error on the ECS side. Thanks a lot for the assistance, Holger
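To put a number on those skipped writes, a rough sketch is to count matching lines in the verbose log and compare the total against the ECS failure count. This assumes the message text appears verbatim in `/var/log/ecs-sync/ecs-sync.log` (the exact log line format is not confirmed here), and `count_matching_lines` is a hypothetical helper that does a plain, case-insensitive substring match.

```python
from pathlib import Path
from typing import Iterable

LOG_PATH = Path("/var/log/ecs-sync/ecs-sync.log")
# Assumption: the skip message appears verbatim somewhere on each relevant line.
SKIP_MESSAGE = "blob exists on target, skipping write"


def count_matching_lines(lines: Iterable[str], needle: str) -> int:
    """Count lines containing needle, ignoring case."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


if __name__ == "__main__":
    if LOG_PATH.exists():
        # errors="replace" keeps odd bytes in the log from aborting the scan.
        with LOG_PATH.open(encoding="utf-8", errors="replace") as log:
            print(count_matching_lines(log, SKIP_MESSAGE))
```

Streaming the file line by line keeps memory flat even for a log covering millions of clips.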