Closed husky-parul closed 6 years ago
At times the workers terminate with the errors below, but the status stays "complete". Under this circumstance, when all workers have reached "complete" status, no new workers get created. The result is not published to Swift in this scenario. Is this expected behavior?
No, we should let other workers pick this up.
@husky-parul This could be a flake. Can you close and re-open the PR to retrigger the test?
@danmcp @ravisantoshgudimetla
Added an "emptyDir": {} volume on worker pods. The worker that downloads data creates a dir /cache/winner. While uploading, workers with /cache/winner upload objects to Swift.
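For context, a minimal sketch of what the worker pod spec looks like with an emptyDir volume mounted at /cache. The volume, container, and image names here are illustrative, not the actual names used in the project:

```python
# Sketch of a worker pod spec with an "emptyDir": {} volume.
# emptyDir gives containers in the pod a shared scratch directory
# that lives as long as the pod does; here it is mounted at /cache,
# so the downloader can create /cache/winner for uploaders to read.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "worker"},
    "spec": {
        "containers": [
            {
                "name": "worker",
                "image": "worker-image",  # illustrative image name
                "volumeMounts": [
                    {"name": "worker-cache", "mountPath": "/cache"}
                ],
            }
        ],
        "volumes": [
            {"name": "worker-cache", "emptyDir": {}}
        ],
    },
}
```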
@danmcp @ravisantoshgudimetla
This PR looks done to me. The pod that downloaded from Swift waits for all image-processing containers to finish. It then uploads data to Swift while the other publish containers exit.
At this point openshift/pman-swift-publisher/watch.py looks irrelevant. The publish container runs openshift/pman-swift-publisher/put_data.py. Should I remove watch.py?
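A minimal sketch of the wait-then-upload pattern described above, assuming each image-processing container drops a *.done marker file into the shared /cache volume when it finishes. The marker convention and function names are hypothetical; the actual coordination mechanism in the PR may differ:

```python
import os
import time


def wait_for_workers(cache_dir, expected, poll_seconds=5):
    """Block until `expected` processing containers have each dropped
    a *.done marker file into cache_dir, then return the markers.
    The marker-file convention here is illustrative only."""
    while True:
        done = [f for f in os.listdir(cache_dir) if f.endswith(".done")]
        if len(done) >= expected:
            return done
        time.sleep(poll_seconds)
```

The uploader would call this before pushing results to Swift, so the upload only starts once every processing container has finished.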
Note: creating a different PR for job deletion
@ravisantoshgudimetla Any comments?
@ravisantoshgudimetla @danmcp what about watch.py?
@husky-parul Are you saying it's no longer needed? If so, I think it can be removed.
The lockfile library is deprecated, but fasteners was granting the lock to all the workers at once. So, reverted to using lockfile again. Need to work on this.
fasteners.InterProcessLock('/tmp/tmp_lock_file')
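For reference, on POSIX systems fasteners.InterProcessLock is built on fcntl-style file locks. A minimal stdlib sketch of the exclusive inter-process locking behavior that is expected here (only one holder at a time; the lock path is illustrative):

```python
import fcntl


def try_acquire(path):
    """Try to take an exclusive, non-blocking lock on `path`.
    Returns the locked file handle on success, or None if another
    process (or handle) already holds the lock."""
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle
    except BlockingIOError:
        # Another holder has the exclusive lock.
        handle.close()
        return None
```

With this semantics, a second worker attempting the lock while the first holds it should get None, which is the behavior the workers rely on for mutual exclusion.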
Error 1: while watching workers
Error 2: while connecting to Swift