Closed ghost closed 6 years ago
Thanks for reporting this.
Can you give a bit more detail on this please? Is one epoch running fine, then this appears on the beginning of the next one? Can you try with just one worker to see whether it works or you get this error only once? What happens after the error? Does it freeze? Does it crash? I am confused by the lack of output after these error messages. Can you pinpoint where this error occurs in my code? Maybe by inserting some print statements at the important steps (before and after stop_workers for example)?
Are you using the Python version and packages as indicated in the Readme (Python2 and packages from requirements.txt)?
It was because of server. Sorry for confusing. Thank you!
EPOCH: 1 Starting worker 2018-10-21 14:50:57.740707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0 2018-10-21 14:50:57.740741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix: 2018-10-21 14:50:57.740747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 2018-10-21 14:50:57.740751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N 2018-10-21 14:50:57.740864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10417 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0, compute capability: 6.1) terminate called after throwing an instance of 'std::system_error' what(): Invalid argument terminate called after throwing an instance of 'std::system_error' what(): Invalid argument terminate called after throwing an instance of 'std::system_error' what(): Invalid argument terminate called after throwing an instance of 'std::system_error' what(): Invalid argument terminate called after throwing an instance of 'std::system_error' what(): Invalid argument
num_workers was 6 and when I checked process, there were 5 defunct childerens. Is stop_worker problem?? server is Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-36-generic x86_64)