cleinc / bts

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation
GNU General Public License v3.0

Fix EOFError in PyTorch for multiprocessing_distributed #117

Closed reiniscimurs closed 3 years ago

reiniscimurs commented 3 years ago

When training with PyTorch using multiprocessing_distributed, "/tensorboardX/event_file_writer.py" raises an EOFError. As far as I understand, this happens because a thread is closed without first closing the TensorBoard SummaryWriter, so the SummaryWriter tries to access data that is no longer there and hits the end-of-file error. Closing "writer" and "eval_summary_writer" after the training cycle finishes and before the thread is closed fixes the issue.
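A minimal sketch of the pattern described above, not the actual patch. It assumes the writers expose `add_scalar(...)` and `close()` like tensorboardX's `SummaryWriter`, and that a writer may be `None` in workers that do not log. `DummyWriter`, `close_writers`, and `train_worker` are hypothetical names used only for illustration; `DummyWriter` stands in for `SummaryWriter` so the snippet runs without tensorboardX installed.

```python
class DummyWriter:
    """Hypothetical stand-in for tensorboardX.SummaryWriter (illustration only)."""

    def __init__(self):
        self.closed = False
        self.scalars = []

    def add_scalar(self, tag, value, step):
        if self.closed:
            raise RuntimeError("writer already closed")
        self.scalars.append((tag, value, step))

    def close(self):
        self.closed = True


def close_writers(*writers):
    """Close every writer that was actually created; skip None placeholders."""
    for w in writers:
        if w is not None:
            w.close()


def train_worker(writer, eval_summary_writer, num_steps=3):
    """Hypothetical training loop that closes its writers before returning."""
    try:
        for step in range(num_steps):
            loss = 1.0 / (step + 1)  # placeholder value, not a real loss
            if writer is not None:
                writer.add_scalar("loss", loss, step)
    finally:
        # Runs on normal completion and on exceptions, so the writers are
        # closed before the worker exits.
        close_writers(writer, eval_summary_writer)


writer = DummyWriter()
eval_summary_writer = DummyWriter()
train_worker(writer, eval_summary_writer)
```

Putting the close calls in a `finally` block is a design choice of this sketch: it also closes the writers when training stops with an exception, which goes slightly beyond what the description above states.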