When training with PyTorch and multiprocessing_distributed enabled, "/tensorboardX/event_file_writer.py" raises an EOFError.
As far as I understand, this happens because a thread is shut down while the TensorBoard SummaryWriter is still open. The SummaryWriter then tries to access data that is no longer there and fails with the EOFError. Closing "writer" and "eval_summary_writer" after the training cycle finishes and before the thread is closed fixes the issue.