Closed OElesin closed 5 years ago
This seems to be an issue with shared memory. Try setting num_workers=0 for your data loader?
It works when I set num_workers=0. However, training seemed a bit slow. Is there a way to improve this even with the shared memory constraint?
Thanks
Try setting thread_pool=True and num_workers > 0, and see how much faster it gets.
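The tradeoff behind this suggestion can be sketched in plain Python (MXNet itself isn't required): process-based workers hand batches back through inter-process channels backed by shared memory, which is what fails on a constrained host, while thread-based workers read the dataset directly in the parent process's memory. The names below (DATA, load_batch) are hypothetical stand-ins for a dataset and a per-batch loading function, not MXNet API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dataset held in the parent process's memory.
DATA = list(range(1000))

def load_batch(start, size=10):
    # Simulates per-batch work (e.g. decoding / feature extraction).
    # Threads read DATA in place; process-based workers (e.g. a
    # ProcessPoolExecutor, or DataLoader with num_workers > 0 and
    # thread_pool=False) would instead serialize results through
    # shared-memory IPC, which is where /dev/shm limits bite.
    return sum(DATA[start:start + size])

def run_threaded(starts, workers=4):
    # Thread-pool analogue of thread_pool=True: parallelism without IPC.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_batch, starts))

if __name__ == "__main__":
    print(run_threaded([0, 10, 20]))  # [45, 145, 245]
```

Threads avoid the shared-memory channel entirely, at the cost of contending for the GIL during pure-Python work; for loaders whose heavy lifting happens in native code, that cost is usually small.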
@OElesin any update on this, how did it go? Please reopen if you need further assistance
Sorry for the late update. The thread_pool keyword was not available in the version of MXNet being used, which is kind of surprising.
I was able to follow the tutorial and reproduce the model and results for my use case. However, when I schedule the model training on AWS Batch (EC2 instance m4.4xlarge), it fails with the error below while extracting features. See line
Error message
I have tried to figure this out, but nothing has worked so far. Please help if you have any ideas.