Fixes several issues so that the training commands in README work:
fix os.execute -> os.system
fix the data_parallel hacks. They failed under data parallelism because the old lambda function referred to tensors that had not yet been transferred to the proper GPU.
fix train/val/test directory checking in COCO image loading. The current code only checks the directory containing these splits, so it errors when val is downloaded after train.
fix the training config YAML, which was missing the size for COCO, so the correct directory to load data from could not be found.
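
For reviewers, here is a minimal sketch of the per-split directory check described above. It is not the code in this PR: the helper name missing_splits and the train/val/test layout directly under a root directory are assumptions made for illustration. The idea is to test each split's own directory rather than only the parent, so a missing val is still detected when train is already present.

```python
import os
import tempfile


def missing_splits(root, splits=("train", "val", "test")):
    """Return the splits whose own directory is absent under root.

    Hypothetical helper: the real loader may use different names and paths.
    """
    return [s for s in splits if not os.path.isdir(os.path.join(root, s))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Simulate having downloaded only the train split so far.
        os.makedirs(os.path.join(root, "train"))
        # The parent directory exists, but val and test are still reported.
        print(missing_splits(root))
```

Checking only that the parent directory exists would treat this state as complete, which matches the failure seen when val is downloaded after train.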