Open CESARDELATORRE opened 4 years ago
While I am not too concerned with demo speed, we need to reduce the Docker handling time and skip image creation in the common cases. See this papercut, which tracks the fix: https://msdata.visualstudio.com/Vienna/_workitems/edit/583388
File related papercut with the info:
https://msdata.visualstudio.com/Vienna/_workitems/edit/587850
This might be because the training datasets are pretty small: remote training may need to deploy Docker containers for the training runs, whereas local training is straightforward and simply trains on a ready-to-go machine/VM?
If the datasets were large, the time needed for the Docker containers might be small compared to the training time...
But this is a papercut for folks experimenting with small downsampled datasets, where the end-to-end training time on remote compute is too high because of the infrastructure time needed (containers?):
Local Training: Total Time: 5.7 minutes versus Remote Training: Total Time: 67 minutes
Basically, each child run takes around 5 seconds with local training and around 1.5 minutes with remote training.
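
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures reported in this issue. It assumes, as hypothesized above but not confirmed, that the extra per-run time in remote training is all infrastructure overhead (containers). The function and variable names are made up for illustration.

```python
# Figures reported in this issue (approximate).
LOCAL_TOTAL_MIN = 5.7
REMOTE_TOTAL_MIN = 67.0
LOCAL_PER_RUN_SEC = 5.0
REMOTE_PER_RUN_SEC = 1.5 * 60  # 1.5 minutes per remote child run


def slowdown(remote: float, local: float) -> float:
    """How many times slower remote is than local."""
    return remote / local


def overhead_fraction(train_sec: float, overhead_sec: float) -> float:
    """Share of a child run spent on overhead rather than training."""
    return overhead_sec / (train_sec + overhead_sec)


total_slowdown = slowdown(REMOTE_TOTAL_MIN, LOCAL_TOTAL_MIN)
per_run_slowdown = slowdown(REMOTE_PER_RUN_SEC, LOCAL_PER_RUN_SEC)

# ASSUMPTION: the whole per-run difference is infrastructure overhead.
per_run_overhead_sec = REMOTE_PER_RUN_SEC - LOCAL_PER_RUN_SEC

print(f"End-to-end slowdown: {total_slowdown:.1f}x")
print(f"Per-child-run slowdown: {per_run_slowdown:.1f}x")
print(f"Estimated overhead per remote child run: {per_run_overhead_sec:.0f} s")
print(
    "Overhead share with 5 s of training: "
    f"{overhead_fraction(LOCAL_PER_RUN_SEC, per_run_overhead_sec):.0%}"
)
# With a larger dataset (say 30 minutes of training per child run, a
# hypothetical figure), the same overhead would be a small share.
print(
    "Overhead share with 30 min of training: "
    f"{overhead_fraction(30 * 60, per_run_overhead_sec):.0%}"
)
```

Under that assumption, the fixed cost dominates only when each child run trains for a few seconds, which matches the small-dataset scenario described here.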