aws / sagemaker-pytorch-training-toolkit

Toolkit for running PyTorch training scripts on SageMaker. Dockerfiles used for building SageMaker PyTorch Containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

FastAI v1.0.59 causes failed training job #215

Status: Closed (ghost closed this issue 4 years ago)

ghost commented 4 years ago

Describe the bug The fastprogress progress bar used by fastai doesn't work correctly in a PyTorch training job. When launching a training job, I got the error `TypeError: __init__() got an unexpected keyword argument 'auto_update'`. The attached screenshot is what led me to this error. The error is described here: https://forums.fast.ai/t/fastprogress-auto-update/58830
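For readers unfamiliar with this class of error, here is a minimal sketch of how it arises. The `ProgressBar` class and `make_bar` function below are hypothetical stand-ins, not the actual fastprogress or fastai code: they only illustrate a caller passing an `auto_update` keyword to a constructor that does not accept it.

```python
class ProgressBar:
    """Hypothetical stand-in for a progress bar whose constructor
    does not accept an `auto_update` keyword."""

    def __init__(self, total):
        self.total = total


def make_bar():
    """Hypothetical caller written against a different constructor signature.

    Returns the error message instead of raising so it can be inspected.
    """
    try:
        ProgressBar(10, auto_update=False)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # The message ends with: got an unexpected keyword argument 'auto_update'
    print(make_bar())
```

The exact prefix of the message varies with the Python version, but the `unexpected keyword argument 'auto_update'` part matches the error in this report.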

To reproduce Launch a training job using fastai version 1.0.59 (currently the default).

Expected behavior A successful training job.

Screenshots or logs (screenshot not preserved)

System information (not provided)

Additional context Resolving this should only require bumping the fastai version; the bugfix has already been released in 1.0.60.
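Until the default version is bumped, a training script could check which fastai version is installed before it starts. The sketch below is only an illustration: the helper names (`parse_version`, `needs_upgrade`, `installed_fastai_version`) are made up for this example, and it assumes plain numeric versions such as `1.0.59`. The only facts taken from this issue are that 1.0.59 is affected and 1.0.60 contains the fix.

```python
from importlib import metadata

# Per this issue, the bugfix was released in fastai 1.0.60.
FIXED_VERSION = (1, 0, 60)


def parse_version(text):
    """Turn '1.0.59' into (1, 0, 59). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed):
    """Return True if the given fastai version predates the fixed release."""
    return parse_version(installed) < FIXED_VERSION


def installed_fastai_version():
    """Return the installed fastai version string, or None if not installed."""
    try:
        return metadata.version("fastai")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_fastai_version()
    if version is None:
        print("fastai is not installed")
    elif needs_upgrade(version):
        print(f"fastai {version} predates 1.0.60; consider upgrading")
    else:
        print(f"fastai {version} includes the fix")
```

If the check reports an older version, adding `fastai>=1.0.60` to the training job's `requirements.txt` is one possible workaround, assuming the container installs requirements from that file.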

nadiaya commented 4 years ago

This repo no longer manages Docker images or the external libraries installed in them. Please file the issue against https://github.com/aws/deep-learning-containers