[FastPitch/PyTorch] train.py fails with exception (reason: old DLLogger version in container)

Related to FastPitch/PyTorch

Describe the bug

When running train.py via the README recipe, the code fails with the following exception:

Traceback (most recent call last):
    File "train.py", line 559, in <module>
      main()
    File "train.py", line 306, in main
      logger.init(log_fpath, args.output, enabled=(args.local_rank == 0),
    File "/workspace/fastpitch/FastPitch/common/tb_dllogger.py", line 90, in init
      JSONStreamBackend(Verbosity.DEFAULT, log_fpath, append=True),
TypeError: __init__() got an unexpected keyword argument 'append'

The reason (+solution)

The container ships with DLLogger version 0.1, whose JSONStreamBackend constructor does not accept the "append" keyword argument. I had to upgrade DLLogger to the latest version (1.0) from the git repo:

pip uninstall DLLogger
pip install git+https://github.com/NVIDIA/dllogger#egg=dllogger

After upgrading DLLogger, training starts without the error.
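
To check whether an installed DLLogger is affected before launching training, the constructor signature can be inspected. The snippet below is only a sketch: the helper name accepts_kwarg and the two stand-in classes OldBackend and NewBackend are hypothetical and exist purely to illustrate the check. The only things taken from the traceback above are the class name JSONStreamBackend and the keyword "append".

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        # A **kwargs parameter swallows any keyword argument.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        # Otherwise the name must match a parameter that may be passed by keyword.
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical stand-ins (not the real DLLogger classes), used only to
# show how the check behaves for a constructor with and without `append`.
class OldBackend:
    def __init__(self, verbosity, filename):
        pass


class NewBackend:
    def __init__(self, verbosity, filename, append=False):
        pass


if __name__ == "__main__":
    try:
        from dllogger import JSONStreamBackend
    except ImportError:
        print("dllogger is not installed")
    else:
        if accepts_kwarg(JSONStreamBackend, "append"):
            print("JSONStreamBackend accepts 'append'")
        else:
            print("JSONStreamBackend lacks 'append'; upgrade DLLogger")
```

Run inside the container, this should report the missing argument before the upgrade and the supported one after it, assuming the upgraded package is the one picked up by the interpreter.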

To Reproduce

Steps to reproduce the behavior:

Follow the instructions in the README.md file under "FastPitch#quick-start-guide"
Run bash scripts/train.sh

Expected behavior

The training process should start running.

Environment

Container version: nvcr.io/nvidia/pytorch:21.05-py3
GPUs in the system: 8X A40 40GB
CUDA driver version: 520.61.05 (CUDA Version: 11.8)

NVIDIA / DeepLearningExamples, issue #1231