camjac251 commented 4 years ago

Has anyone been able to get this working with Anaconda on Windows? I've run into many issues trying to install it and apex with pytorch 1.3.1 and cuda 10.1 from the conda repo. I'll link some logs later.

rafaelvalle commented 4 years ago

It should be possible.

camjac251 commented 4 years ago

I'm getting this error when attempting to run the first training example, `python train.py --output_directory=outdir --log_directory=logdir`:

ModuleNotFoundError: No module named 'numpy.testing.decorators'

I first tried installing the latest pytorch through pip instead of conda, since the conda install didn't work before:

pip install torch===1.4.0 torchvision===0.5.0 -f https://download.pytorch.org/whl/torch_stable.html

Then I installed apex directly from the GitHub source, and finally installed the packages in requirements.txt with pip.

I'll try different numpy versions to see if it works.
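The error above means Python could not import `numpy.testing.decorators` from the installed numpy. A minimal sketch for checking this before launching training, using only the standard library; the helper name `module_available` is made up for illustration and is not part of mellotron:

```python
import importlib.util


def module_available(name):
    """Return True if the dotted module name can be located, else False."""
    try:
        # find_spec imports parent packages, so a missing parent raises.
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


# The result depends on which numpy version is installed in the environment.
print("numpy.testing.decorators importable:",
      module_available("numpy.testing.decorators"))
```

If this prints `False`, trying another numpy version, as planned above, is a reasonable next step.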

This is the result of `pip list`:

absl-py              0.9.0
apex                 0.1
astor                0.8.1
audioread            2.1.8
certifi              2019.11.28
cycler               0.10.0
decorator            4.4.2
gast                 0.2.2
google-pasta         0.2.0
grpcio               1.27.2
h5py                 2.10.0
inflect              0.2.5
jamo                 0.4.1
joblib               0.14.1
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.0
librosa              0.6.0
llvmlite             0.31.0
Markdown             3.2.1
matplotlib           2.1.0
music21              5.7.2
nltk                 3.4.5
numba                0.48.0
numpy                1.18.2
opt-einsum           3.2.0
Pillow               7.0.0
pip                  20.0.2
protobuf             3.11.3
pyparsing            2.4.6
python-dateutil      2.8.1
pytz                 2019.3
resampy              0.2.2
scikit-learn         0.22.2.post1
scipy                1.0.0
setuptools           46.1.1.post20200323
six                  1.14.0
tensorboard          1.15.0
tensorboardX         1.1
tensorflow           1.15.2
tensorflow-estimator 1.15.1
termcolor            1.1.0
torch                1.4.0
torchvision          0.5.0
Unidecode            1.0.22
Werkzeug             1.0.0
wheel                0.34.2
wincertstore         0.2
wrapt                1.12.1

camjac251 commented 4 years ago

I ended up downgrading numpy to 1.16.4, which cleared up that issue. However, I am back to the first error I had when opening the issue. The older tensorboardX version seems not to work fully with the tensorboard 1.15.0 required by tensorflow.

#9 helped me out, however. I replaced both `tensorboard` and `tensorboardx` with the versions in the last comment there and it's now training. Below is the error I got using tensorboardx version 1.1 with tensorboard version 1.15.0:

Traceback (most recent call last):
  File "train.py", line 17, in <module>
    from hparams import create_hparams
  File "C:\Users\camja\Desktop\mellotron\hparams.py", line 1, in <module>
    import tensorflow as tf
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow\__init__.py", line 99, in <module>
    from tensorflow_core import *
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow_core\__init__.py", line 36, in <module>
    from tensorflow._api.v1 import compat
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow_core\_api\v1\compat\__init__.py", line 24, in <module>
    from tensorflow._api.v1.compat import v2
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow_core\_api\v1\compat\v2\__init__.py", line 322, in <module>
    from tensorboard.summary._tf import summary
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\summary\__init__.py", line 25, in <module>
    from tensorboard.summary import v1
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\summary\v1.py", line 24, in <module>
    from tensorboard.plugins.audio import summary as _audio_summary
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\plugins\audio\summary.py", line 36, in <module>
    from tensorboard.plugins.audio import metadata
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\plugins\audio\metadata.py", line 21, in <module>
    from tensorboard.compat.proto import summary_pb2
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\summary_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\tensor_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\resource_handle_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\tensor_shape_pb2.py", line 23, in <module>
    serialized_pb=_b('\n+tensorboard/compat/proto/tensor_shape.proto\x12\x0btensorboard\"{\n\x10TensorShapeProto\x12.\n\x03\x64im\x18\x02 \x03(\x0b\x32!.tensorboard.TensorShapeProto.Dim\x12\x14\n\x0cunknown_rank\x18\x03 \x01(\x08\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01 \x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tBq\n\x18org.tensorflow.frameworkB\x11TensorShapeProtosP\x01Z=github.com/tensorflow/tensorflow/tensorflow/go/core/framework\xf8\x01\x01\x62\x06proto3')
  File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\google\protobuf\descriptor.py", line 884, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorboard/compat/proto/tensor_shape.proto":
  tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.unknown_rank: "tensorboard.TensorShapeProto.unknown_rank" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim.size: "tensorboard.TensorShapeProto.Dim.size" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim.name: "tensorboard.TensorShapeProto.Dim.name" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim: "tensorboard.TensorShapeProto.Dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto: "tensorboard.TensorShapeProto" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.Dim" seems to be defined in "tensorboardX/src/tensor_shape.proto", which is not imported by "tensorboard/compat/proto/tensor_shape.proto".  To use it here, please add the necessary import.
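The traceback shows `tensorboard` and `tensorboardX` both registering `tensorboard.TensorShapeProto`, so it helps to know exactly which versions of each are installed. A minimal sketch, assuming Python 3.8 or newer for `importlib.metadata`; the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages involved in the descriptor-pool conflict above.
for pkg, ver in installed_versions(
        ["tensorboard", "tensorboardX", "tensorflow", "protobuf"]).items():
    print(f"{pkg}: {ver}")
```

This only reports versions. Which combination works is whatever the comment referenced above recommends.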

AndroYD84 commented 4 years ago

I confirm that this repo works 100% with Windows 10 on GPU, as I have used it to train on a custom dataset and produce results (sample 1, sample 2). I haven't used it in a while, but I do remember the installation wasn't terribly hard at all. The only trouble was making Apex work with CUDA. Unfortunately, I no longer remember how I did it, as I have moved on to new projects.

camjac251 commented 4 years ago

Those samples are very frightening. Was that with waveglow? Did you use any custom parameters? I can't imagine a 500k iteration model. What hardware and iteration times were you seeing? I'm on a P5000 with default settings and see 8-10 seconds per iteration, which would take me about 53 days of nonstop training to go from 0 to 500,000.
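The training-time estimate above can be reproduced with a few lines of arithmetic. A minimal sketch; the function name `training_days` is made up for illustration, and the inputs are the figures quoted in this comment:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(iterations, seconds_per_iteration):
    """Days of nonstop training for a given iteration count and speed."""
    return iterations * seconds_per_iteration / SECONDS_PER_DAY


# 500,000 iterations at the low and high ends of the observed range.
for secs in (8, 10):
    print(f"{secs} s/iteration: {training_days(500_000, secs):.1f} days")
```

At 8 and 10 seconds per iteration this gives roughly 46 and 58 days, so 53 days corresponds to a little over 9 seconds per iteration.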

NVIDIA / mellotron

Windows Anaconda #49
