Open camjac251 opened 4 years ago
It should be possible.
I'm getting this error when attempting to run the first training example:
python train.py --output_directory=outdir --log_directory=logdir
ModuleNotFoundError: No module named 'numpy.testing.decorators'
I first tried installing the latest pytorch through pip instead of conda, as the conda install didn't work before:
pip install torch===1.4.0 torchvision===0.5.0 -f https://download.pytorch.org/whl/torch_stable.html
Then I installed apex directly from the GitHub source, and finally installed requirements.txt with pip.
I'll try different numpy versions to see if it works.
This is the result of pip list
absl-py 0.9.0
apex 0.1
astor 0.8.1
audioread 2.1.8
certifi 2019.11.28
cycler 0.10.0
decorator 4.4.2
gast 0.2.2
google-pasta 0.2.0
grpcio 1.27.2
h5py 2.10.0
inflect 0.2.5
jamo 0.4.1
joblib 0.14.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
librosa 0.6.0
llvmlite 0.31.0
Markdown 3.2.1
matplotlib 2.1.0
music21 5.7.2
nltk 3.4.5
numba 0.48.0
numpy 1.18.2
opt-einsum 3.2.0
Pillow 7.0.0
pip 20.0.2
protobuf 3.11.3
pyparsing 2.4.6
python-dateutil 2.8.1
pytz 2019.3
resampy 0.2.2
scikit-learn 0.22.2.post1
scipy 1.0.0
setuptools 46.1.1.post20200323
six 1.14.0
tensorboard 1.15.0
tensorboardX 1.1
tensorflow 1.15.2
tensorflow-estimator 1.15.1
termcolor 1.1.0
torch 1.4.0
torchvision 0.5.0
Unidecode 1.0.22
Werkzeug 1.0.0
wheel 0.34.2
wincertstore 0.2
wrapt 1.12.1
I ended up downgrading numpy to 1.16.4, which cleared up that issue; however, I'm back to the first error I had when opening the issue. It looks like the older tensorboardX version might not work fully with the tensorboard 1.15.0 that tensorflow requires.
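A quick way to check whether a given numpy install still ships the module named in the ModuleNotFoundError, before launching train.py. This is only a sketch: `module_available` is a helper name made up here, not something from the repo, and it relies only on the standard-library `importlib`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    Hypothetical helper, not part of mellotron. Note that find_spec
    imports the parent package (e.g. numpy.testing) in order to search
    inside it, but does not import the target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package is missing or is not a package.
        return False


if __name__ == "__main__":
    # The module named in the ModuleNotFoundError from the first comment.
    print(module_available("numpy.testing.decorators"))
```

Going by the thread, this would be expected to print False on numpy 1.18.2 and True after the downgrade to 1.16.4.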
tensorboard and tensorboardx with the versions on the last comment and it's now training. Below was the error I got using tensorboardx version 1.1 with tensorboard version 1.15.0:
Traceback (most recent call last):
File "train.py", line 17, in <module>
from hparams import create_hparams
File "C:\Users\camja\Desktop\mellotron\hparams.py", line 1, in <module>
import tensorflow as tf
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow\__init__.py", line 99, in <module>
from tensorflow_core import *
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow_core\__init__.py", line 36, in <module>
from tensorflow._api.v1 import compat
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow_core\_api\v1\compat\__init__.py", line 24, in <module>
from tensorflow._api.v1.compat import v2
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorflow_core\_api\v1\compat\v2\__init__.py", line 322, in <module>
from tensorboard.summary._tf import summary
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\summary\__init__.py", line 25, in <module>
from tensorboard.summary import v1
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\summary\v1.py", line 24, in <module>
from tensorboard.plugins.audio import summary as _audio_summary
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\plugins\audio\summary.py", line 36, in <module>
from tensorboard.plugins.audio import metadata
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\plugins\audio\metadata.py", line 21, in <module>
from tensorboard.compat.proto import summary_pb2
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\summary_pb2.py", line 16, in <module>
from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\tensor_pb2.py", line 16, in <module>
from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\resource_handle_pb2.py", line 16, in <module>
from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\tensorboard\compat\proto\tensor_shape_pb2.py", line 23, in <module>
serialized_pb=_b('\n+tensorboard/compat/proto/tensor_shape.proto\x12\x0btensorboard\"{\n\x10TensorShapeProto\x12.\n\x03\x64im\x18\x02 \x03(\x0b\x32!.tensorboard.TensorShapeProto.Dim\x12\x14\n\x0cunknown_rank\x18\x03 \x01(\x08\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01 \x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tBq\n\x18org.tensorflow.frameworkB\x11TensorShapeProtosP\x01Z=github.com/tensorflow/tensorflow/tensorflow/go/core/framework\xf8\x01\x01\x62\x06proto3')
File "C:\Users\camja\anaconda3\envs\mello\lib\site-packages\google\protobuf\descriptor.py", line 884, in __new__
return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorboard/compat/proto/tensor_shape.proto":
tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
tensorboard.TensorShapeProto.unknown_rank: "tensorboard.TensorShapeProto.unknown_rank" is already defined in file "tensorboardX/src/tensor_shape.proto".
tensorboard.TensorShapeProto.Dim.size: "tensorboard.TensorShapeProto.Dim.size" is already defined in file "tensorboardX/src/tensor_shape.proto".
tensorboard.TensorShapeProto.Dim.name: "tensorboard.TensorShapeProto.Dim.name" is already defined in file "tensorboardX/src/tensor_shape.proto".
tensorboard.TensorShapeProto.Dim: "tensorboard.TensorShapeProto.Dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
tensorboard.TensorShapeProto: "tensorboard.TensorShapeProto" is already defined in file "tensorboardX/src/tensor_shape.proto".
tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.Dim" seems to be defined in "tensorboardX/src/tensor_shape.proto", which is not imported by "tensorboard/compat/proto/tensor_shape.proto". To use it here, please add the necessary import.
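The traceback above says the same TensorShapeProto names were registered twice, once from tensorboardX and once from tensorboard. To see at a glance which versions of the two are installed side by side, something like this should work on Python 3.8+. `installed_versions` is a made-up helper name, and the package names are taken from the pip list earlier in the thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["tensorboard", "tensorboardX", "tensorflow", "protobuf", "numpy"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

This only reports versions; it does not decide which combinations are compatible.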
I can confirm that this repo works 100% with Windows 10 on GPU. I used it to train on a custom dataset and produce results (sample 1, sample 2). I haven't used it in a while, but I do remember the installation wasn't terribly hard at all. The only trouble was making Apex work with CUDA. Unfortunately I no longer remember how I did it, as I've moved on to new projects.
Those samples are very frightening. Was that with waveglow? Did you use any custom parameters? I can't imagine a 500k iteration model. What hardware and iteration times were you seeing? I'm on a P5000 using default settings and see 8-10 seconds per iteration, which would take me about 53 days of nonstop training to go from 0 to 500,000.
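For reference, the estimate works out roughly like this. The numbers are the ones from the comment above; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_days(iterations: int, seconds_per_iteration: float) -> float:
    """Wall-clock days needed for `iterations` steps of nonstop training."""
    return iterations * seconds_per_iteration / SECONDS_PER_DAY


if __name__ == "__main__":
    for secs in (8, 10):
        print(f"{secs} s/it -> {training_days(500_000, secs):.1f} days")
```

At 8 to 10 seconds per iteration that is roughly 46 to 58 days, so 53 days corresponds to a bit over 9 seconds per iteration.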
Has anyone been able to get this working with Anaconda on Windows? I've run into many issues trying to install it and apex with pytorch 1.3.1 and cuda 10.1 from the conda repo. I'll link some logs later.