Open VocAddict opened 4 years ago
As with #19, I've uncommented the offending lines to restart testing of the training/sampling files on fresh systems. I'll be testing the project from the bottom up on both Windows and Linux, either to iron out the issue with the current Python implementation or to switch completely to Anaconda.
#22, now merged, updates PyTorch to 1.5.1, which fixed the issue I hit when I ran the training script after recently updating the submodules. I haven't tested it on a fresh build yet, but it did install and work from main.py, so I'm hoping it works there too.
#28, now merged, updates PyTorch to 1.7.1. I'll have to reinstall the Sandbox feature on Windows to test it, but I'm not expecting much. Eventually, I'll start up the spinning rust and get it tested on Linux as well.
Training on Mac using default parameters
2021-04-19 10:06:32,145 - train - INFO - Creating model
longest token 1
0-Embedding
1-GRIDGRU
2-GRIDGRU
3-Linear
[Embedding(43, 128), GRIDGRU(), GRIDGRU(), Linear(in_features=128, out_features=43, bias=True)]
2021-04-19 10:06:32,175 - train - INFO - Created model with 405803 parameters
2021-04-19 10:06:32,199 - train - INFO - Loading data
2021-04-19 10:06:32,250 - dataloader - INFO - Loaded 229 items from test
2021-04-19 10:06:32,252 - dataloader - INFO - Loaded 229 items from val
2021-04-19 10:06:32,253 - dataloader - INFO - Loaded 1838 items from train
2021-04-19 10:06:32,277 - dataloader - INFO - No zeroes found in data, assuming one-based indexes
2021-04-19 10:06:32,289 - dataloader - INFO - 28 sequences, 0 batches for split train with offset 0
Traceback (most recent call last):
  File "/Users/(username)/Projects/phonetic-songs/pytorch-rnn/train.py", line 126, in <module>
    print('average loss: %.4f' % (totalloss.item()/traindata.batch_count))
AttributeError: 'int' object has no attribute 'item'
Training complete.
Same error on Linux and Windows as well.
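The log above reports 0 batches for the train split, so one plausible reading (an assumption, not checked against train.py itself) is that the training loop never ran, leaving totalloss as a plain integer with no .item() method. Dividing by a batch count of 0 would fail next in any case. A minimal sketch of a guard, using a hypothetical helper name average_loss that is not part of pytorch-rnn:

```python
def average_loss(total_loss, batch_count):
    """Return the mean loss per batch, or None when no batches were processed.

    Hypothetical helper, not part of pytorch-rnn. `total_loss` may be a plain
    number (the loop never ran) or any object exposing .item(), such as a
    zero-dimensional tensor.
    """
    if batch_count == 0:
        # Nothing was trained on, so there is no meaningful average.
        return None
    item = getattr(total_loss, "item", None)
    value = item() if callable(item) else total_loss
    return value / batch_count


# Reproduces the situation in the log: integer total, zero batches.
avg = average_loss(0, 0)
if avg is None:
    print("average loss: n/a (0 batches)")
else:
    print("average loss: %.4f" % avg)
```

This only hides the crash. If the 0-batch reading is right, the underlying question is still why 28 sequences produce no full batch with the default parameters.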
Bumped pytorch-rnn to 7d6bff4 and dropped PyTorch to v1.6.0, since that should be the latest version used for that tree. It's functional on my system; time to see if it works everywhere else.
Anything post 7d6dff4 for pytorch-rnn gives the following error.
Traceback (most recent call last):
  File "pytorch-rnn/train.py", line 4, in <module>
    from extensions import ZMDropout
  File "C:\Users\Nkosi\Documents\GitHub\phonetic-songs\pytorch-rnn\extensions.py", line 2, in <module>
    import ptrnn_cpp
ModuleNotFoundError: No module named 'ptrnn_cpp'
Training complete.
I'll have to look into that in more depth: judging by the commits, the module does exist, so the problem may be in how I'm importing it for use in this project.
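One quick way to confirm whether the interpreter can see the module at all is to ask the import system directly. The sketch below uses the standard library's importlib.util.find_spec. The helper name module_available is made up for illustration, and it only handles top-level module names such as ptrnn_cpp.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be located for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("ptrnn_cpp importable:", module_available("ptrnn_cpp"))
    # The directories searched; the module has to live in one of these.
    print("search path:")
    for entry in sys.path:
        print("  ", entry)
```

Running this from the same working directory and interpreter that launches train.py should show whether the module is missing outright or simply not on the search path.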
The ptrnn_cpp branch under my fork deals with the missing-module error that appears when pytorch-rnn is bumped to current. I'll merge it into master when the current issue is resolved.
Note to self: install conda and Windows Sandbox in a couple of days and do some package comparison.
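For the package comparison, a rough sketch: capture `pip freeze` output in each environment and diff the two lists. The helper names parse_freeze and diff_packages are hypothetical, and only plain `name==version` lines are handled; other line formats that pip can emit are skipped.

```python
def parse_freeze(text):
    """Parse `name==version` lines, the usual format printed by `pip freeze`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            packages[name.lower()] = version
    return packages


def diff_packages(env_a, env_b):
    """Compare two {package: version} mappings.

    Returns (only_in_a, only_in_b, changed), where `changed` maps a package
    name to its (version_in_a, version_in_b) pair.
    """
    only_a = {k: v for k, v in env_a.items() if k not in env_b}
    only_b = {k: v for k, v in env_b.items() if k not in env_a}
    changed = {
        k: (env_a[k], env_b[k])
        for k in env_a
        if k in env_b and env_a[k] != env_b[k]
    }
    return only_a, only_b, changed


# Illustrative input only; real lists would come from `pip freeze` in each environment.
plain_python = parse_freeze("torch==1.1.0\n")
conda_env = parse_freeze("torch==1.6.0\n")
print(diff_packages(plain_python, conda_env))
```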
Python 3.7.x doesn't seem to install torch 1.1.0 correctly: scripts that require it don't work. Running the same scripts under Anaconda, however, gave the results I expected. A decision needs to be made on whether to move the project to Anaconda or find a different program to train with.
This thread is to keep track of developments on that front.
When I made the trainer.py and sampler.py scripts, I installed PyTorch to make use of the repository I found and thought, cool, it works, let me push it to the master branch, only to realise that no one else can get torch to install and work as it should. I'm not sure what special sauce my computer has, but no one else seems to have it.
So I fired up a virtual container and tried it there, and it did not work. I tried numerous methods of installing torch and torchvision, all ending in the same error. I then used a complete virtual system just to make sure, with the same results. It would appear that Python 3.7.x and torch==1.1.0 have some issues I can't figure out.
So I uninstalled Python, installed Anaconda (Miniconda, to be exact), used that as the Python distribution, and ran the scripts. All scripts ran as they should, just as I expected, so it would appear that conda is the way to go.
All training-related submissions to the master branch will be placed on hold and will instead be made on a new trainer branch until this is sorted out. When merged, #17 will remove trainer and sampler from the main script but leave them in the repository in case anyone would like to try their hand at it and see if it works.