Open garzy opened 4 years ago
Hi @garzy
Thanks for reporting this issue.
I have fixed the errors. Could you remove your local changes and update your local repository?
I've updated the file and launched it again, but it ends up crashing at the line
torch.save(final_model_file_name, best_model)
throwing the above exception.
Maybe it could be an error with the return type of local best_model = torch.load(best_model_path)?
Could you print the complete log output?
Selecting best model with less Validation Huber Loss ...
best epoch: 201
best loss: 0.076074071484905
best model path ../Data/Models/NoLimit/river//epoch_201_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
/home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
/home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
Training/pickup_best_model.lua:83: in function 'select_best_model'
Training/pickup_best_model.lua:97: in main chunk
[C]: in function 'dofile'
/home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
Please update your local repo "deeper-stacker" again on the master branch and try again.
Thanks!
Same problem :(
best epoch: 201
best loss: 0.076074071484905
best model info path ../Data/Models/NoLimit/river/epoch_201_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
/home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
/home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
Training/pickup_best_model.lua:85: in function 'select_best_model'
Training/pickup_best_model.lua:104: in main chunk
[C]: in function 'dofile'
/home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
Could you run
ls -lah ../Data/Models/NoLimit/river/
and post the output?
...
-rw-rw-r-- 1 kml kml 119 oct 7 01:36 epoch_86_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:36 epoch_86_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:37 epoch_87_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:37 epoch_87_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:38 epoch_88_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:38 epoch_88_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:38 epoch_89_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:38 epoch_89_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 00:47 epoch_8_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 00:47 epoch_8_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:39 epoch_90_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:39 epoch_90_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:40 epoch_91_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:40 epoch_91_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:40 epoch_92_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:40 epoch_92_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:41 epoch_93_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:41 epoch_93_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:42 epoch_94_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:42 epoch_94_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:42 epoch_95_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:42 epoch_95_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:43 epoch_96_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:43 epoch_96_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:43 epoch_97_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:43 epoch_97_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:44 epoch_98_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:44 epoch_98_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 01:45 epoch_99_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 01:45 epoch_99_gpu.model
-rw-rw-r-- 1 kml kml 119 oct 7 00:47 epoch_9_gpu.info
-rw-rw-r-- 1 kml kml 112M oct 7 00:47 epoch_9_gpu.model
-rw-rw-r-- 1 kml kml 0 oct 7 10:56 final__gpu.model
-rw-rw-r-- 1 kml kml 8 oct 6 19:47 .gitkeep
There are two underscores in final__gpu.model... maybe that's the cause?
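The zero-byte final__gpu.model in the listing above looks like a leftover from the failed save. A minimal shell sketch for spotting such empty model files, assuming a find that supports -maxdepth (GNU and BSD both do); the function name list_empty_models is made up for illustration and is not part of deeper-stacker:

```shell
#!/bin/sh
# Hypothetical helper: list zero-byte *.model files in a models directory.
list_empty_models() {
    dir=$1
    # -maxdepth 1 skips subdirectories; -size 0 matches files with no content.
    find "$dir" -maxdepth 1 -type f -name '*.model' -size 0
}

# Example call, using the path from this thread:
# list_empty_models ../Data/Models/NoLimit/river
```

Any file it prints was created but never written, so it is safe to treat it as an incomplete save rather than a usable model.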
I fixed two typos in the master branch. Try again.
same problem
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
/home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
/home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
Training/pickup_best_model.lua:85: in function 'select_best_model'
Training/pickup_best_model.lua:104: in main chunk
[C]: in function 'dofile'
/home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
Don't worry, maybe my trained models are corrupted. I generated them on kubuntu 18.04, but at the end I was getting segmentation faults (core dumped). While trying to fix that, I noticed that I need kubuntu 16 instead, but on the fresh kubuntu 16 install I ran step 4 directly: th Training/main_train.lua 4
I'm going to retry the operations from step 3: th Training/raw_converter.lua 4
After repeating the steps I'm getting the same error:
/deeper-stacker/Source$ th Training/pickup_best_model.lua 4
Selecting best model with less Validation Huber Loss ...
best epoch: 204 of total: 350 epochs
best loss: 0.074449650388494
best model info path ../Data/Models/NoLimit/river/epoch_204_gpu.info
saving final model
/home/torch/install/bin/lua: /home/torch/install/share/lua/5.2/torch/File.lua:136: attempt to call field 'insert' (a nil value)
stack traceback:
/home/torch/install/share/lua/5.2/torch/File.lua:136: in function 'writeObject'
/home/torch/install/share/lua/5.2/torch/File.lua:388: in function 'save'
Training/pickup_best_model.lua:85: in function 'select_best_model'
Training/pickup_best_model.lua:104: in main chunk
[C]: in function 'dofile'
/home/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
I can continue by doing this by hand, without launching the pickup_best_model script:
cp epoch_204_gpu.info final_gpu.info
cp epoch_204_gpu.model final_gpu.model
When I execute $ torch.load('final_cpu.info')
the model seems to load well.
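The manual workaround above can be sketched as a small shell function. This is only a sketch under assumptions: promote_epoch is a made-up name, and the epoch_N_gpu.info / epoch_N_gpu.model naming is taken from the directory listing earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: copy one epoch's files to the final_gpu.* names by hand.
promote_epoch() {
    dir=$1
    epoch=$2
    for ext in info model; do
        src="$dir/epoch_${epoch}_gpu.$ext"
        # Refuse to copy a missing or empty file.
        [ -s "$src" ] || { echo "missing or empty: $src" >&2; return 1; }
        cp "$src" "$dir/final_gpu.$ext" || return 1
    done
}

# Example call, using the best epoch reported above:
# promote_epoch ../Data/Models/NoLimit/river 204
```

Checking for a non-empty source first avoids silently promoting a truncated file.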
Then, I continue with turn generation:
kml@kubuntu:~/deeper-stacker$ cd Source && th DataGeneration/main_data_generation.lua 3
Generating data ...
6sAh9s5c 1 292NN information:
learning_rate 0.0001
valid_loss 0.074449650388494
gpu true
epoch 204
NN architecture:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> output]
(1): nn.ConcatTable {
input
|`-> (1): nn.Sequential {
| [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> output]
| (1): nn.Linear(1009 -> 500)
| (2): nn.BatchNormalization (2D) (500)
| (3): nn.PReLU
| (4): nn.Linear(500 -> 500)
| (5): nn.BatchNormalization (2D) (500)
| (6): nn.PReLU
| (7): nn.Linear(500 -> 500)
| (8): nn.BatchNormalization (2D) (500)
| (9): nn.PReLU
| (10): nn.Linear(500 -> 1008)
| }
`-> (2): nn.Sequential {
[input -> (1) -> output]
(1): nn.Narrow
}
... -> output
}
(2): nn.ConcatTable {
input
|`-> (1): nn.Sequential {
| [input -> (1) -> output]
| (1): nn.SelectTable(1)
| }
`-> (2): nn.Sequential {
[input -> (1) -> (2) -> (3) -> output]
(1): nn.DotProduct
(2): nn.Replicate
(3): nn.MulConstant
}
... -> output
}
(3): nn.CAddTable
}
nextround init_bucket time: 1.1490240097046
avgTime: 123.4568271637
AdAs8s8h 2 979nextround init_bucket time: 0.58787417411804
avgTime: 73.452112078667
4hTdAd7h 3 1712nextround init_bucket time: 1.2796399593353
avgTime: 57.244448343913
Th2d3s5c 4 14861nextround init_bucket time: 0.56205201148987
avgTime: 44.798284769058
2s2hTsJc 5 100nextround init_bucket time: 0.64476418495178
This error is weird. When I run th Training/pickup_best_model.lua 4 directly, the error occurs. When I debug the pickup_best_model.lua file in VS Code, no error occurs and the final_gpu.model works fine. When I enter the torch environment and execute torch.save(final_model_file_name, best_model) manually, no error occurs. My environment is Win10, LuaJIT, cutorch.
I had to add this at the beginning:
And
because game_settings.nl is NULL
And replace this
with this
because of a nil exception there too.
Finally, it's crashing at line:
error thrown:
I'm on kubuntu 16.04 with torch, lua 5.2, cutorch, cuda, and Nvidia GTX 1060 with 6GB of RAM