facebookresearch / darkforestGo

DarkForest, the Facebook Go engine.

Crash in training #21

Closed: lpaatero closed this issue 7 years ago

lpaatero commented 7 years ago

I attempted training (KGS data) with train.sh. I installed the most recent version of Torch and CUDA 8.0. Training ends soon after it starts, with the error below:

| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
/home/lauri/go/engines/darkf/torch/install/bin/luajit: ./train/rl_framework/infra/bundle.lua:187: invalid arguments: CudaTensor CudaLongTensor 
expected arguments: [*CudaByteTensor*] CudaTensor float | *CudaTensor* CudaTensor float | [*CudaByteTensor*] CudaTensor CudaTensor | *CudaTensor* CudaTensor CudaTensor
stack traceback:
        [C]: in function 'eq'
        ./train/rl_framework/infra/bundle.lua:187: in function 'get_top5'
        ./train/rl_framework/infra/bundle.lua:242: in function 'backward_prepare'
        ./train/rl_framework/infra/agent.lua:47: in function 'optimize'
        ./train/rl_framework/infra/engine.lua:114: in function 'train'
        ./train/rl_framework/infra/framework.lua:304: in function 'run_rl'
        train.lua:155: in main chunk
        [C]: in function 'dofile'
        ...arkf/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x004063a0
yuandong-tian commented 7 years ago

Issue fixed. You can git pull and run.
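In the traceback, `eq` inside `get_top5` received a `CudaTensor` and a `CudaLongTensor`, and Torch7 only accepts the same-type signatures listed in the error. A likely cause is that newer cutorch releases return top-k/max indices as `CudaLongTensor`, but the thread does not say what the actual fix was. The sketch below reproduces the same kind of mismatch on the CPU using `FloatTensor` and `LongTensor`, and shows one way around it: cast the labels to the index type before comparing. The scores, labels, and variable names are hypothetical toy data, not darkforest code.

```lua
require 'torch'

-- Hypothetical toy data: scores for 2 positions over 4 candidate moves.
local scores = torch.FloatTensor({{0.1, 0.7, 0.15, 0.05},
                                  {0.5, 0.2, 0.25, 0.05}})
-- Hypothetical ground-truth move per position, stored as a FloatTensor.
local labels = torch.FloatTensor({2, 4})

-- topk with dir = true returns the k largest values and their indices;
-- the indices come back as a LongTensor, not a FloatTensor.
local _, idx = torch.topk(scores, 2, 2, true)

-- Comparing a FloatTensor with the LongTensor indices via eq fails with an
-- "invalid arguments" error, just as CudaTensor vs CudaLongTensor did above.
-- Casting the labels to the index type makes the comparison legal.
local lab = labels:long():view(labels:size(1), 1):expandAs(idx)
local hits = idx:eq(lab)    -- ByteTensor: 1 where a top-k index equals the label
local correct = hits:sum()  -- number of positions whose label is in the top k
print(correct)
```

Here the first position's label (move 2) is among its top 2 moves and the second's (move 4) is not. Casting the labels, rather than the indices, keeps the indices exact; casting the indices to float with `idx:float()` would also work for small move counts.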