Open 0000sir opened 8 years ago
The time decreased to about 20 seconds after I reinstalled Torch. Maybe I missed something during the last installation. But test.lua still uses only one CPU; how can I use all of my CPUs?
Torch should use OpenMP for parallel computation, so the fact that only one core is used suggests something went wrong with the installation. For me it uses all the cores. Try manually setting the OMP_NUM_THREADS environment variable.
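For reference, the thread count can also be set and inspected from inside a script rather than through the environment variable; a minimal sketch, assuming a Torch build compiled with OpenMP support (the thread count below is just an example):

```lua
-- Minimal sketch: control the OpenMP thread count from Lua instead of
-- relying on OMP_NUM_THREADS (requires Torch built with OpenMP).
require 'torch'

torch.setnumthreads(8)        -- 8 is an example; use your core count
print(torch.getnumthreads())  -- reports the thread count Torch will use
```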
Thank you @DmitryUlyanov, but I had no luck with OMP_NUM_THREADS=64. Torch does report the number of threads as 64:
th> print(torch.getnumthreads())
64
but it still runs on one CPU core. Any advice would be appreciated.
If I run the code below, it uses all 64 cores:
require 'torch'
local a = torch.FloatTensor(1000, 1000)
local b = torch.FloatTensor(1000, 1000)
for i = 1, 1000 do
  local c = torch.mm(a, b)
end
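A quick way to verify that OpenMP is actually engaged is to time that same matrix multiply at different thread counts; a hedged sketch (torch.Timer is Torch's built-in wall-clock timer, and the sizes and repeat counts are arbitrary):

```lua
-- Hedged sketch: time the same matrix multiply at 1 thread and at the
-- full thread count; a clear speedup confirms OpenMP is being used.
require 'torch'

local a = torch.FloatTensor(1000, 1000):uniform()
local b = torch.FloatTensor(1000, 1000):uniform()
local maxThreads = torch.getnumthreads()

for _, n in ipairs({1, maxThreads}) do
  torch.setnumthreads(n)
  local timer = torch.Timer()
  for i = 1, 100 do
    torch.mm(a, b)
  end
  print(string.format('%2d thread(s): %.2f s', n, timer:time().real))
end
```

If both timings come out roughly equal, the BLAS/OpenMP layer is likely single-threaded, which would match the behaviour reported in this thread.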
This is strange. The convolution implementation uses matrix multiplication, so neural nets should run in parallel as well.
It's strange that all of my CPUs are used with neural-style but not with texture_nets. I still don't know why.
Hm, it could be because of the threads used by the data loader. I have no idea how to deal with it.
Try reinstalling numpy using the code from GitHub. Also check which BLAS library you have installed and report your findings here.
I have a problem with CPU load too. However, I've found that it goes up when the batch size is increased. By default the batch size is 4 and the CPU load peaks at ~400%, but with a batch size of 12 it peaks at ~1200%. Unfortunately this pushes memory usage up as well. So even if you can afford to increase the batch size, doing so is not a great way to get the best CPU performance, but it's probably better than leaving the cores idle.
@0000sir By the way, this behaviour explains the performance you observed.
I've tested this on a computer with an AMD FX(tm)-8350 Eight-Core Processor and 32 GB of RAM; with test.lua I can get a stylized.jpg in 6 seconds. After reading https://github.com/DmitryUlyanov/texture_nets/issues/41 I think it should be possible to generate high-resolution images if I have more RAM. I have a VM running on XenServer with 32 CPU cores (Intel(R) Xeon(R) CPU E7-4820 @ 2.00GHz) and 64 GB of RAM, but when I ran test.lua with the same parameters it was extremely slow compared to the first machine: it takes over 15 minutes to generate a single image. What's wrong here? I noticed the script runs on only a single CPU core; is that normal?
This is what I used: th test.lua -input_image images/forbidden_city.jpg -model_t7 data/checkpoints/model.t7 -cpu
Does anybody have experience with this?
Thanks.