Open 0000sir opened 8 years ago
The time decreased to about 20 seconds after I reinstalled Torch. Maybe I missed something during the last installation. But test.lua still uses only one CPU; how can I use all of my CPUs?
Torch should use OpenMP for parallel computation, so the fact that only one core is used suggests something went wrong with the installation. For me it uses all the cores. Try manually setting the OMP_NUM_THREADS environment variable.
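For reference, the thread count can also be set and inspected from inside a script rather than through the environment variable; a minimal sketch, assuming a Torch build compiled with OpenMP support (the thread count below is just an example):

```lua
-- Minimal sketch: control the OpenMP thread count from Lua instead of
-- relying on OMP_NUM_THREADS (requires Torch built with OpenMP).
require 'torch'

torch.setnumthreads(8)        -- 8 is an example; use your core count
print(torch.getnumthreads())  -- reports the thread count Torch will use
```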
Thank you @DmitryUlyanov, but I had no luck with OMP_NUM_THREADS=64. Torch does report the number of threads as 64:
th> print(torch.getnumthreads())
64
but it still runs on one CPU core. Any advice would be appreciated.
If I run the code below, it uses all 64 cores:
require 'torch'
local a = torch.FloatTensor(1000, 1000)
local b = torch.FloatTensor(1000, 1000)
for i = 1, 1000 do
  local c = torch.mm(a, b)
end
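A quick way to verify that OpenMP is actually engaged is to time that same matrix multiply at different thread counts; a hedged sketch (torch.Timer is Torch's built-in wall-clock timer, and the sizes and repeat counts are arbitrary):

```lua
-- Hedged sketch: time the same matrix multiply at 1 thread and at the
-- full thread count; a clear speedup confirms OpenMP is being used.
require 'torch'

local a = torch.FloatTensor(1000, 1000):uniform()
local b = torch.FloatTensor(1000, 1000):uniform()
local maxThreads = torch.getnumthreads()

for _, n in ipairs({1, maxThreads}) do
  torch.setnumthreads(n)
  local timer = torch.Timer()
  for i = 1, 100 do
    torch.mm(a, b)
  end
  print(string.format('%2d thread(s): %.2f s', n, timer:time().real))
end
```

If both timings come out roughly equal, the BLAS/OpenMP layer is likely single-threaded, which would match the behaviour reported in this thread.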
This is strange. The convolution implementation uses matrix multiplication, so neural nets should run in parallel as well.
It's strange that all of my CPUs are used with neural-style but not with texture_nets. I still don't know why.
Hm, it could be because of the threads used by the data loader. I have no idea how to deal with it.
Try reinstalling numpy using the code from GitHub. Also check which BLAS library you have installed and report your findings here.
I have a problem with CPU load too. However, I've found that it goes up when the batch size is increased. By default the batch size is 4 and the CPU load peaks at ~400%, but with a batch size of 12 it peaks at ~1200%. Unfortunately this pushes memory usage up as well. So even if you can afford to increase the batch size, doing so is not a great way to get the best CPU performance, but it's probably better than leaving the cores idle.
@0000sir By the way, this behaviour explains the performance you observed.
I've tested this on a computer with an AMD FX(tm)-8350 Eight-Core Processor and 32 GB of RAM; with test.lua I can get a stylized.jpg in 6 seconds. After reading https://github.com/DmitryUlyanov/texture_nets/issues/41 I think it should be possible to generate high-resolution images if I have more RAM. I have a VM running on XenServer with 32 CPU cores (Intel(R) Xeon(R) CPU E7-4820 @ 2.00GHz) and 64 GB of RAM, but when I ran test.lua with the same parameters it was extremely slow compared to the first machine: it takes over 15 minutes to generate a single image. What's wrong here? I noticed the script runs on only a single CPU core; is that normal?
This is what I used: th test.lua -input_image images/forbidden_city.jpg -model_t7 data/checkpoints/model.t7 -cpu
Does anybody have experience with this?
Thanks.