01BTC10 / pyrit

Automatically exported from code.google.com/p/pyrit

Performance gains drop off drastically when adding 4th and 5th AWS GPU cluster nodes #451

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run pyrit benchmark on an AWS GPU cluster with 1 node, then 2, 3, 4, and 5.
2. Record the benchmarked PMKs/s after adding each node.

What is the expected output? What do you see instead?
I expect the gain in performance to be roughly linear: 20k PMKs/s, 40k, 60k,
80k, 100k, etc. I do see this linear gain when adding nodes 2 and 3, but I see
hardly any gain at all with the 4th node, and the 5th node actually drops the
PMKs/s to about 50% of the performance of 3 nodes.

What version of the product are you using? On what operating system?

Pyrit 0.4.1-dev (svn r308)
AWS G2 instances, each backed by 1 x NVIDIA GRID GPU (Kepler GK104) and 8
hardware hyperthreads from an Intel Xeon E5-2670.

Please provide any additional information below.

Original issue reported on code.google.com by phillip....@gmail.com on 6 Jun 2014 at 2:54

GoogleCodeExporter commented 9 years ago
Benchmark results (PMKs/s):

Nodes   limit_ncpus = 0   limit_ncpus = 8
1       22,848.78         22,725.95
2       41,042.71         41,088.34
3       62,087.50         63,593.88
4       62,035.02         63,593.88
5       36,827.1          32,851.50
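To put numbers on the drop-off, the short Python sketch below divides each reported figure by the ideal linear rate (node count times the single-node rate). It only re-uses the benchmark values posted in this comment; the names RESULTS and scaling_efficiency are illustrative and not part of Pyrit.

```python
# Benchmark figures reported in this thread (PMKs/s), keyed by node count.
RESULTS = {
    "limit_ncpus = 0": {1: 22848.78, 2: 41042.71, 3: 62087.50, 4: 62035.02, 5: 36827.1},
    "limit_ncpus = 8": {1: 22725.95, 2: 41088.34, 3: 63593.88, 4: 63593.88, 5: 32851.50},
}


def scaling_efficiency(results: dict[int, float]) -> dict[int, float]:
    """Observed throughput divided by ideal linear throughput (n x single-node rate)."""
    single = results[1]
    return {n: pmks / (n * single) for n, pmks in sorted(results.items())}


if __name__ == "__main__":
    for label, results in RESULTS.items():
        print(label)
        for n, eff in scaling_efficiency(results).items():
            ideal = n * results[1]
            print(f"  {n} node(s): {results[n]:>10,.2f} of {ideal:>10,.2f} ideal ({eff:.0%})")
```

Run as a script, it shows efficiency of roughly 90-93% for two and three nodes, about 68-70% at four, and around 30% at five in both configurations, which matches the drop described in the report.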

Original comment by phillip....@gmail.com on 6 Jun 2014 at 2:57

GoogleCodeExporter commented 9 years ago
After running yum update on all instances, I ran pyrit benchmark with all 5
nodes and got 52,213 PMKs/s. That's still not what I would expect from 5 nodes,
but a lot better than the 30k range I was seeing.

Just finished a long benchmark with all 5 nodes: 61,258. Again better, but
nowhere near what I would expect from 5 nodes; about the same as 3 or 4.

Original comment by phillip....@gmail.com on 6 Jun 2014 at 3:46

GoogleCodeExporter commented 9 years ago
The limit_ncpus option is deceptive. A core is set aside for each GPU on a
machine that is 'serving', but the machine sending out the PMKs has a heavier
workload than the other nodes. I can't really give you a ratio to work with,
but on the cluster I'm currently using, the machine sending PMKs typically eats
4 of the 8 cores on my 4 GHz CPU with limit_ncpus=1, the one core assigned to
the GPU in that system. That cluster of three pushes around 80k.

You didn't mention what the TTS values were for the clusters. Nowhere is it
really described what they mean, so it seems everyone ignores them. In my
experience a low value (< 1) can cause issues: serving machines can crash out,
and sometimes take the master down randomly as well.

You need to locate a file called 'network.py' (on Kali 1.0.8 it's at
/usr/local/lib/python2.7/dist-packages/cpyrit/network.py) and change the line
"self.server.gather(self.client.uuid, 5000)". 5000 is the default; it's how
many keys a node keeps buffered for work. So if a node does 20,000 PMKs/s, set
it somewhere between 20,000 and 60,000. Nodes will not be able to fill their
buffers until the master has filled its own, so it can take a few minutes to
stabilize.

Nodes need a stable, constant link to each other, so avoid Wi-Fi. Receiving
50,000-odd PMKs/s is about 2 Mbit/s of constant traffic, and if anything else
increases packet latency you might want to use bigger buffers. When a node
drops out, the keys already assigned to it need to be recycled into the queue,
so a node with a large buffer dropping out will partially stall the master.

The benchmark code is more of an indicator than a guide. In real-world runs,
averages 10% higher than the benchmark are common. Other factors can affect
real-world results as well, such as the latency of pulling huge numbers of
passwords out of a MySQL server that may or may not be on the same machine.
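The buffer rule of thumb and the bandwidth figure in this comment can be turned into a quick calculator. This is a minimal sketch assuming (a) the 1x to 3x range quoted above and (b) that each result travels back as a raw 32-byte (256-bit) PMK with no framing or compression. Assumption (b) is mine; the comment does not describe cpyrit's wire format. The function names are hypothetical; the only Pyrit-specific text is the gather() line the comment says to edit.

```python
"""Rough sizing helpers for a Pyrit network cluster (illustrative only)."""

# Assumption: results arrive as raw 32-byte (256-bit) PMKs with no framing
# or compression; the real cpyrit wire format may differ.
PMK_BYTES = 32


def gather_buffer_range(node_pmks_per_s: int) -> tuple[int, int]:
    """Rule of thumb from the comment: buffer 1x to 3x a node's PMKs/s."""
    return node_pmks_per_s, 3 * node_pmks_per_s


def pmk_bandwidth_mbit(total_pmks_per_s: float, pmk_bytes: int = PMK_BYTES) -> float:
    """Raw PMK payload arriving at the master, in megabits per second."""
    return total_pmks_per_s * pmk_bytes * 8 / 1_000_000


if __name__ == "__main__":
    low, high = gather_buffer_range(20_000)
    midpoint = (low + high) // 2  # arbitrary pick inside the suggested range
    print(f"buffer range: {low:,} to {high:,}")
    print(f"edited line:  self.server.gather(self.client.uuid, {midpoint})")
    print(f"50,000 PMKs/s ~ {pmk_bandwidth_mbit(50_000):.1f} Mbit/s of PMK payload")
```

Under assumption (b), 50,000 PMKs/s is already about 12.8 Mbit/s (1.6 MB/s) of PMK payload before any passwords or protocol overhead, so the real link load may be noticeably higher than the figure quoted in the comment.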

Original comment by shaneper...@gmail.com on 21 Aug 2014 at 7:05
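The drop-out behaviour described in the comment above (a node's buffered keys have to be recycled, so a big buffer means more work to redo) can be illustrated with a toy work queue. This is a hypothetical model, not cpyrit's actual code: WorkQueue, fill, complete and drop are invented names, and keys are plain strings standing in for password blocks.

```python
from collections import deque


class WorkQueue:
    """Hypothetical model of a master handing out keys to nodes.

    Each node holds up to `buffer_size` keys in flight. When a node drops,
    its unfinished keys go back to the front of the queue so none are lost.
    """

    def __init__(self, keys, buffer_size):
        self.pending = deque(keys)
        self.buffer_size = buffer_size
        self.in_flight = {}  # node id -> keys assigned but not yet returned

    def fill(self, node):
        """Top up a node's buffer from the pending queue; return keys handed out."""
        held = self.in_flight.setdefault(node, [])
        handed = []
        while self.pending and len(held) < self.buffer_size:
            key = self.pending.popleft()
            held.append(key)
            handed.append(key)
        return handed

    def complete(self, node, keys):
        """Node returned results for these keys; remove them from its buffer."""
        done = set(keys)
        self.in_flight[node] = [k for k in self.in_flight.get(node, []) if k not in done]

    def drop(self, node):
        """Node disconnected: recycle its unfinished keys, preserving their order."""
        lost = self.in_flight.pop(node, [])
        self.pending.extendleft(reversed(lost))
        return len(lost)


if __name__ == "__main__":
    q = WorkQueue([f"pw{i}" for i in range(10)], buffer_size=4)
    q.fill("node-a")
    q.fill("node-b")
    q.complete("node-a", ["pw0", "pw1"])
    print(q.drop("node-b"), list(q.pending))
```

The count returned by drop() is bounded by buffer_size, which is the trade-off the comment points at: larger buffers smooth over latency but leave more work to re-queue when a node disappears.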