artiya4u / pyrit

Automatically exported from code.google.com/p/pyrit

pyrit-calpp v2 - testing required #148

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I'm attaching a new version of the CAL++ computing core. It's about 8-9% faster
than the one in svn.
I have to admit that this is a slightly strange solution and I'm not sure how
it will behave on other ATI cards. So it needs a little bit of testing :).

The standard pyrit core works in the following way:
- prepares data
- transfers it to the gpu
- the gpu runs
- transfers data back from the gpu
- postprocesses data
Simple timing analysis shows that this approach shouldn't waste more than
1-2% of the time (the time when the gpu is not working).

The new calpp v2 core tries to mask transfer/data processing behind gpu
computations (so we start the computation and then, while the gpu is busy, we
process data on the cpu). It should be only 1-2% faster. But it isn't :).

First of all, the simple analysis doesn't take driver/card behavior into
account. ATI's drivers are really sensitive to cpu performance and are probably
hiding some actions.

Now to the interesting part :). The v2 core doesn't do explicit data
transfer at all. Trying to send data to the gpu while it was working resulted
in some performance degradation. So now all the data stays in host memory and
the gpu reads it directly during computations. Fortunately, in the case of
pyrit the amount of computation is so huge that the transfer from host memory
can be masked by computations from other gpu threads.
This may not be true for 5xxx cards, where the kernel is smaller and faster.
Memory<->gpu transfer speed may also have some impact.

So please test the v2 core and post your results here.
You should also try changing line 453 in pyrit/cpyrit/cpyrit.py from
'ncpus-=1' to 'ncpus-=2' (or 'ncpus-=4') and test the impact of more
available cpu cores on performance.

The v2 core works with the svn version of the CAL++ library (it has been attached).

Original issue reported on code.google.com by hazema...@gmail.com on 16 Apr 2010 at 3:44
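
The idea described above, keeping the cpu busy with preparation and postprocessing while the gpu computes, can be sketched roughly as follows. This is only an illustration and not the code of the v2 core: `prepare`, `gpu_compute` and `postprocess` are hypothetical stand-ins, a worker thread plays the role of the gpu, and `depth` is only a loose analogy to a number of in-flight buffers.

```python
import queue
import threading

def prepare(batch):
    # Stand-in for CPU-side preparation of a batch.
    return [x * 2 for x in batch]

def gpu_compute(data):
    # Stand-in for the GPU kernel; a real core would run on the device here.
    return [x + 1 for x in data]

def postprocess(result):
    # Stand-in for CPU-side postprocessing of the kernel output.
    return sum(result)

def serial(batches):
    # Prepare, compute and postprocess strictly one after another.
    return [postprocess(gpu_compute(prepare(b))) for b in batches]

def overlapped(batches, depth=2):
    # The main thread prepares and postprocesses while the worker computes.
    todo = queue.Queue(maxsize=depth)
    done = queue.Queue()

    def gpu_worker():
        while True:
            item = todo.get()
            if item is None:          # sentinel: no more work
                done.put(None)
                return
            done.put(gpu_compute(item))

    worker = threading.Thread(target=gpu_worker)
    worker.start()
    results = []
    for batch in batches:
        todo.put(prepare(batch))      # blocks once `depth` batches are queued
        while True:                   # collect whatever has finished so far
            try:
                finished = done.get_nowait()
            except queue.Empty:
                break
            results.append(postprocess(finished))
    todo.put(None)
    while True:                       # drain the remaining results
        finished = done.get()
        if finished is None:
            break
        results.append(postprocess(finished))
    worker.join()
    return results

if __name__ == "__main__":
    print(overlapped([[1, 2], [3], [4, 5, 6]]))
```

Both functions return the same results in the same order; only the scheduling differs. With pure-Python stand-ins there is no real speedup, since the benefit only appears when the compute step runs outside the interpreter, as a gpu kernel would.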


GoogleCodeExporter commented 9 years ago
You are GOD
pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (110381.9 PMKs/s)... \ 

Computed 118125.96 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 54550.9 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 53870.6 PMKs/s (RTT 1.4)
#3: 'CPU-Core (SSE2)': 756.8 PMKs/s (RTT 2.7)
#4: 'CPU-Core (SSE2)': 698.6 PMKs/s (RTT 2.7)
#5: 'CPU-Core (SSE2)': 647.2 PMKs/s (RTT 2.6)
#6: 'CPU-Core (SSE2)': 776.1 PMKs/s (RTT 2.6)

Original comment by odl...@gmail.com on 17 Apr 2010 at 8:33
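
For anyone comparing many runs like the one above, here is a small sketch for pulling the per-device figures out of pasted `pyrit benchmark` output. The regular expression is an assumption based only on the line format visible in this thread, and the gpu/cpu grouping by name prefix is my own convention.

```python
import re

# Matches lines such as:
#   #1: 'CAL++ Device #1 'ATI CYPRESS'': 54550.9 PMKs/s (RTT 1.3)
LINE_RE = re.compile(r"^#(\d+): '(.*)': ([\d.]+) PMKs/s \(RTT ([\d.]+)\)$")

def parse_benchmark(text):
    # Return a list of (device_name, pmks_per_second, rtt) tuples.
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            devices.append((m.group(2), float(m.group(3)), float(m.group(4))))
    return devices

def totals_by_kind(devices):
    # Sum the per-device rates, grouped by a simple name-prefix rule.
    totals = {}
    for name, rate, _rtt in devices:
        if name.startswith("CAL++"):
            kind = "gpu"
        elif name.startswith("CPU-Core"):
            kind = "cpu"
        else:
            kind = "other"
        totals[kind] = totals.get(kind, 0.0) + rate
    return totals

SAMPLE = """\
#1: 'CAL++ Device #1 'ATI CYPRESS'': 54550.9 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 53870.6 PMKs/s (RTT 1.4)
#3: 'CPU-Core (SSE2)': 756.8 PMKs/s (RTT 2.7)
#4: 'CPU-Core (SSE2)': 698.6 PMKs/s (RTT 2.7)
#5: 'CPU-Core (SSE2)': 647.2 PMKs/s (RTT 2.6)
#6: 'CPU-Core (SSE2)': 776.1 PMKs/s (RTT 2.6)
"""

if __name__ == "__main__":
    print(totals_by_kind(parse_benchmark(SAMPLE)))
```

Note that the per-device figures do not necessarily add up to the "Computed ... total" line that pyrit prints, so the sums are only useful for comparing runs against each other.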

GoogleCodeExporter commented 9 years ago
With BUFFER_SIZE set to 4 and ncpus set to 4
pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (115617.4 PMKs/s)... \ 

Computed 120388.59 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 57400.8 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 56835.5 PMKs/s (RTT 1.2)

Original comment by odl...@gmail.com on 17 Apr 2010 at 8:37

GoogleCodeExporter commented 9 years ago
Odlan3, when you have some spare time, please test with __CAL_NON_BLOCKING
enabled, disabled, and without cpu cores. There's no hurry - it's just for my
curiosity :).

Original comment by hazema...@gmail.com on 17 Apr 2010 at 8:39

GoogleCodeExporter commented 9 years ago
Odlan3 - With ncpus-=4 and BUFFER_SIZE set to 2 ( and 3 ) what are the results ?

Original comment by hazema...@gmail.com on 17 Apr 2010 at 8:40

GoogleCodeExporter commented 9 years ago
Wow, now I am back to earth from heaven!
OK, regarding my 4-core CPU: what about the 2 CPU cores that are not serving
the GPUs? Are they unused? That is OK with me, but getting those 1200 PMKs back
would be appreciated.
I see that odlan3 gets more performance running only the GPUs... does that mean
it is better to disable the unused CPU cores?
Do I have to work on that line 453?

Original comment by pyrit.lo...@gmail.com on 17 Apr 2010 at 8:46

GoogleCodeExporter commented 9 years ago
They aren't really unused - ATI's driver is using them to feed the gpu.
Probably an option should be added to pyrit so the user can decide how many
cores to dedicate to feeding the gpus.
Yep, you should try out which settings are best for you.

Original comment by hazema...@gmail.com on 17 Apr 2010 at 8:51

GoogleCodeExporter commented 9 years ago
The benchmarks in comments 51 and 52 are with
#define __CAL_USE_NON_BLOCKING_WAIT 1
Now I'll try recompiling with BUFFER_SIZE set to 2 and 3 and ncpus-=4

Original comment by odl...@gmail.com on 17 Apr 2010 at 8:54

GoogleCodeExporter commented 9 years ago
Hazema11: Do you mean that 2 CPU cores are used to feed one GPU? 

Original comment by pyrit.lo...@gmail.com on 17 Apr 2010 at 8:55

GoogleCodeExporter commented 9 years ago
ATI's driver is multithreaded. So sometimes it uses everything that's available
(it can be even more than 2).

Original comment by hazema...@gmail.com on 17 Apr 2010 at 8:56

GoogleCodeExporter commented 9 years ago
Usually one core per gpu is a good choice. But with your gpus, which are
blazingly fast, it looks like that isn't the case :).

Original comment by hazema...@gmail.com on 17 Apr 2010 at 8:58

GoogleCodeExporter commented 9 years ago
This is with
//#define __CAL_USE_NON_BLOCKING_WAIT 1
BUFFER_SIZE 4
NCPUS-=4

odlan@H3ll:~$ pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (114847.6 PMKs/s)... \ 

Computed 119536.78 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 57603.8 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 57322.9 PMKs/s (RTT 1.2)

This is with
//#define __CAL_USE_NON_BLOCKING_WAIT 1
BUFFER_SIZE 2
NCPUS-=2

pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (117966.4 PMKs/s)... / 

Computed 121279.06 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 56785.2 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 57078.9 PMKs/s (RTT 1.3)
#3: 'CPU-Core (SSE2)': 503.1 PMKs/s (RTT 3.3)
#4: 'CPU-Core (SSE2)': 524.8 PMKs/s (RTT 3.1)
#5: 'CPU-Core (SSE2)': 725.2 PMKs/s (RTT 2.5)
#6: 'CPU-Core (SSE2)': 755.6 PMKs/s (RTT 2.7)

This is with
//#define __CAL_USE_NON_BLOCKING_WAIT 1
BUFFER_SIZE 2
NCPUS-=4

odlan@H3ll:~$ pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (116946.3 PMKs/s)... - 

Computed 119214.89 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 58207.8 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 57992.2 PMKs/s (RTT 1.3)

This is with
#define __CAL_USE_NON_BLOCKING_WAIT 1
BUFFER_SIZE 2
NCPUS-=4

odlan@H3ll:~$ pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (115279.6 PMKs/s)... / 

Computed 119260.91 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 57694.5 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 58352.9 PMKs/s (RTT 1.2)

Original comment by odl...@gmail.com on 17 Apr 2010 at 9:18

GoogleCodeExporter commented 9 years ago
I just finished running 40 minutes of passthrough; it is exactly 101513 PMK/s.
Thanks to calpp and v2b-1, in one week my system got +55% performance!
Thanks to all you guys, lukas and hazeman11, for this wonderful software.
I hope you will not stop optimizing the code and will squeeze our GPUs to the
last bit of power :)

By the way, is it possible to have a list of the parameters/values we can
modify to try to get more PMKs?

You talked about 'ncpus' and 'BUFFER_SIZE'; are there other parameters to work on?
It would be nice to have the parameters and the range of values; I would be
glad to do some tests and report.

Original comment by pyrit.lo...@gmail.com on 17 Apr 2010 at 9:18

GoogleCodeExporter commented 9 years ago
BUFFER_SIZE should be as low as possible. Unless someone proves that a value of
3 or 4 is better than 2, it will be 2 (pinned memory is a limited resource).
You can change ncpus, but this value depends heavily on the system/gpu. There
is no one good value for all. I believe it will have to be an option in pyrit,
and everyone will choose the best value for their system.

So you can test whether BUFFER_SIZE should be changed :).

Original comment by hazema...@gmail.com on 17 Apr 2010 at 11:21

GoogleCodeExporter commented 9 years ago
To change BUFFER_SIZE correctly, do I need to delete every reference to pyrit
and cpyrit and recompile everything? Or just navigate to the folder
cpyrit_calpp, delete the folder build, and run

python setup.py build 
sudo python setup.py install

Original comment by odl...@gmail.com on 18 Apr 2010 at 6:42

GoogleCodeExporter commented 9 years ago
Doing
python setup.py build 
sudo python setup.py install
is enough :).

Original comment by hazema...@gmail.com on 18 Apr 2010 at 9:00

GoogleCodeExporter commented 9 years ago
We still need more testing. Could someone with a 3xxx or 4xxx card please test
the v2b-1 core (comment 45)?

Original comment by hazema...@gmail.com on 20 Apr 2010 at 6:01

GoogleCodeExporter commented 9 years ago
System is Ubuntu 9.10 - fglrx 10.3 - AMD X4 965 - HD 4850

I'm not sure about what to install. Now I have calpp-svn.tar.gz from Comment #1,
then pyrit-calpp-v2b.tar.gz from Comment #26 and cpyrit_calpp-v2b-1.tar.gz from
Comment #45. Cleaned up before installing.

All together with libboost-1.38 and fglrx 10.3.

pyrit benchmark Cal++ 2b-1 SVN fglrx 10.3 libboost-1.38 ncpus=1
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+
Running benchmark (18550.8 PMKs/s)... | 
Computed 19295.66 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 16734.3 PMKs/s (RTT 2.6)
#2: 'CPU-Core (SSE2)': 777.4 PMKs/s (RTT 2.8)
#3: 'CPU-Core (SSE2)': 762.3 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 782.3 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

should I change something ?

greetings, dapi

Original comment by datapirates@googlemail.com on 20 Apr 2010 at 9:00

GoogleCodeExporter commented 9 years ago
Install procedure:
1. Remove anything with pyrit in its name from /usr/local/bin/*pyrit* and /usr/local/lib/python2.6/dist-packages
2. Install calpp-svn.tar.gz
3. Build & install pyrit-calpp-v2.tar.gz
4. Build & install cpyrit_calpp-v2-1.tar.gz

Datapirates: Your result is a little strange (rtt 2.6) - as if it were some
other version.

Original comment by hazema...@gmail.com on 20 Apr 2010 at 9:20

GoogleCodeExporter commented 9 years ago
OK, I made a mistake :)
The correct install procedure is:
1. Remove anything with pyrit in its name from /usr/local/bin/*pyrit* and /usr/local/lib/python2.6/dist-packages
2. Install calpp-svn.tar.gz
3. Build & install pyrit-calpp-v2b.tar.gz (comment 26)
4. Build & install cpyrit_calpp-v2b-1.tar.gz (comment 45)

Original comment by hazema...@gmail.com on 20 Apr 2010 at 9:26

GoogleCodeExporter commented 9 years ago
Datapirates, could you post the output of the debug version here (as an
attachment to a comment - it will be quite long)?

pyrit benchmark > output-debug.txt

Original comment by hazema...@gmail.com on 20 Apr 2010 at 11:10


GoogleCodeExporter commented 9 years ago
Small update to debug version ( printing more info )

Original comment by hazema...@gmail.com on 20 Apr 2010 at 11:36


GoogleCodeExporter commented 9 years ago
I'm attaching version with finer thread control. 

Original comment by hazema...@gmail.com on 21 Apr 2010 at 1:01


GoogleCodeExporter commented 9 years ago
Hi hazeman,

first I cleaned up all again and did a fresh install, including Cal++SVN.

pyrit benchmark Cal++ 2b-1 SVN fglrx 10.3 libboost-1.38 ncpus=1 cleaned
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+
Running benchmark (18252.0 PMKs/s)... \ 
Computed 19187.30 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 16704.0 PMKs/s (RTT 2.7)
#2: 'CPU-Core (SSE2)': 784.5 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 779.9 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 787.2 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

next I tried the debug from comment#72 without cleaning.

debug_comment72.txt

and last I installed cpyrit_calpp-v2b-2.tar.gz  from #73 also without cleaning.

pyrit benchmark Cal++ 2b-2 Comm#73 fglrx 10.3 libboost-1.38 ncpus=1 noClean
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+
Running benchmark (18473.2 PMKs/s)... \ 
Computed 19627.85 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 16548.6 PMKs/s (RTT 2.6)
#2: 'CPU-Core (SSE2)': 789.6 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 773.0 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 782.6 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

greetings, dapi

Original comment by datapirates@googlemail.com on 21 Apr 2010 at 12:19


GoogleCodeExporter commented 9 years ago
Datapirates: could you post debug output with BLOCK_SIZE 1 and BLOCK_SIZE 2
(file _cpyrit_calpp.cpp, line 37)?

Original comment by hazema...@gmail.com on 21 Apr 2010 at 1:49

GoogleCodeExporter commented 9 years ago
Datapirate: you can change it, recompile, and install without cleaning

Original comment by hazema...@gmail.com on 21 Apr 2010 at 1:50

GoogleCodeExporter commented 9 years ago
Datapirate: Could you test next version :) ?

Original comment by hazema...@gmail.com on 21 Apr 2010 at 3:33


GoogleCodeExporter commented 9 years ago
Hi hazeman,

hope you mean BUFFER_SIZE 1 or BUFFER_SIZE 2    ....  not BLOCK_SIZE ?

two things, maybe unimportant:

Before recompiling in an unpacked directory I do a './setup.py clean'. This
deletes build/temp.linux-x86_64-2.6 but not 'build/lib.linux-x86_64-2.6'. Is
this expected behaviour?

When I delete 'build' I can see the complete output. There are additional
warnings that did not show up in rev250 (void copy_gpu_inbuffer and void copy_gpu_outbuffer).

running build
running build_ext
Building modules...
building 'cpyrit._cpyrit_calpp' extension
creating build/temp.linux-x86_64-2.6
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/cal/atistream2/include -I/usr/include/python2.6 -c _cpyrit_calpp.cpp -o build/temp.linux-x86_64-2.6/_cpyrit_calpp.o -DVERSION="0.3.1-dev"
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
_cpyrit_calpp.cpp: In function ‘PyObject* cpyrit_receive(CALDevice*, PyObject*)’:
_cpyrit_calpp.cpp:502: warning: comparison between signed and unsigned integer expressions
_cpyrit_calpp.cpp: At global scope:
_cpyrit_calpp.cpp:297: warning: ‘void copy_gpu_inbuffer(CALDevice*, const gpu_inbuffer*, boost::array<cal::Image2D, 5ul>&, int)’ defined but not used
_cpyrit_calpp.cpp:309: warning: ‘void copy_gpu_outbuffer(CALDevice*, gpu_outbuffer*, boost::array<cal::Image2D, 2ul>&, int)’ defined but not used
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/cal/atistream2/include -I/usr/include/python2.6 -c _cpyrit_calpp_kernel.cpp -o build/temp.linux-x86_64-2.6/_cpyrit_calpp_kernel.o -DVERSION="0.3.1-dev"
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
_cpyrit_calpp_kernel.cpp: In function ‘void sha1_process(const SHA_DEV_CTX&, SHA_DEV_CTX&)’:
_cpyrit_calpp_kernel.cpp:429: warning: suggest parentheses around arithmetic in operand of ‘^’
_cpyrit_calpp_kernel.cpp:431: warning: suggest parentheses around arithmetic in operand of ‘^’
_cpyrit_calpp_kernel.cpp:434: warning: suggest parentheses around arithmetic in operand of ‘^’
_cpyrit_calpp_kernel.cpp:437: warning: suggest parentheses around arithmetic in operand of ‘^’
_cpyrit_calpp_kernel.cpp:440: warning: suggest parentheses around arithmetic in operand of ‘^’
_cpyrit_calpp_kernel.cpp:443: warning: suggest parentheses around arithmetic in operand of ‘^
[...]

pyrit benchmark   cpyrit_calpp-v2b-3.tar.gz  Comment #78 fglrx 10.3
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+
Running benchmark (18748.3 PMKs/s)... \ 
Computed 19138.70 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 16674.7 PMKs/s (RTT 2.7)
#2: 'CPU-Core (SSE2)': 786.3 PMKs/s (RTT 3.0)
#3: 'CPU-Core (SSE2)': 787.5 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 790.5 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

greetings, dapi

Original comment by datapirates@googlemail.com on 21 Apr 2010 at 4:47


GoogleCodeExporter commented 9 years ago
I've never used setup clean so I don't know :). The warnings are OK. During
development there is a lot of commented-out code, unused code (which can be
useful later), etc.
OK, I'm sending the next version. In lines 234 and 235 there are 2 variables,
mask_transfer and use_pined_memory. Now both are false. This gives almost
version r250, so it shouldn't be slower. It should be slightly faster - it
masks data processing (but not transfer) - which on a 4770 gives a 5-6% benefit.

There is also one big difference between v2 and r250. The benchmark on r250
gives peak performance; v2, on the other hand, gives sustained speed. This can
also lead to some confusion.

Original comment by hazema...@gmail.com on 21 Apr 2010 at 6:27


GoogleCodeExporter commented 9 years ago
Here are my results using 2b-4 on a system with 2 4850s (Phenom II X4 925):

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (36695.1 PMKs/s)... - 

Computed 37837.27 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 18563.4 PMKs/s (RTT 2.8)
#2: 'CAL++ Device #2 'ATI RV770'': 18474.8 PMKs/s (RTT 2.8)

Under the latest svn (251) using Calpp I was getting about 17700 PMKs/s on each,
so there is some improvement. I just installed the Phenom yesterday. It used to
be an Athlon64 X2 5200+ (2.6 GHz), so I already saw some drastic improvement
from that (about a 1000 PMKs/s jump), prior to trying your v2b-4 build. The ATI
drivers definitely depend on CPU power. The Phenom's a little limited by the
older motherboard I'm using now, which limits its HTT speed and won't let me
overclock. I may try overclocking the 4850s later to see if I'm still being
CPU limited.

Thanks for the hard work. Hopefully, we can get this in pyrit svn soon. I hate
manual builds.

Original comment by robert.b...@gmail.com on 21 Apr 2010 at 8:09

GoogleCodeExporter commented 9 years ago
Robert, could you test 2b-4 with mask_transfer=true and use_pined_memory=true
(lines 234, 235 in _cpyrit_calpp.cpp)?

Original comment by hazema...@gmail.com on 21 Apr 2010 at 8:34

GoogleCodeExporter commented 9 years ago
I used cpyrit_calpp-v2b-2.tar.gz from comment #73.
Compared to my previous comment #63, my whole system got another 1.6% PMK/s
increase.
I am interested in sustained speed; what is the difference between v2b-2, v2b-3
and v2b-4?
I also wish to test v2b-3 and v2b-4, but for the moment I can't stop my
scheduled job.
Shall I test v2b-3, or is it better to skip it and go directly to v2b-4? Keep
in mind every test costs me about 40 minutes.

Here is another observation: I run passthrough in one xterm; if I open another
xterm and do normal activity (cp, ls, etc.), PMK/s decreases by about 30%. In
the previous v2b-1, if I stop working in the second xterm, PMKs/s slowly goes
back up to 100%; with v2b-2, instead, the PMKs/s stays at 70%, and only when
batch N finishes and batch N+1 starts does it go back up to 100%.
So it is better to set up the PC to run and leave it untouched until it finishes.

Original comment by pyrit.lo...@gmail.com on 21 Apr 2010 at 8:42

GoogleCodeExporter commented 9 years ago
v2b-4 is the most general - mask_transfer=false use_pined_memory=false (for testing 4850s)
v2b-2 = v2b-4 with mask_transfer=true use_pined_memory=true
v2b-3 = v2b-4 with mask_transfer=false use_pined_memory=true

v2b-2, compared to v2b-1, had some refinement in python thread management (after
the shocking discovery that python isn't really multi-threaded :/ )

Original comment by hazema...@gmail.com on 21 Apr 2010 at 8:51
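
For reference while testing, the build-to-flag mapping listed above can be written down as a small lookup table. This is only a restatement of that comment in Python; `CoreFlags`, `BUILDS` and `builds_matching` are hypothetical names, and the spelling `use_pined_memory` follows the variable name quoted in the thread.

```python
from collections import namedtuple

# Flag names as quoted in the thread (lines 234, 235 of _cpyrit_calpp.cpp).
CoreFlags = namedtuple("CoreFlags", ["mask_transfer", "use_pined_memory"])

# Settings each build corresponds to, per the comment above.
BUILDS = {
    "v2b-2": CoreFlags(mask_transfer=True, use_pined_memory=True),
    "v2b-3": CoreFlags(mask_transfer=False, use_pined_memory=True),
    "v2b-4": CoreFlags(mask_transfer=False, use_pined_memory=False),
}

def builds_matching(mask_transfer, use_pined_memory):
    # Return the builds whose settings equal the given flag pair.
    wanted = CoreFlags(mask_transfer, use_pined_memory)
    return sorted(name for name, flags in BUILDS.items() if flags == wanted)

if __name__ == "__main__":
    print(builds_matching(False, False))
```

Since v2b-4 exposes both flags, editing them in that build reproduces the other two combinations without installing a different tarball.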

GoogleCodeExporter commented 9 years ago
I think that the only useful values are mask_transfer=use_pined_memory=false
and mask_transfer=use_pined_memory=true. So if you want to test anything, test
v2b-4 with mask_transfer=use_pined_memory=false - I think it might be slightly
slower on 5xxx, but I'm not sure.

With regard to the performance decrease - it's really strange that there is
different behaviour between v2b-1 and v2b-2. Both versions are almost the same
(there is only a small rescheduling of operations).

Original comment by hazema...@gmail.com on 21 Apr 2010 at 9:00

GoogleCodeExporter commented 9 years ago
hazeman11, please note I used v2b-2 without changing anything; pyrit is
pyrit-calpp-v2b.tar.gz. Shall I use v2b-2 with some different version of calpp
and/or pyrit?

Original comment by pyrit.lo...@gmail.com on 21 Apr 2010 at 9:12

GoogleCodeExporter commented 9 years ago
All v2-x versions estimate sustained speed in the benchmark. In short, the
equation is
items_done/(current_time - time_of_first_item).
So depending on how long the slowdown lasted, the time required to go back up
to 100% may differ.
If the slowdown covered 10% of the batch items, then the estimate should go
back to 100% quite quickly. But if the slowdown occurred during the processing
of 70% of the items, then there is no way it will get back to 100% (and yet the
core is computing at 100% speed).
I'm guessing this could be the case.

Original comment by hazema...@gmail.com on 21 Apr 2010 at 9:22
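
The estimate described above can be reproduced with a few lines of Python. The formula is the one quoted in the comment; the batch size and the two rates below are made-up illustrative numbers (a 30% slowdown, as reported earlier in the thread), and `estimate_after_slowdown` is a hypothetical helper.

```python
def sustained_rate(items_done, current_time, time_of_first_item):
    # items_done / (current_time - time_of_first_item), as quoted above.
    elapsed = current_time - time_of_first_item
    if elapsed <= 0:
        return 0.0
    return items_done / elapsed

def estimate_after_slowdown(total_items, slow_fraction, full_rate, slow_rate):
    # Estimate at the end of a batch in which the first `slow_fraction`
    # of the items was processed at `slow_rate` and the rest at `full_rate`.
    slow_items = total_items * slow_fraction
    fast_items = total_items - slow_items
    elapsed = slow_items / slow_rate + fast_items / full_rate
    return sustained_rate(total_items, elapsed, 0.0)

if __name__ == "__main__":
    # Illustrative numbers only: 100000 items, 1000 items/s at full speed,
    # 700 items/s while the machine is disturbed.
    for fraction in (0.1, 0.7):
        rate = estimate_after_slowdown(100000, fraction, 1000.0, 700.0)
        print(fraction, round(rate, 1))
```

With these numbers the estimate ends at roughly 959 items/s when 10% of the batch was slowed down, but only at roughly 769 items/s when 70% was, even though the core is back at full speed for the remainder of the batch in both cases.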

GoogleCodeExporter commented 9 years ago
pyrit-calpp-v2b.tar.gz is the version to use. So no problem there.

Original comment by hazema...@gmail.com on 21 Apr 2010 at 9:25

GoogleCodeExporter commented 9 years ago
ok, v2b-4 gave me +1.9% compared to v2b-2.
so from v2b to v2b-4 it is +3.7%.
Good job once more :)

Original comment by pyrit.lo...@gmail.com on 21 Apr 2010 at 10:28

GoogleCodeExporter commented 9 years ago
hazeman11,

Made your two changes. Here are the ugly results (which I assume you were
expecting):

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (32594.7 PMKs/s)... | 

Computed 33295.89 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 16426.1 PMKs/s (RTT 2.8)
#2: 'CAL++ Device #2 'ATI RV770'': 16493.0 PMKs/s (RTT 2.7)

So yes, the two "false" entries are better for 4850s. I'll be changing mine
back to "false". ;)

Original comment by robert.b...@gmail.com on 21 Apr 2010 at 10:37

GoogleCodeExporter commented 9 years ago
After changing both back to false, results back to normal:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (35972.8 PMKs/s)... / 

Computed 36687.42 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 18625.6 PMKs/s (RTT 2.7)
#2: 'CAL++ Device #2 'ATI RV770'': 18722.4 PMKs/s (RTT 2.8)

Original comment by robert.b...@gmail.com on 21 Apr 2010 at 10:42
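
To put the last few results side by side, here is a minimal sketch that computes the relative difference between the posted totals. The three numbers are copied from the "Computed ... total" lines in the comments above; the helper name is hypothetical, and single benchmark runs are noisy, so the percentages are only indicative.

```python
def percent_change(new, old):
    # Relative change of `new` with respect to `old`, in percent.
    return (new - old) / old * 100.0

# "Computed ... PMKs/s total" figures posted above for 2x HD 4850 with v2b-4.
BOTH_FALSE_FIRST_RUN = 37837.27   # mask_transfer=false, use_pined_memory=false
BOTH_TRUE = 33295.89              # both flags set to true
BOTH_FALSE_RERUN = 36687.42       # flags changed back to false

if __name__ == "__main__":
    print(round(percent_change(BOTH_FALSE_FIRST_RUN, BOTH_TRUE), 1))
    print(round(percent_change(BOTH_FALSE_RERUN, BOTH_TRUE), 1))
```

The two "false" runs differ from each other by a few percent, which gives a rough idea of the run-to-run noise to keep in mind when comparing builds.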

GoogleCodeExporter commented 9 years ago
FYI - Just tried overclocking the GPUs for the heck of it. Overclocked both
4850s from 625 core to 650 core (didn't get greedy yet) and left the memory at
the default 993.
Here are the results using v2b-4:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (37915.0 PMKs/s)... | 

Computed 39628.46 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 19232.9 PMKs/s (RTT 2.8)
#2: 'CAL++ Device #2 'ATI RV770'': 19481.8 PMKs/s (RTT 2.8)

Not too shabby. I'll run this for a while to test stability/heat. It's nice to
know I'm not CPU-limited any more! With my old Athlon X2 5200+ I got literally
NO increase from making this same core clock change.

As an aside, I should note that I've been running Compiz for all these posted
test results. I turned it off to see how it would affect the benchmarks and
this is what I got (under stock 625 core settings):

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (36749.4 PMKs/s)... / 

Computed 37593.15 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 18753.0 PMKs/s (RTT 2.8)
#2: 'CAL++ Device #2 'ATI RV770'': 18607.4 PMKs/s (RTT 2.8)

A little boost, but not worth me giving up my wobbly windows. We all have our
vices. . .

Original comment by robert.b...@gmail.com on 22 Apr 2010 at 12:18

GoogleCodeExporter commented 9 years ago
@ robert.blench.
Is your system stable after the overclock? I would also like to get a few % more PMKs :)
You did a 4% overclock; did you try any more extreme tests?
How do I overclock the video card with a command-line tool?
What command/values should I type?

Original comment by pyrit.lo...@gmail.com on 22 Apr 2010 at 10:52

GoogleCodeExporter commented 9 years ago
Please move discussions into the group. Everyone is free to post at
http://groups.google.com/group/pyrit

Original comment by lukas.l...@gmail.com on 22 Apr 2010 at 10:57

GoogleCodeExporter commented 9 years ago
If false,false is also best for 5xxx cards, then here goes the release
candidate :).
I'm attaching version v2c. This is a cleaned-up version of v2b-4 with hard-coded
false,false. There is also a small change in memory management. On my system it
has exactly the same speed as v2b-4 with false, false.

Please everyone test it.

Original comment by hazema...@gmail.com on 22 Apr 2010 at 2:08


GoogleCodeExporter commented 9 years ago
I will provide feedback tonight about cpyrit_calpp-v2c.tar.gz on HD5x80

Original comment by pyrit.lo...@gmail.com on 22 Apr 2010 at 3:55

GoogleCodeExporter commented 9 years ago
I did the test. v2c has the same PMKs/s as v2b-4 on 5770 and 5870.

Original comment by pyrit.lo...@gmail.com on 22 Apr 2010 at 5:55

GoogleCodeExporter commented 9 years ago
v2c test
pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (124508.8 PMKs/s)... / 

Computed 125604.16 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 59829.1 PMKs/s (RTT 1.2)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 59845.4 PMKs/s (RTT 1.2)
#3: 'CPU-Core (SSE2)': 835.4 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 544.1 PMKs/s (RTT 2.8)
#5: 'CPU-Core (SSE2)': 548.8 PMKs/s (RTT 3.0)
#6: 'CPU-Core (SSE2)': 956.0 PMKs/s (RTT 3.0)

pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (118925.3 PMKs/s)... / 

Computed 121720.31 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 59533.3 PMKs/s (RTT 1.2)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 59803.2 PMKs/s (RTT 1.2)

This version is the best candidate for me.
I have done a PCI Express test and noticed slow speed from GPU > CPU. Maybe
this can impact performance?

Original comment by odl...@gmail.com on 22 Apr 2010 at 7:11


GoogleCodeExporter commented 9 years ago
NOTE: In the past I saw that "pyrit benchmark" does not always report the true
PMKs/s, so I do my tests (as in comment #97) in "real" mode: I run "time pyrit
-e ESSID -i list.txt -o list.cow passthrough" and compare the results with the
same command run with v2b-4.

Original comment by pyrit.lo...@gmail.com on 22 Apr 2010 at 7:25
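
The wall-clock measurement described above can also be scripted. The sketch below is only a rough illustration: the pyrit command line is the one quoted in the comment, while the helper names are hypothetical and the assumption that every non-empty line of the word list produces one PMK is mine (pyrit may skip some candidates).

```python
import subprocess
import time

def count_candidates(path):
    # Count non-empty lines in the word list (assumed: one PMK per line).
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.strip())

def timed_run(cmd):
    # Run a command to completion and return the elapsed wall-clock seconds.
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

def pmks_per_second(candidates, seconds):
    # Average throughput over the whole run.
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return candidates / seconds

if __name__ == "__main__":
    # Command line as quoted in the comment above.
    cmd = ["pyrit", "-e", "ESSID", "-i", "list.txt", "-o", "list.cow", "passthrough"]
    elapsed = timed_run(cmd)
    print(pmks_per_second(count_candidates("list.txt"), elapsed))
```

Because this averages over the whole run, including start-up and file I/O, it will typically read lower than the figure shown by "pyrit benchmark", but it is comparable between builds as long as the same word list is used.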