andremussche / scalemm

Fast scaling memory manager for Delphi
https://code.google.com/p/scalemm/

Optimize.Move is very slow on AMD processors with SSE/SSE2/SSE3 #18

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi. 

I'm a Delphi programmer.
I use Windows XP x86 + Delphi 7 and Windows 7 x86 + Delphi XE5.
CPU: AMD Athlon(tm) II x2 3.2 GHz (x86, x86-64, MMX, 3DNow!, SSE, SSE2, SSE3)
Optimize.Move version: from ScaleMM 2.41

I recently found out about Optimize.Move and liked it right away because it is
so fast.
Later, however, I found situations where it is a lot slower than the classic
Move (on my processor).

For example:

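var
   s: AnsiString;       // global test buffers
   ab: array of Byte;

implementation

// Times 500 pairs of overlapping moves, one on a string and one on a byte array.
procedure Test;
var
   i: Integer;
   t: TTime;
begin
   SetLength(s, 10485780);
   SetLength(ab, 1048578);
   t := Now;
   for i := 1 to 500 do
   begin
      Move(s[11], s[1], 10485760);   // shift 10 MB down by 10 bytes
      Move(ab[2], ab[1], 1048576);   // shift 1 MB down by 1 byte
   end;
   t := Now - t;
   ShowMessage(FormatDateTime('ss.zzz', t));   // elapsed time as seconds.milliseconds
end;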

On Intel processors, this code runs only about 1.08x faster with Optimize.Move
than with the classic Move.
On my processor I get ~10 sec with Optimize.Move and ~2.27 sec with the classic
Move.
I use this kind of memory operation a lot in my code when deleting or resizing
items in a custom string array. I could show you the code if you want.
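
For illustration only, here is a minimal sketch of how deleting items from a packed buffer leads to this kind of overlapping Move. It is not the code from the custom string array mentioned above; `TByteBuf` and `DeleteBytes` are hypothetical names, and the helper does no range checking.

```pascal
program DeleteWithMove;
{$APPTYPE CONSOLE}

type
  TByteBuf = array of Byte;   // hypothetical buffer type

// Hypothetical helper: removes Count bytes starting at Index
// by shifting the tail of the buffer down. No range checking.
procedure DeleteBytes(var A: TByteBuf; Index, Count: Integer);
var
  Tail: Integer;
begin
  Tail := Length(A) - (Index + Count);       // bytes after the deleted range
  if Tail > 0 then
    Move(A[Index + Count], A[Index], Tail);  // overlapping move by a small offset
  SetLength(A, Length(A) - Count);
end;

var
  A: TByteBuf;
  i: Integer;
begin
  SetLength(A, 8);
  for i := 0 to 7 do
    A[i] := i;
  DeleteBytes(A, 2, 3);                      // drops the values 2, 3 and 4
  for i := 0 to High(A) do
    Write(A[i], ' ');                        // prints 0 1 5 6 7
  Writeln;
end.
```

Source and destination overlap and differ by only a few bytes, which is the same pattern as `Move(ab[2], ab[1], 1048576)` in the test.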

The original discussion about Optimize.Move and the problem: 
http://www.delphipages.com/forum/showthread.php?t=216340 

I hope this information helps you track down the problem easily. If you need
more, I'm glad to help.

Regards, 
David

Original issue reported on code.google.com by david.br...@gmail.com on 8 Apr 2014 at 7:29

GoogleCodeExporter commented 9 years ago
Maybe it has something to do with data alignment? So only use the fast move if
(both?) pointers are 8-byte/16-byte aligned?

(@s[11] AND 7 = 0)   8-byte aligned
(@s[11] AND 15 = 0)  16-byte aligned
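
The two expressions above are shorthand: a pointer has to be cast to an integer before it can be masked. A minimal compilable sketch of the check follows, assuming Delphi 2009 or later (or Free Pascal) for `NativeUInt`; on 32-bit Delphi 7, `Cardinal` would have to be used instead. `IsAligned` is a hypothetical helper, not part of ScaleMM or Optimize.Move.

```pascal
program AlignCheck;
{$APPTYPE CONSOLE}

// Hypothetical helper, not part of ScaleMM or Optimize.Move.
// Alignment must be a power of two (8 or 16 here).
function IsAligned(P: Pointer; Alignment: NativeUInt): Boolean;
begin
  // The low bits of an aligned address are all zero.
  Result := (NativeUInt(P) and (Alignment - 1)) = 0;
end;

var
  s: AnsiString;
begin
  SetLength(s, 64);
  Writeln('@s[1]  16-byte aligned: ', IsAligned(@s[1], 16));
  Writeln('@s[11] 16-byte aligned: ', IsAligned(@s[11], 16));
  // @s[1] and @s[2] are one byte apart, so they can never both be 8-byte aligned.
  Writeln('@s[1] and @s[2] both 8-byte aligned: ',
    IsAligned(@s[1], 8) and IsAligned(@s[2], 8));
end.
```

For the moves in the test, source and destination are 10 bytes and 1 byte apart, so both pointers can never be 16-byte aligned at once. A rule requiring both to be aligned would always fall back to the classic Move there.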

Original comment by andre.mussche on 8 Apr 2014 at 7:40

GoogleCodeExporter commented 9 years ago
When I move from index 2 to index 1, it is slowest (10.8 sec).
From 2 >> 1 up to 32 >> 1, the time decreases to 8.4 sec. At 33 >> 1, it drops
to 3.1 sec.
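
A sweep like the one described above could be scripted roughly as follows. This is a sketch, not the code that produced these numbers: it times only the byte-array move, and it assumes that the Optimize.Move unit is added to the uses clause to test it instead of the classic Move.

```pascal
program MoveOffsetSweep;
{$APPTYPE CONSOLE}

uses
  SysUtils;  // assumption: add the Optimize.Move unit here to time it instead

const
  Count = 1048576;  // bytes moved per call, as in the byte-array test
  Loops = 500;

var
  ab: array of Byte;
  Offset, i: Integer;
  t: TDateTime;
begin
  SetLength(ab, Count + 64);  // room for the largest source offset
  for Offset := 2 to 34 do
  begin
    t := Now;
    for i := 1 to Loops do
      Move(ab[Offset], ab[1], Count);  // overlapping move towards lower addresses
    // Now has coarse resolution, which is enough for runs that take seconds.
    Writeln(Offset:3, ' >> 1: ', FormatDateTime('ss.zzz', Now - t));
  end;
end.
```

Printing one line per source offset makes it easy to see where the timing changes, such as the switch reported between 32 >> 1 and 33 >> 1.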

Original comment by david.br...@gmail.com on 8 Apr 2014 at 8:18

GoogleCodeExporter commented 9 years ago
I have tested on another computer with an AMD processor (SSE3).
The problem is the same, which means it is very likely to appear on ANY
computer with an AMD processor.

Original comment by david.br...@gmail.com on 10 Apr 2014 at 6:22