Gwinel / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

Thread spawning/joining via Windows 64-bit tcmalloc is very slow on our 16-core server. #568

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Extract the attached tarball. It contains instructions and an example
C++11 program that demonstrates the issue.
2. Use Visual C++ 2010 Express Edition and Windows 8 SDK May 2010
to compile the 64-bit test program along with a 64-bit tcmalloc
3. Run the test program on a 16-core (or higher) Windows server.

What is the expected output? What do you see instead?

I expect the creation and joining of 90,000 threads to take about 4-20 seconds.
But on our 2007-era 16-core Windows 2008R2 server it takes 10 minutes to
spawn/join 90,000 threads. This only happens with the Windows 64-bit version of
tcmalloc. It happens with versions 2.0 and 2.1 of tcmalloc, but version 1.8.3
(+win64 patch) does not show the performance degradation.
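
The attached test program is not reproduced in this thread. As a rough illustration of the kind of measurement described above, here is a minimal sketch that spawns and joins threads one at a time and reports the elapsed time. It assumes a toolchain that provides `std::thread`; the names `worker` and `spawn_join_seconds`, the 64-byte allocation, and the one-at-a-time spawn pattern are all hypothetical and may differ from the real test case.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <thread>

// Hypothetical worker: performs one small allocation so that every
// thread touches the allocator at least once before exiting.
static void worker(std::atomic<std::size_t>& done) {
    void* p = std::malloc(64);
    assert(p != nullptr);
    std::free(p);
    done.fetch_add(1, std::memory_order_relaxed);
}

// Spawns and joins `count` threads sequentially (an assumption; the
// original test may batch them). Returns the elapsed wall time in
// seconds and stores the number of workers that ran in `completed`.
double spawn_join_seconds(std::size_t count, std::size_t& completed) {
    std::atomic<std::size_t> done(0);
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        std::thread t(worker, std::ref(done));
        t.join();
    }
    auto stop = std::chrono::steady_clock::now();
    completed = done.load();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `spawn_join_seconds(90000, completed)` from `main` in a build linked against each tcmalloc version would give a number comparable to the timings quoted above, under the stated assumptions.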

What version of the product are you using? On what operating system?

Versions 2.0 and 2.1 of tcmalloc exhibit the Win64 performance degradation. The
degradation appears to happen only on our 2007-era high-core-count server. We
don't have access to newer high-core-count Windows servers, so I don't know
whether this issue is reproducible elsewhere. I'm hoping the tcmalloc
developers will be able to confirm it on the Windows servers they have access to.

Please provide any additional information below.

The Windows 32-bit version of tcmalloc does not exhibit the issue. The Linux
version of tcmalloc does not exhibit the issue. Running the example program on
lower-core-count Win64 machines does not reproduce the issue. High core count
or large RAM size may be a factor.

Dennis.

Original issue reported on code.google.com by dennisb...@gmail.com on 26 Aug 2013 at 3:38

GoogleCodeExporter commented 9 years ago
Unfortunately I don't have access to such a box. Can you elaborate on exactly
what this machine is? Number of sockets, model of CPU, etc.

Also, may I ask you to try releases before 2.0 and 1.8.3? By localizing the
regression we might be able to fix it earlier.

Original comment by alkondratenko on 29 Aug 2013 at 1:00

GoogleCodeExporter commented 9 years ago
Please close this issue as non-reproducible.

Very weird, I can't repeat the slowness via the attached test case. I installed 
"Windows 2012" on our server (the one mentioned above); no thread creation 
speed issue. I then restored "Windows 2008R2" via Clonezilla and I can't 
reproduce the slowness there either.

Secondly, and even more confusing, our database server is still much slower
with tcmalloc 2.1 than with 1.8.3. I have no idea what the issue is; I thought
it was thread creation, but maybe not.

However, that is my problem. I need to come up with a test case.

Please close this.

If/when I have a good test case I will be back.

Original comment by dennisb...@gmail.com on 2 Sep 2013 at 12:33

GoogleCodeExporter commented 9 years ago
I know what the real issue is.

The fix for issue 443 has killed our performance on Windows 64-bit.

TCMALLOC_TRANSFER_NUM_OBJ used to be 32, now it is 32,768.

When running heavy multi-threaded loads on our own database server we get the
following performance:

TCMALLOC_TRANSFER_NUM_OBJ=40     --> 12mins to handle 20 users
TCMALLOC_TRANSFER_NUM_OBJ=32,768 --> 2hours 20mins to handle 20 users

The new default value of TCMALLOC_TRANSFER_NUM_OBJ kills tcmalloc performance 
for us on Windows 64-bit. I'm currently doing more testing (including Linux); 
when I have more details I'll create a new issue report.
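
For anyone who wants to compare settings before a proper test case exists, here is a minimal sketch of a multi-threaded allocation load that can be timed under different values of TCMALLOC_TRANSFER_NUM_OBJ. It assumes the value is picked up from the environment when the process starts, and it is not the database workload measured above. The function names, the 48-byte block size, and the batch pattern are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Returns the raw TCMALLOC_TRANSFER_NUM_OBJ environment value, or
// "(unset)" when it is absent, so a run can be labelled in its output.
std::string transfer_num_obj_setting() {
    const char* value = std::getenv("TCMALLOC_TRANSFER_NUM_OBJ");
    return value ? std::string(value) : std::string("(unset)");
}

// Hypothetical worker: repeatedly allocates a batch of small blocks
// and then frees the whole batch.
static void alloc_worker(std::size_t rounds, std::size_t batch) {
    std::vector<void*> blocks(batch);
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < batch; ++i) {
            blocks[i] = std::malloc(48);
            assert(blocks[i] != nullptr);
        }
        for (std::size_t i = 0; i < batch; ++i) {
            std::free(blocks[i]);
        }
    }
}

// Runs `threads` workers concurrently and returns the elapsed wall
// time in seconds.
double alloc_load_seconds(unsigned threads, std::size_t rounds,
                          std::size_t batch) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back(alloc_worker, rounds, batch);
    }
    for (auto& t : pool) {
        t.join();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Running the same binary once with the variable set to 40 and once with it unset, and printing `transfer_num_obj_setting()` next to `alloc_load_seconds(...)`, would show whether the difference appears outside the database server. A synthetic load like this may not reproduce the slowdown at all.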

Please close this request, it is a dead end.

Dennis.

Original comment by dennisb...@gmail.com on 4 Sep 2013 at 1:19

GoogleCodeExporter commented 9 years ago
Thanks a lot for looking at this. I'm looking forward to seeing the results of
your investigation.

Original comment by alkondratenko on 9 Sep 2013 at 2:58