cynthia / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

tcmalloc forces to 16 byte alignment #430

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

We have implemented a private heap in our application
to reduce heap management overhead to the minimum. 
The problem is that we allocate a lot of 24 byte objects
and we do not want to waste memory. 

We found that your gperftools package also avoids the 8-byte per-allocation 
malloc overhead, so we tested it as an alternative. It did not reduce 
memory use as we expected, because 'common.cc' contains the following: 

    int AlignmentForSize(size_t size) {
      int alignment = kAlignment;
      if (size > kMaxSize) {
        // Cap alignment at kPageSize for large sizes.
        alignment = kPageSize;
      } else if (size >= 128) {
        // Space wasted due to alignment is at most 1/8, i.e., 12.5%.
        alignment = (1 << LgFloor(size)) / 8;
      } else if (size >= 16) {
        // We need an alignment of at least 16 bytes to satisfy
        // requirements for some SSE types.
        alignment = 16;
      }
      // Maximum alignment allowed is page size alignment.
      if (alignment > kPageSize) {
        alignment = kPageSize;
      }
      CHECK_CONDITION(size < 16 || alignment >= 16);
      CHECK_CONDITION((alignment & (alignment - 1)) == 0);
      return alignment;
    }

This seems to force 16-byte alignment, which in our case means that each 
24-byte allocation occupies 32 bytes (8 of every 32 bytes, i.e. 25%, are wasted). 
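
The 24-to-32 byte rounding described above can be reproduced with a small sketch. It is a simplification that only covers the small-size branch (sizes below 128) of the quoted function and ignores tcmalloc's real size-class tables. The names `SmallAlignment`, `RoundUp`, `SlotSize` and the `min_align` parameter are assumptions made for illustration and are not part of gperftools.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the small-size branch of AlignmentForSize():
// sizes >= 16 get `min_align`, smaller sizes keep an 8-byte alignment.
// `min_align` is a hypothetical parameter; the quoted code hard-codes 16.
inline std::size_t SmallAlignment(std::size_t size, std::size_t min_align) {
  return size >= 16 ? min_align : 8;
}

// Round `size` up to the next multiple of `alignment` (a power of two).
inline std::size_t RoundUp(std::size_t size, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

// Bytes consumed by one object of `size` bytes under this simplified model.
inline std::size_t SlotSize(std::size_t size, std::size_t min_align) {
  return RoundUp(size, SmallAlignment(size, min_align));
}
```

Under this model `SlotSize(24, 16)` is 32 and `SlotSize(24, 8)` is 24, which is the difference the report is concerned with.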

Are we correct that this 16-byte alignment exists to improve 
performance when using SSE operations, 
or is there another reason as well?

We searched but could not find more information about 
why this was added. We only found: 

http://code.google.com/p/gperftools/source/detail?spec=svn60&r=60

We implemented a change, based on gperftools-2.0, so that 
compiling with the switch '-DTCMALLOC_ALIGN_8BYTES' 
disables the 16-byte alignment and uses 8-byte alignment instead. 

Is this safe to do, or are there consequences that we have missed?

Note that we have not observed any performance degradation with this
patch (perhaps a consequence of running in 32-bit mode [ gcc -m32 ]
on the x86_64 architecture), and that glibc malloc only provides
8-byte aligned objects by default in that mode.

Would you be willing to integrate this patch?

Alternatively, would you be willing to provide an interface that lets
the caller specify alignment requirements explicitly?

The patch we applied to solve our problem is attached. It was created against 
the latest gperftools release, 2.0.
(attachment: gperftools-2.0_8ByteAlignment.patch)

P.S. Most of our code is in Ada, so for us the ideal interface would match
what the compiler expects:

    procedure Allocate(
      Storage_Address : out Address;
      Size_In_Storage_Elements : in Storage_Elements.Storage_Count;
      Alignment : in Storage_Elements.Storage_Count) is abstract;
    procedure Deallocate(
      Storage_Address : in Address;
      Size_In_Storage_Elements : in Storage_Elements.Storage_Count;
      Alignment : in Storage_Elements.Storage_Count) is abstract;

This interface leaves the responsibility for determining size and alignment
requirements to the caller, both in case of allocation and de-allocation
(because in many cases the size of the object is static and does not require
storage).
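
For readers more familiar with C++, the Ada interface above can be approximated as follows. This is only a sketch: `StoragePool` and `SystemPool` are hypothetical names, and POSIX `posix_memalign`/`free` stand in for the real allocator. The byte counter shows why passing the size back at deallocation is useful: the pool can keep its accounting without storing a per-object header.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical C++ counterpart of the Ada Allocate/Deallocate pair:
// the caller supplies size and alignment on both calls.
class StoragePool {
 public:
  virtual ~StoragePool() = default;
  virtual void* Allocate(std::size_t size, std::size_t alignment) = 0;
  virtual void Deallocate(void* p, std::size_t size,
                          std::size_t alignment) = 0;
};

// Minimal pool backed by the system allocator (an assumption for the sketch).
class SystemPool final : public StoragePool {
 public:
  void* Allocate(std::size_t size, std::size_t alignment) override {
    // posix_memalign needs a power-of-two multiple of sizeof(void*).
    alignment = std::max(alignment, sizeof(void*));
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0) return nullptr;
    live_bytes_ += size;
    return p;
  }

  void Deallocate(void* p, std::size_t size, std::size_t) override {
    if (p == nullptr) return;
    free(p);
    live_bytes_ -= size;  // size comes from the caller, not from a header
  }

  std::size_t live_bytes() const { return live_bytes_; }

 private:
  std::size_t live_bytes_ = 0;
};
```

A caller would write `pool.Allocate(24, 8)` and later `pool.Deallocate(p, 24, 8)`, with both values typically known statically by the compiler.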

Original issue reported on code.google.com by koen.mee...@gmail.com on 14 May 2012 at 7:33


GoogleCodeExporter commented 9 years ago
Looks good. Certainly makes sense to minimize internal fragmentation for 
systems that don't need the additional alignment. I don't think that we want to 
provide an API call for allowing user specified alignment though. The alignment 
is mandated by the target platform and not on a case by case basis.

Original comment by chapp...@gmail.com on 15 May 2012 at 1:33

GoogleCodeExporter commented 9 years ago
Do you have plans to include this, in one way or another, in the next release? 

>> I don't think that we want to provide an API call for allowing user specified
>> alignment though. The alignment is mandated by the target platform and
>> not on a case by case basis.

The statement above seems to be based on the assumption that we plan to give 
the responsibility for alignment requirements to the programmer, which is not 
at all our intent. In our case (Ada technology) the COMPILER has calculated the 
alignment requirements and has generated a call to the allocator function 
specifying both size and alignment requirements. For a different target 
platform it will generate (possibly) different values.

To compare with C: imagine that one day gcc implements 
alignmentof(X), just as it implements sizeof(X) today, so that in this 
hypothetical future a programmer could call a function 
aligned_tcmalloc(sizeof(X), alignmentof(X)) and thereby avoid aligning everything to 8 bytes.
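
As a rough illustration of that call pattern: C++11 provides `alignof`, which plays the role of the hypothetical alignmentof(X). The sketch below assumes a POSIX system and uses `posix_memalign` as a stand-in for the hypothetical aligned_tcmalloc, which does not exist in gperftools. `Record`, `AlignedAlloc` and `AllocFor` are names invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Example 24-byte object, similar in size to the ones discussed above.
struct Record {
  std::uint64_t a;
  std::uint64_t b;
  std::uint64_t c;
};
static_assert(sizeof(Record) == 24, "three 8-byte fields");

// Stand-in for the hypothetical aligned_tcmalloc(size, alignment).
inline void* AlignedAlloc(std::size_t size, std::size_t alignment) {
  // posix_memalign needs a power-of-two multiple of sizeof(void*).
  alignment = std::max(alignment, sizeof(void*));
  void* p = nullptr;
  return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
}

// The compiler supplies both values, so nothing is hard-coded by the caller.
template <typename T>
T* AllocFor() {
  return static_cast<T*>(AlignedAlloc(sizeof(T), alignof(T)));
}
```

Memory obtained this way is released with `free`. Because `sizeof(T)` and `alignof(T)` are computed per target platform, the same source yields different values where the ABI differs.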

This would be open to abuse: a programmer who hard-codes the second parameter 
creates non-portable code, just like a programmer who calls malloc with 
a hard-coded size parameter. A fool-proof heap interface is not feasible 
(without paying for garbage collection), and there are many ways to shoot yourself 
in the foot. Catching the programmer's mistakes is a job for valgrind. 

Size and alignment requirements are both "known" to gcc, regardless of the 
programming language, as they are needed to properly allocate variables on the 
stack or components in a struct. The only difference is that in C, size was 
made visible via the sizeof() construct (as otherwise calling malloc would be a 
nightmare), while alignment was not made visible (only because its absence is less 
of a nightmare: wasting up to 15 bytes per object is not a catastrophe).

For a large mission-critical system, achieving memory efficiency is a significant 
development cost and wasted memory is a significant cost in itself, so the fact 
that people created a wasteful design in the past based on "not a catastrophe" 
is not a reasonable justification for continuing the waste.

We hope the above is a better justification for adding alignment support to the 
heap API.

Original comment by koen.mee...@gmail.com on 13 Jul 2012 at 7:15

GoogleCodeExporter commented 9 years ago
I think your argument has many good points that would justify this API 
addition. It should make the next release.

Original comment by chapp...@gmail.com on 24 Jul 2012 at 4:04

GoogleCodeExporter commented 9 years ago
r175 | chappedm@gmail.com | 2012-11-04 13:15:11 -0500 (Sun, 04 Nov 2012) | 2 lines

issue-430: Introduces 8-byte alignment support for tcmalloc

Original comment by chapp...@gmail.com on 4 Nov 2012 at 6:16