In most cases babeld is only checking for equal or not-equal, not for ordering. memcmp (especially
for 16-byte values) is really inefficient for these cases.
In the babeld-xnor branch I've got rid of most of the calls to memcmp. Especially on 64-bit arches,
instead of a big call to memcmp, each comparison gets replaced with two xors and an or, which take, oh, 3 cycles? to execute. This makes it a lot easier to profile for the calling sites that are inefficient.
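
As a rough sketch of the idea (not the actual code from the babeld-xnor branch), a 16-byte equality test on a 64-bit arch could look something like the following. The name v6_equal is made up for illustration. The fixed-size memcpy calls are only there to load the two 64-bit halves without alignment or aliasing problems; the assumption is that an optimizing compiler turns them into plain loads.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from babeld: returns 1 if the two
   16-byte values are equal, 0 otherwise.  No ordering information. */
static inline int
v6_equal(const unsigned char *a, const unsigned char *b)
{
    uint64_t a0, a1, b0, b1;

    /* Load each value as two 64-bit halves. */
    memcpy(&a0, a, 8);
    memcpy(&a1, a + 8, 8);
    memcpy(&b0, b, 8);
    memcpy(&b1, b + 8, 8);

    /* Two xors and an or: the result is zero only if every bit matches. */
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

A call site that used to read memcmp(a, b, 16) == 0 would then become v6_equal(a, b). This only works where the caller wants equality; anything that sorts on the result still needs memcmp.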