I have been messing with sds recently, and I decided to submit my changes.
This is a summary of the changes:
The biggest feature is the new sdsadd macro. By far the best thing about this blob:
sds s = sdsempty();
s = sdsadd(s, "Hello, ");
s = sdsadd(s, "world");
s = sdsadd(s, '!');
s = sdsadd(s, ' ');
s = sdsadd(s, 1337);
puts(s);
Output> Hello, world! 1337
This is just like the += operator you find in Java/JavaScript/Ruby/std::string/QString/etc.
It doesn't work with all compilers, but it has three implementations which have decent coverage:
C11's _Generic
C++11's overloading and type_traits
GCC/Clang's __builtin_types_compatible_p and __builtin_choose_expr extensions
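To illustrate the first of these, here is a minimal sketch of type-based dispatch with C11's _Generic. It is not the sds implementation: the str type and the str_* helpers are hypothetical stand-ins built on plain malloc'd C strings, used only to show the mechanism.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for sds: a plain heap-allocated C string. */
typedef char *str;

static str str_empty(void) {
    return calloc(1, 1);                 /* one zero byte, i.e. "" */
}

/* Append t to s, growing the buffer; frees s and returns NULL on failure. */
static str str_cat(str s, const char *t) {
    if (s == NULL) return NULL;
    size_t a = strlen(s), b = strlen(t);
    char *p = realloc(s, a + b + 1);
    if (p == NULL) { free(s); return NULL; }
    memcpy(p + a, t, b + 1);             /* copies the terminator too */
    return p;
}

static str str_addchar(str s, char c) {
    char buf[2] = { c, '\0' };
    return str_cat(s, buf);
}

static str str_addint(str s, int n) {
    char buf[32];
    snprintf(buf, sizeof buf, "%d", n);
    return str_cat(s, buf);
}

/* Pick the helper from the static type of the second argument. */
#define str_add(s, x) _Generic((x),  \
    char *:       str_cat,           \
    const char *: str_cat,           \
    char:         str_addchar,       \
    int:          str_addint)((s), (x))
```

With this sketch, str_add(s, "Hello, ") and str_add(s, 1337) pick the string and integer helpers. A bare '!' has type int in C, so it would be formatted as a number unless it is cast, as in str_add(s, (char)'!'). That is the gap the character-literal detection in sdsadd is meant to close.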
Even if you can't use a compiler with at least one of those features, there are also the manual functions:
sds s = sdsempty();
s = sdscat(s, "Hello, ");
s = sdscat(s, "world");
s = sdsaddchar(s, '!');
s = sdsaddchar(s, ' ');
s = sdsaddint(s, 1337);
puts(s);
Output> Hello, world! 1337
Note that sdsadd will try its best to detect character literals, but it is impossible to catch them all without ridiculous compiler-dependent voodoo. To get sdsadd to treat an argument as a character, make sure the argument either is explicitly of type char (via a declaration or a cast; signed char and unsigned char are not guaranteed to work) or is an expression that lexically starts or ends with a single quote.
The macro that detects the latter case is this:
Note: that check is optimized away at compile time, so don't worry about extra runtime checks.
As long as your expression matches that and is convertible to char, int, or unsigned int, it will match. It works about 90% of the time.
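For readers who want a feel for how a lexical check like this can work, here is a small sketch based on preprocessor stringification. The macro name LOOKS_LIKE_CHAR_LITERAL is hypothetical and this is not necessarily how sds does it.

```c
/*
 * Hypothetical macro, not taken from the sds source.
 * #x turns the argument's spelling into a string literal, so the test
 * only inspects the first and last characters of that spelling.
 * sizeof(#x) counts the terminating NUL, hence the "- 2" for the last
 * visible character.
 */
#define LOOKS_LIKE_CHAR_LITERAL(x) \
    ((#x)[0] == '\'' || (#x)[sizeof(#x) - 2] == '\'')
```

Under this sketch, LOOKS_LIKE_CHAR_LITERAL('!') and LOOKS_LIKE_CHAR_LITERAL(1 + 'a') are true while LOOKS_LIKE_CHAR_LITERAL(1337) is false. Both operands index a string literal with constant indices, so an optimizing compiler can normally fold the whole expression to a constant. An expression such as ('a') defeats the check, which shows why a lexical test cannot catch everything.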
Other features:
ABI has been broken again, but that will probably not happen again because most things are now fairly future-proof. The only major change is that you have to take the return value of every sds function, regardless of whether it reallocates or not.
That is much easier to spot than it sounds: thanks to the macros SDS_MUT_FUNC, SDS_INIT_FUNC and friends, GCC and Clang will warn you if you forget to take the return value. They also warn about fishy printf format strings passed to sdscatprintf.
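The definitions of SDS_MUT_FUNC and SDS_INIT_FUNC are not shown here. As a rough sketch of the kind of annotation involved, GCC and Clang provide the warn_unused_result and format attributes. The macro names MUST_USE and PRINTF_LIKE and the function fmt_into below are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper macros; they expand to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#  define MUST_USE __attribute__((warn_unused_result))
#  define PRINTF_LIKE(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#  define MUST_USE
#  define PRINTF_LIKE(fmt, args)
#endif

/*
 * Formats into buf and returns the length vsnprintf reports.
 * MUST_USE: ignoring the return value triggers a compiler warning.
 * PRINTF_LIKE(3, 4): argument 3 is the format string, and checking of
 * the variadic arguments starts at argument 4.
 */
MUST_USE PRINTF_LIKE(3, 4)
static int fmt_into(char *buf, size_t cap, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, cap, fmt, ap);
    va_end(ap);
    return n;
}
```

With these attributes, a bare call such as fmt_into(buf, sizeof buf, "%d", 1); draws an unused-result warning, and fmt_into(buf, sizeof buf, "%s", 42) draws a format warning when format checking is enabled (for example with -Wall).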
Full compatibility with a C++11 compiler, with some extra std::string conversion functions when you compile with it.
Removal of SDS_TYPE_5. Before you say, "oh now why did you remove that?", look at my reasoning:
It actually made performance slower. The two bytes you may have saved are offset by the extra time, code, and cache misses it takes to parse it.
It makes operations more complicated.
It requires us to reallocate whenever we increase the length.
It forces us to use three bits for the flags byte, which could be used for other things like actual flags.
It only gives us 32 characters before it is useless.
Its removal allows us to change this ugly mess:
static inline size_t sdslen(const sds s) {
    unsigned char flags = s[-1];
    switch(flags&SDS_TYPE_MASK) {
        case SDS_TYPE_5:
            return SDS_TYPE_5_LEN(flags);
        case SDS_TYPE_8:
            return SDS_HDR(8,s)->len;
        case SDS_TYPE_16:
            return SDS_HDR(16,s)->len;
        case SDS_TYPE_32:
            return SDS_HDR(32,s)->len;
        case SDS_TYPE_64:
            return SDS_HDR(64,s)->len;
    }
    return 0;
}
to a much cleaner syntax using the SDS_HDR_LAMBDA and SDS_HDR_LAMBDA_2 macros, which automatically run the block of code you give them.
SDS_HDR_LAMBDA takes an sds string, followed by the block.
SDS_HDR_LAMBDA_2 takes an sds string, followed by the flags byte, then the block. It is safe to pass NULL as the first argument, in which case sh will also be NULL.
Note that the macro wraps a switch statement, so don't use break inside the block, and try to do as much as possible in a single block because each call is rather expensive.
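As an illustration of the idea only, here is a self-contained sketch of a macro that wraps the switch and hands the block a correctly typed sh pointer. Everything in it is hypothetical: the two-size header layout, the HDR_LAMBDA name and the str_* helpers are simplified stand-ins, not the real SDS_HDR_LAMBDA.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: [header][flags byte][characters...]. */
struct hdr8  { uint8_t  len, alloc; };
struct hdr16 { uint16_t len, alloc; };
enum { TYPE_8 = 0, TYPE_16 = 1, TYPE_MASK = 1 };

/* Runs the block once with `sh` typed as the matching header struct. */
#define HDR_LAMBDA(s, ...) do {                                          \
    switch ((unsigned char)(s)[-1] & TYPE_MASK) {                        \
    case TYPE_8: {                                                       \
        struct hdr8 *sh = (void *)((s) - 1 - sizeof(struct hdr8));       \
        __VA_ARGS__                                                      \
    } break;                                                             \
    case TYPE_16: {                                                      \
        struct hdr16 *sh = (void *)((s) - 1 - sizeof(struct hdr16));     \
        __VA_ARGS__                                                      \
    } break;                                                             \
    }                                                                    \
} while (0)

/* Allocate header + flags byte + text; returns a pointer to the text. */
static char *str_new(const char *init) {
    size_t len = strlen(init);
    if (len > UINT16_MAX) return NULL;
    int type = len <= UINT8_MAX ? TYPE_8 : TYPE_16;
    size_t hs = type == TYPE_8 ? sizeof(struct hdr8) : sizeof(struct hdr16);
    char *base = malloc(hs + 1 + len + 1);
    if (base == NULL) return NULL;
    char *s = base + hs + 1;          /* header starts at base, so it is aligned */
    s[-1] = (char)type;               /* flags byte sits just before the text */
    memcpy(s, init, len + 1);
    HDR_LAMBDA(s, sh->len = sh->alloc = len;);
    return s;
}

/* The whole length lookup collapses to a single block. */
static size_t str_len(const char *s) {
    size_t n = 0;
    HDR_LAMBDA(s, n = sh->len;);
    return n;
}

static void str_free(char *s) {
    if (s != NULL) HDR_LAMBDA(s, free(sh););   /* sh is the malloc'd base */
}
```

Because the block is pasted into every case of the switch, a break inside it would leave the switch rather than an enclosing loop, which matches the warning above about not using break.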
Performance improvements
This is most noticeable on 32-bit, especially ARMv7, which gets up to a 10x speedup simply by using int instead of long long whenever possible (from a benchmark of sdsll2str, renamed to sdslonglong2str, on an LG G3).
restrict pointers prevent wasting time with aliasing.
__builtin_expect macros keep the CPU from wasting its time on unlikely checks.
sdscatprintf is faster: instead of guessing what allocation size it has to use, it uses the return value of vsnprintf to directly retrieve the size it needs.
More specific addition functions prevent expensive calls to sdscatprintf.
Some other annotations allow the compiler to optimize the code.
SDS_HDR_LAMBDA makes it faster (or slower, depending on how many times you use it) to do multiple changes to the string's header at once.
sdssetlen, sdssetalloc, etc are now safer. They will reallocate when necessary.
Code is more portable, fixing some issues with MSVC, and removing the flags and data documentation members from the sdshdrs makes it so we don't need __attribute__((__packed__)) or VLA support. I also tried to make things (mostly) happy with C90.
Some duplicated code blocks are now expanded macros.
Tests are now in their own file, and there are now up to 75 different tests, depending on which compiler you use.
Some minor bugfixes
Defining SDS_ABORT_ON_ERROR will make sds abort on an error (with a message to stderr giving the line) instead of returning NULL.
Some other features, such as sdssplit which just runs sdssplitlen with strlen, and support for int types instead of just long long.
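The sdscatprintf change listed under the performance improvements relies on a standard C99 property: vsnprintf returns the number of characters the full output needs, even when the buffer is too small. Here is a minimal sketch of that measure-then-format pattern. The function name format_alloc is hypothetical and it uses plain malloc rather than sds strings.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a freshly malloc'd formatted string, or NULL on failure. */
static char *format_alloc(const char *fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* the list is consumed twice */

    /* First pass: no buffer, just ask how long the output would be. */
    int n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    char *buf = NULL;
    if (n >= 0 && (buf = malloc((size_t)n + 1)) != NULL) {
        /* Second pass: the buffer is now exactly the right size. */
        vsnprintf(buf, (size_t)n + 1, fmt, ap2);
    }
    va_end(ap2);
    return buf;
}
```

This assumes a C99-conforming vsnprintf. Some older C runtimes return -1 on truncation instead of the required length, in which case a guess-and-retry loop is still needed.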
Documentation is coming soon, as well as more pedantic error checking (a debug mode?)