short string optimization

mppf commented 5 years ago

Many string API routines currently return a string to represent one character.

For example:

It is nice for API design reasons to keep these methods within the string type. However, that currently comes at a significant performance penalty vs. returning an integer to represent the character. One significant element of this performance penalty is the necessity to allocate and free these strings.

However, there is an optimization strategy used in other languages that can be applied to this case. It's called the short string optimization. In Chapel terms, the short string optimization would be to make the string record able to store short strings directly in the record instead of in a pointed-to buffer. This is normally done by re-using storage for other record fields that aren't used when the string is stored directly (e.g. with a C union).

The Chapel string record is currently 32 bytes long. That means that it would be possible to store 1 byte of tag, that indicates if the format of the 32 bytes is "string included" or "points to buffer" and, in the event of "string included", stores the length of the string. Even if short strings are always null terminated, this allows for 30 bytes of string data to be stored in the string.

This optimization additionally makes it more likely that a string can be communicated in 1 GET or PUT when Chapel programs are referring to remote strings. (We do RVF short strings already, but RVF is not always possible).

I havn't done a study of string lengths used in Chapel programs in practice but I'm fairly confident that a large fraction of these would be < 30 bytes.

If I were implementing the short string optimization today, what would it look like?

create a C struct type to represent the Chapel string, have it store a union within it. Why do this in C? Because the short string optimization will rely on some C struct properties that aren't necessarily true for Chapel records (in particular, the fields are stored contiguously in memory except for padding. In Chapel, one day optimization might store these fields interleaved with other data, e.g. Array of Structure to Structure of Arrays optimization).
the C struct type is opaque to Chapel code
inline C methods allow access to the 5 fields (len, size, buff, isowned, locale_id)
when retrieving the buff field for a short string, the pointer returned would simply be a pointer to the data store in the struct itself.
paren-less getter methods could even be used to access these fields by calling the appropriate C functions in the event that they are not changed

This strategy would allow most of the string methods to work with modest updates. The string initialization and append code would need more logic to decide between representing the string as "string included" or "points to buffer". This adds some complexity to the implementation.

mppf commented 5 years ago

As an amusement, I tried a prototype to see if it sped up some benchmarks.

https://github.com/mppf/chapel/tree/prototype-short-string-opts

It includes two benchmarks it helps with. More work would be required to study further (e.g. pass full local testing).

The first approach, using a C union, did not help so much with the sorting strings benchmark. The later approach, just leaving the fields there but making a pointer to one or more of them, went OK. But that approach would require us to label that "bufferType" to be possibly overlapping in the LLVM back end (the way char* can alias anything in C).

mppf commented 5 years ago

Another interesting benchmark would be a push_back test where many strings are added to an array.

chapel-lang / chapel

short string optimization #11221