Closed jberryman closed 9 years ago
Hi @jberryman!
I don't really understand this use case. Are you asking if there is a way to count the number of bytes that a particular BufferBuilder would produce? Or to render a BufferString into a fixed-size buffer? Or both? :)
There are other cases where counting length would not be O(1): UTF-8 strings, decimal encoding of numbers, and things like that.
If my above interpretation of what you're asking for is accurate, I'm definitely open to having BufferBuilder solve your use case. BufferBuilder is intended to be practical and useful, after all. :)
Are you asking if there is a way to count the number of bytes that a particular BufferBuilder would produce?
Right, this one. Imagine I have a very big ByteArray and an atomic counter, and I would like many concurrent writers to be able to write their Builder to this ByteArray. I need to know the number of bytes the Builder needs. Then each writer atomically increments the counter by this number and everyone can get their own section of the array to run their builder on. On May 30, 2015 11:17 PM, "Chad Austin" notifications@github.com wrote:
Hi @jberryman https://github.com/jberryman!
I don't really understand this use case. Are you asking if there is a way to count the number of bytes that a particular BufferBuilder would produce? Or to render a BufferString into a fixed-size buffer? Or both? :)
There are other cases where counting length would not be O(1): UTF-8 strings, decimal encoding of numbers, and things like that.
If my above interpretation of what you're asking for is accurate, I'm definitely open to having BufferBuilder solve your use case. BufferBuilder is intended to be practical and useful, after all. :)
— Reply to this email directly or view it on GitHub https://github.com/chadaustin/buffer-builder/issues/7#issuecomment-107070595 .
That seems quite doable! I'm imagining:
getBufferBuilderLength :: BufferBuilder () -> Int
I think I prefer option 2. It adds some branch overhead in the C code, but those branches could be made quite predictable, and they could be made to leave the non-counting case super fast. Perhaps the branches could be placed inside of this branch somehow: https://github.com/chadaustin/buffer-builder/blob/master/cbits/buffer.cpp#L28
Option #1 sounds a lot harder to optimize on the Haskell side.
@andyfriesen, what do you think about all of this?
So is the theoretical overhead of calculating lengths basically just the cost of an extra traversal of certain data e.g. those primitive Addr#
string literals (I notice String
uses mapM_
, so no extra traversal there)? And the possible upside is that you never need to do a memcpy to grow the buffer?
I'm just starting to work through the code, sorry if those are dumb questions.
I wonder what sorts of benefits could be had from a quasiquoter that could do pre-processing of string literals at compile-time, either to ByteString
s or perhaps into a primitive string literal with length safely calculated. I know e.g. in http-client
there's a big chunk of code that's mappend
-ing string literals with other bits of data; things desugaring to e.g.
(fromByteString $ fromString "Foo: ") <> fromByteString s <> (fromByteString $ fromString "\n")
This sort of thing is very common. I think the best we can hope for the above is that the expressions in parens become CAFs, but I wonder if a quasiquoter could result in better code.
Hi Brandon,
Jeez, sorry for my slow response. I've been swamped with things and I guess I let time pass by.
The overhead of calculating lengths is indeed another traversal. For things like ByteString, the traversal is cheap: just unpack the length integer and add it in. For things like Text you need to count what the UTF-16 transcoded into UTF-8 would look like. Integers and floats are also a bit annoying.
The upside is indeed that you'd never even need to allocate or memcpy to grow.
I saw a presentation by Andrei Alexandrescu where he says that, in may cases, saving on reallocations by paying an extra traversal is well worth it, so I'd love to see.
I don't know if a quasiquoter would help at all. We found the only way to beat CAFs is by using MagicHash string literals, but those aren't always great, because they have a trailing zero instead of a known length, so you have to strlen
...
Let me know if you have more questions!
@jberryman In context of #8, while I had this in my head, I went ahead and implemented this: https://github.com/chadaustin/buffer-builder/commit/c62bac3622b81fa2f8d05b356f24b0fcf439e341
Does this help your use case? If so, I'll upload to Hackage.
Yes, calculateLength
seems to do it for me. Thanks!
On Sat, Jul 11, 2015 at 9:45 PM, Chad Austin notifications@github.com wrote:
@jberryman https://github.com/jberryman In context of #8 https://github.com/chadaustin/buffer-builder/issues/8, while I had this in my head, I went ahead and implemented this: c62bac3 https://github.com/chadaustin/buffer-builder/commit/c62bac3622b81fa2f8d05b356f24b0fcf439e341
Does this help your use case? If so, I'll upload to Hackage.
— Reply to this email directly or view it on GitHub https://github.com/chadaustin/buffer-builder/issues/7#issuecomment-120677525 .
OK, new version on hackage! http://hackage.haskell.org/package/buffer-builder-0.2.4.0
Let me know if I could help in any other way.
Sorry that title isn't very clear. My use case is described here. Essentially I need to be able to do something like:
bytestring
'sBuilder
almost gets me there with the internalBufferWriter
, but unfortunately we can determine the number of bytes that will be written ahead of time from just theBuilder
. This is a hard requirement for me.buffer-builder
looks like an attractive place for this functionality to live because I think nearly all the instances you provide have O(1) length already, and it looks like your library is super fast to boot. I think only lazy bytestrings and theString
case would need to force the spine to get an O(n) length, which needs to happen anyway.Does that analysis seem right, and make sense? And would you be open to:
BufferBuilder
such that its allocation area is known before runningBufferWriter
-like function that I can run on a pre-allocated bufferLet me know what you think, and I may hack on this in the next couple months if it's the sort of thing you'd be interested in having here.