cisco / ChezScheme

Chez Scheme
Apache License 2.0
6.89k stars 983 forks source link

extend `vector-copy` and add `vector-append` and `vector-set/copy` #785

Closed mflatt closed 6 months ago

mflatt commented 6 months ago

Make vector-copy like substring when start and end positions are supplied. Internally, allocate the vector without initializing memory, since the content is immediately replaced by copying, and avoid using write barriers, since the vector copy is freshly allocated.

mflatt commented 6 months ago

One potential issue here is that string-copy! and bytevector-copy! accept a start position and count, instead of a start position and end position. Those functions also take separate source and destination starting positions, though, and a length makes more sense in that context. This vector-copy accepts arguments more in line with substring, but taking a count like string-copy! would also make sense.

By avoiding write barriers and redundant initialization, this vector-copy makes the following example run about twice as fast on my machine, or about 3 times as fast in unsafe mode:

 (let ([vec '#(apple banana coconut)])
   (let loop ([i 1000000] [r #f])
     (if (= i 0)
         r
         (loop (fx- i 1) (vector-copy vec)))))
mflatt commented 6 months ago

There are several problems with the current implementation. I'll try again.

mflatt commented 6 months ago

The newest commit implements vector-copy in a better way, and it adds vector-append.

Overall, the intent is to provide better primitives for implementing a persistent data structure like an RRB tree. Those kinds of data structures need to frequently copy and update a vector, sometimes adding or removing slots. The vector-append operation here is specialized for up to 3 vectors with those kinds of operations in mind. Stencil vectors serve a similar role, but plain vectors can be easier to use when mask bits are not needed.

mflatt commented 6 months ago

After trying this out more, I think it's worth adding vector-update. The performance benefit of vector-update compared to vector-copy plus vector-set! is very small, but it rounds out the available functional operations for the things that I have tried.

jltaylor-us commented 6 months ago

One potential issue here is that string-copy! and bytevector-copy! accept a start position and count, instead of a start position and end position. Those functions also take separate source and destination starting positions, though, and a length makes more sense in that context. This vector-copy accepts arguments more in line with substring, but taking a count like string-copy! would also make sense.

I'd lean towards making it like string-copy! due to the similarity in the names. (That also avoids any confusion about whether "end" is inclusive or exclusive.)

mflatt commented 6 months ago

I can change vector-copy to take a length as the third argument.

Any opinion on the name vector-update? As brought up in racket/racket#4879, it's not quite "update" as used for a function like hashtable-update!, where it means a function converts the current value at a given key to another value. Alternate suggestions so far are vector-copy-set and vector-set/copy, and let's add vector-set-copy also on the grounds that the function is more like vector-set! than vector-copy.

(I picked vector-update based on stencil-vector-update, but the connection is weak. I don't think stencil-vector-update could change in Racket, since its been in several releases, but we could pick different name for it in Chez Scheme for stencil-vector-update the terminology collision seems bad.)

jltaylor-us commented 6 months ago

no, not really. Part of me is tempted to say it's just vector-set without the "bang", but I'm certain that would cause confusion with new users or make for hard to debug typos. Something like vector-set/copy or vector-copy-and-set is a bit unwieldy but does convey what's going on. The /copy name seems more scheme-like to me, but that might just be that I've spent too long in languages with impoverished identifier syntax that don't allow / in names.

mflatt commented 6 months ago

Thanks! The latest version uses the name vector-set/copy.

mflatt commented 6 months ago

Latest version has specializations for (vector-copy (vector x) y) and (vector-copy x (vector y)) in unsafe mode, because adding one element to the front or end of a vector is a common operation. There's no end to special cases that could be recognized, but this one gives a 15% performance boost on microbenchmaks for our RRB tree implementation.