cespare closed this 5 years ago
I glossed over the asm, but this looks pretty good to me. I do have some new benchmarks up my sleeve. I'm guessing purego didn't get too much slower?
In the initial version, purego benchmarked 80% slower for a 10 MB Write. It turned out the code incurred some unwanted bounds checks. I fixed that, and now it's only a little slower (0-10%), due (AFAICT) to minor churn in the register allocation (it's a pretty sensitive tight loop). Anyway, since that's the pure Go version, I'm fine with that.
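For readers unfamiliar with the bounds-check issue, here is a minimal sketch of the general pattern. It is not the code from this PR; `sumIndexed` and `sumResliced` are hypothetical helpers that just add up little-endian words in 32-byte blocks. With an explicit index, the compiler may be unable to prove that each `b[i+k : i+k+8]` is in range, so checks can remain in the hot loop. Reslicing `b` under a `len(b) >= 32` condition makes constant-index slices provably in range, which typically lets the checks be eliminated. Whether a given check survives depends on the Go version; building with `-gcflags=-d=ssa/check_bce/debug=1` reports the ones that remain.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sumIndexed walks b in 32-byte blocks using an explicit index. The compiler
// may not prove that i+8, i+16, ... stay within len(b), so bounds checks can
// remain inside the loop.
func sumIndexed(b []byte) uint64 {
	var s uint64
	for i := 0; i+32 <= len(b); i += 32 {
		s += binary.LittleEndian.Uint64(b[i : i+8])
		s += binary.LittleEndian.Uint64(b[i+8 : i+16])
		s += binary.LittleEndian.Uint64(b[i+16 : i+24])
		s += binary.LittleEndian.Uint64(b[i+24 : i+32])
	}
	return s
}

// sumResliced computes the same result by reslicing b. The loop condition
// establishes len(b) >= 32, so the constant slices b[0:8] ... b[24:32] are
// provably in range and their checks can typically be dropped.
func sumResliced(b []byte) uint64 {
	var s uint64
	for len(b) >= 32 {
		s += binary.LittleEndian.Uint64(b[0:8])
		s += binary.LittleEndian.Uint64(b[8:16])
		s += binary.LittleEndian.Uint64(b[16:24])
		s += binary.LittleEndian.Uint64(b[24:32])
		b = b[32:]
	}
	return s
}

func main() {
	// 100 bytes = 3 full blocks plus a 4-byte tail that both loops skip.
	data := make([]byte, 100)
	for i := range data {
		data[i] = byte(i)
	}
	fmt.Println(sumIndexed(data) == sumResliced(data)) // prints "true"
}
```

Both functions return the same value; only the shape of the loop differs, which is what determines how much work the bounds-check elimination pass can do.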
This still needs work; the purego version got slower. It also needs a benchmark. (I didn't include the quick'n'dirty one I used to measure the results because I think @ongardie-ebay has added some benchmarks for his incoming changes.)
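A benchmark for the 10 MB Write case could look roughly like the sketch below. It is not the benchmark used for the numbers above. `newHash` and `benchmarkWrite10MB` are hypothetical names, and FNV-1a from the standard library is only a stand-in so the sketch runs on its own. In the repository this would be a `BenchmarkWrite10MB` function in a `_test.go` file run with `go test -bench`, and comparing against the pure Go path assumes it is selected by a `purego` build tag (e.g. `-tags purego`).

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
	"testing"
)

// newHash is a stand-in constructor (hypothetical); replace FNV-1a with the
// hash implementation under test.
func newHash() hash.Hash64 { return fnv.New64a() }

// sink keeps the result live so the hashing work is not optimized away.
var sink uint64

// benchmarkWrite10MB measures a single 10 MiB Write followed by Sum64.
func benchmarkWrite10MB(b *testing.B) {
	buf := make([]byte, 10<<20)
	b.SetBytes(int64(len(buf))) // makes the result report MB/s
	b.ResetTimer()              // exclude buffer allocation from the timing
	for i := 0; i < b.N; i++ {
		h := newHash()
		h.Write(buf)
		sink = h.Sum64()
	}
}

func main() {
	// Registers the testing flags; required when calling testing.Benchmark
	// outside of `go test`.
	testing.Init()
	r := testing.Benchmark(benchmarkWrite10MB)
	fmt.Println(r.String())
}
```

Setting the byte count per iteration makes results for the assembly and purego builds directly comparable as throughput rather than raw ns/op.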
It has a nice side effect of bypassing the reslicing gotcha in the assembly.
Updates #13.