Closed agentgt closed 2 years ago
Hi, @agentgt!
I really liked the libdivide4j
library. But unfortunately I can't add a dependency for the specific case.
However I added a new parameter in BaseNCodec
for you to use FastDivision
. All you need to do is inject a function that uses FastDivision
into a codec.
CustomDivider
is a division function interface that returns quotient and remainder. It can be used to boost encoding speed utilizing specialized libraries like libdivide
.
The addition of CustomDivider
did not affect performance. In fact, the refactoring done to include it improved performance by 15% for decoding and 8% for encoding.
Here is a simple function example:
CustomDivider divideBy64 = x -> new long[] { x / 64, x % 64 };
Here is an complete codec example using FastDivision
:
public class FastBase62Codec extends BaseNCodec {
private static final int RADIX = 62;
private static final BaseN BASE_N = new BaseN(RADIX);
// make the magic happen with fast division :)
private static final FastDivision.Magic MAGIC = FastDivision.magicSigned(RADIX);
private static final CustomDivider DIVIDER = x -> {
long quotient = FastDivision.divideSignedFast(x, MAGIC);
long remainder = x - quotient * MAGIC.divider;
return new long[] { quotient, remainder };
};
// a shared immutable instance
public static final FastBase62Codec INSTANCE = new FastBase62Codec();
public FastBase62Codec() {
super(BASE_N, DIVIDER);
}
}
Benchmark before improvement:
-----------------------------------------------------------------
Benchmark Mode Cnt Score Error Units
-----------------------------------------------------------------
Base62Codec.decode() thrpt 5 8667,315 ± 246,583 ops/ms
Base62Codec.encode() thrpt 5 1310,038 ± 30,726 ops/ms
-----------------------------------------------------------------
Benchmark after improvement:
-----------------------------------------------------------------
Benchmark Mode Cnt Score Error Units
-----------------------------------------------------------------
Base62Codec.decode() thrpt 5 9964,451 ± 99,106 ops/ms +15%
Base62Codec.encode() thrpt 5 1421,416 ± 41,249 ops/ms +8%
FastBase62Codec.encode() thrpt 5 3497,924 ± 135,835 ops/ms +167%
-----------------------------------------------------------------
As you can see FastBase62Codec
improved the performance by 2.5x. Thank you so much!
Released version v4.6.0.
The library is only one class so you might be able to just copy the class.
The algorithm believe is in the public domain and I didn’t see a license for either the c or java version.
Making a copy of libdivide4j
is also adding a dependency. I intend to keep this lib self-contained and efficient at the same time, although I know it's an unattainable ideal as nothing is really independent. Alternatively, I could just copy part of libdivide4j
, but I'm not confident in adding part of something I can't fully understand. libdivide
is really magic for me.
However, if a developer needs to speed up performance with libdivide4j
, this can be done by injecting a custom division function. It can be considered an option. I didn't notice any significant performance difference when plugging FastDivision
wrapped in a CustomDivider
. I think the JIT compiler does a pretty good job of optimizing it for us. Please take a look at this benchmark I just did:
Benchmark Mode Cnt Score Error Units
InjectedWithCustomDivider thrpt 5 3156,578 ± 23,137 ops/ms
IncludedAsDependency thrpt 5 3225,171 ± 99,144 ops/ms +2%
I really appreciate the interest and advice other developers give this library. But I respectfully refuse to add libdivide4j
as a dependency.
I respect that and appreciate it. I wish more libraries did that. I was only pointing it out as you might have thought it was a ton of code but its only one class. If en/decoding was the primary goal of this project I would make a bigger deal about it but its not.
EDIT: To be clear I agree with your decision 👍
By using this libdivide4j I was able to double the speed of your BaseN encoder for non powers of 2.
I don't really fully understand how the algorithm works but it speeds up divmod-ing quite a bit on non power of 2s. I also don't understand why you modulus (
%
) after you already done the division. Maybe the JIT is smart enough. Doing multi and then subtracting (like I did here) might speed it up as well.I also have an in house encoder that I call Fast57 (but it really could be any BaseN) that divmods the MSB long and the LSB long separately using the fast division algorithm and then concatenates them. That approach is about twice as fast as even the improved BaseN encoder from the above however it can't do arbitrary lengths of bytes since it relies on longs.