ccorcos / tuple-database

406 stars 19 forks source link

feat: add support for `bigint` in tuples #25

Closed alvrs closed 12 months ago

alvrs commented 1 year ago

Currently it is not possible to use bigint in keys. We work around this limitation by serializing bigints to strings before setting them on the tuple-database, but this prevents us from filtering based on this tuple with gt, gte etc, since string comparisons are different from value comparisons (eg 2n < 10n but "2n" > "10n")

CleanShot 2023-05-13 at 15 56 25@2x
ccorcos commented 1 year ago

There are a few options you have here.

One thing I don't like about adding BigInt support natively in this library is that it isn't JSON serializable and I'm trying to stick with that as an abstraction because that means the entire database can operate over a JSON API between processes, etc. My strategy for things like dates is just to use application-level abstractions like {date: "2023-05-16"} to represent dates.

For BigInt, I think the most pragmatic thing for you to do is decide on an appropriate MAX_INT size that you will realistically ever care about. 2^64 if really fucking big. And then make an encoding for it by zero-padding.

Just for example:

1000000000000000000000000000000000000000000000000000000000n.toString(36).padStart(40)

You'll have to manage negative numbers by having a leading token and subtracting the max int from the number so they sort properly.

So here's a mini example:

1000n.toString(36).padStart(5, "0")000rs

Here's your maxInt (for length 5)

(Math.pow(36, 5) - 1).toString(36)zzzzz or 60466175n

So for -1000n, we'll invert it and throw a - in front.

"-" + (60466175n - 1000n).toString(36).padStart(5, "0")-zzz87

I'm not sure if you're familiar with twos-compliment but that's basically what we're doing here.

Notice that -2000n"-" + (60466175n - 2000n).toString(36).padStart(5, "0")-zzygf and

"-zzygf" < "-zzz87"true which is what we want...

Unfortunately BigInt.fromString is still in proposal so you'll have to do your own base36 parsing which is annoying. But otherwise you're pretty much good to go with this approach. Only downside is you have to specify a MAX_INT so it isn't entirely unbound. And every BigInt will consume the same amount of bytes in memory regardless of how big the int is. But that should probably be fine.

Let me know if you need any more help. 😄

ccorcos commented 1 year ago

Also if you implement something, I'd be happy to add it as a helper in this library for others.