GlareDB / glaredb

GlareDB: An analytics DBMS for distributed data
https://glaredb.com
GNU Affero General Public License v3.0
550 stars 36 forks source link

Chore: decision on slt as part of glaredb, or separate binary #2378

Closed universalmind303 closed 4 months ago

universalmind303 commented 4 months ago

Description

https://github.com/GlareDB/glaredb/pull/2363

I think the only thing we didn't decide on in this PR was whether or not to include slt in the binary, or keep it separate.

My vote is for moving it in and feature flagging it to either a new feature, or !release

tychoish commented 4 months ago

I'm a vote for:

scsmithr commented 4 months ago

I don't have a strong opinion here.

If we don't feature flag, I'd just want to make sure that it doesn't significantly increase binary size, and that a user won't accidentally stumble upon on it during typical usage (help).

vrongmeal commented 4 months ago

I think a !release here makes more sense.

tychoish commented 4 months ago

I think a !release here makes more sense.

Why?

Is this just "because it seems like having a test runner in prod feels wrong?" or is there a technical reason that it makes more sense to you?

I'm aware of a few products that ship with a hidden test runner, and unless you have reason to know you don't know about it because it's... hidden, and I just want to see if there's something that I'm missing.

vrongmeal commented 4 months ago

My argument is more towards: Are there any advantages of including a test runner in the release build? And an instant answer is no for me.

If a user faces any issue, they are likely to provide an SQL script to reproduce the issue, SLTs are only useful to us.

For me, they would only result in increasing the binary size. The only thing I can think of in favour of including it is that we keep the release and debug binary as close as possible (which IDK if in this case is a very strong argument).

Hiding it/ Not including it is quite similar (except for the binary size).

universalmind303 commented 4 months ago

I agree with @vrongmeal's sentiment here. It's a pretty narrowly scoped feature, so I'm not so concerned about configuration drift between dev/prod. It seems like if we are already going to hide it for prod, why not just remove it entirely?

tychoish commented 4 months ago

If a user faces any issue, they are likely to provide an SQL script to reproduce the issue, SLTs are only useful to us.

I'm thinking less of "users writing slts" and more of the case where we write an slt to validate the behavior that we expect on someone else's system, we can get a clear and controlled response, as well as more coverage than running a sql script would provide us.

It also seems that, because we don't really test on many platforms, being able to rerun (most?) of our test suite to recertify that we do in fact work as expected on an arbitrary platform, is a good property to have.

The only thing I can think of in favour of including it is that we keep the release and debug binary as close as possible (which IDK if in this case is a very strong argument).

To be honest, this is pretty compelling to me, and the main thing that I'm worried about, distributing a binary that we aren't actually testing, or using locally ourselves, seems like it incurs a pretty non trivial risk (different binary, different code layout, different compilation process that we don't test,)

they would only result in increasing the binary size.

7mb or 5% (prev 120 on my machine, for release builds), the debug builds are 2.3gb and the impact of this change is a rounding error there.

I'm not worried about this (though looking through their code and ours it's sort of hard to imagine what accounts for this: the only thing I can think of really is the fact that we re-export tonic and a bunch of arrow stuff for flight in rpcserv, which I think isn't needed anymore, but I might be wrong.)

Because this only impacts the glaredb binary (which already includes everything), which is not the primary avenue into the product (language bindings and libpq/psql/flightsql are all more prevalent), I don't worry about that.

I think if we're actually worried about binary size, we can definitely work on that, later.