Closed lpil closed 3 months ago
My guess was that it is sacrificing some write performance to reduce database size.
Pretty much, yes. An int is 4 bytes. Many strings are longer than 4 bytes, and there are potentially millions or even billions of rows, so it adds up.
Saving about 50 bytes per row adds up to about 47G for every million rows on the hits table alone. This is then also carried over to some of the other tables and some indexes, so in total it's much more.
A hits row is 68 bytes in a quick check (could be off a few bytes), so 50 bytes per row is a fairly significant percentage.
For small to medium instances this doesn't really matter, because the difference between 100M or 200M doesn't really matter, and even 5G or 10G isn't that important in most cases. But for goatcounter.com it does matter because it's 500G vs. 1T. This also affects query performance, because it has to read less from disk, can cache more in memory, etc.
Write performance has never really been much of an issue. I don't really see why writes should be slower? Less data to write should be faster, no?
I see! Thank you!
Write performance has never really been much of an issue. I don't really see why writes should be slower? Less data to write should be faster, no?
You have to do a get-or-insert on the supplementary tables before you can record a hit, right? So you'd be sending the same amount of data but performing a greater number of queries.
You have to do a get-or-insert on the supplementary tables before you can record a hit, right?
Yeah, but it caches caches a lot in memory:
browsers 189 items 19K
changed_titles 15255 items 3076K
loc 1795 items 344K
paths 35750 items 5678K
sites 3452 items 2559K
sites_host 3190 items 292K
systems 93 items 9K
ua 2496 items 488K
So about 12M. For a ~450G database, that's not too bad.
Ah! Do you have an in-memory dictionary of recently used paths, etc? Or is that the database's cache you're talking about?
Just my own cache. Just a simple map with time-based eviction. Simple, but works.
e.g.: https://github.com/arp242/goatcounter/blob/master/user_agent.go#L64-L70
Pattern is similar for most of it.
Lovely. Thank you for the help!
Hello there! Thanks for this lovely bit of software!
I was reading the database schema and I was wondering why it is that hits references systems, browsers, and paths, rather than putting them in one table.
My guess was that it is sacrificing some write performance to reduce database size. Is that a good guess? If so, did you record any impacts on performance and size by doing this?
Thanks, Louis