Closed mwotton closed 7 years ago
Sure, when it's out of alpha. Would be interesting to run my benchmarks with and without the new version.
Anyone working on this upgrade? We'd be happy to jump in if not!
Just as a note, #47 describes the procedure to upgrade the bundled sqlite (we have a "useful" issue label just to make that findable easily). We've had to do this a bunch of times now, I'm not sure if anyone has talked to upstream about whether they know about the issue and what they plan to do about it? Volunteer efforts could productively be spent on that communication.
May I plead for another sqlite3 upgrade (or see below for an alternative approach)?
I have a schema that describes objects in DNS that can have CNAME chains. I am storing data about domains, the MX hosts of domains, and the IP and TLSA records of MX hosts. This introduces opportunities for nested CNAME chains:
a.example. IN CNAME b.example.
b.example. IN CNAME c.example.
c.example IN MX 0 d.example.
d.example. IN CNAME e.example.
e.example. IN CNAME f.example.
f.example. IN A 192.0.2.1
_25._tcp.f.example. IN CNAME _dane.g.example.
_dane.g.example. IN CNAME. _dane.h.example.
h.example. IN TLSA 3 1 1 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
To process such data, I need to construct a temporary table that represents the transitive closure of the all the CNAME aliases, that stores for each alias its final target.
CREATE TABLE IF NOT EXISTS "domain"
( "qname" VARCHAR(255)
, "rtype" INTEGER
, "cname" VARCHAR(255)
, "dnssec" INTEGER
, PRIMARY KEY ("qname", "rtype")
) WITHOUT ROWID;
CREATE TEMP TABLE "canonical"
( "alias" VARCHAR(255)
, "rtype" INTEGER
, "owner" VARCHAR(255)
, PRIMARY KEY ("alias", "rtype", "owner")
) WITHOUT ROWID;
WITH RECURSIVE
"dealias"(a,r,o) AS (
SELECT D."qname" as a, D."rtype" as r, D."qname" as o
FROM "domain" D
WHERE D."cname" is NULL
UNION
SELECT C."qname" as a,
T.r as r,
T.o as o
FROM "dealias" T, "domain" C
WHERE C."cname" = T.a
AND C."rtype" = T.r
)
INSERT INTO "canonical" ("alias", "rtype", "owner")
SELECT T.a, T.r, T.o
FROM "dealias" T
Using the sqlite3
command-line, with my dataset of ~100k domains, this query takes 0.2 seconds on my laptop. However, with the SQLite bundled with Direct-Sqlite, it takes ~16 minutes!
After checking that the problem is not in my code, I ended up building a custom copy of direct-sqlite that uses the system (really homebrew) sqlite libraries, and the query became (unsurprisingly) as fast as in the sqlite3
CLI.
If constant requests for upgrades are a pain, there is perhaps a better way:
I know that upstream SQLite recommends that applications bundle their own copy of SQLite so that they get a stable platform, but I am not convinced that this makes sense for libraries. It seems to make more sense for the choice of SQLite to come from the top level application. So that if I build something that depends on Database.SQLite.Simple -> Database.SQLite3.Direct -> SQLite, I can choose to bundle my own Database.SQLite3.Direct and a suitable SQLite version, or otherwise just go with the system version.
So a better long-term strategy, rather than constant upgrades of the bundled SQLite in Database.SQLite3.Direct, may be to change the cabal file to have "systemlib" on by default, and allow applications with special requirements to specify a frozen version of their choice. This is easy with stack
, right now I have:
extra-deps:
- direct-sqlite-2.3.17
flags:
direct-sqlite:
systemlib: true
But I think that should be the default, and one would then use some explicit libsqlite3.a for applicationst that want that sort of stability, which versions is right, rather depends on the application!
I think this will considerably reduce the wear-and-tear on the package, by leaving the choice of the SQLite library to the application.
Thoughts?
@vdukhovni & others, Sorry for being so passive as a maintainer for such a long time. A bunch of real life things changed, so I had a long hiatus from Haskell programming. But I'm getting back into it now and I think I should be able to support this library better too.
Regarding systemlib vs embedding the native library as source. I don't have a strong preference either way. If I can keep up with more frequent releases (and maybe get a backup maintainer), then maybe the default is fine? My concern with systemlib default is how easy it will be to use this library on Windows. With the cbits embedded, it's easy to build without any configuration. Using any type of OSS shared libraries on Windows can be a PITA to get building.
@vdukhovni @mwotton is sqlite 3.15.0 what we should upgrade to?
@mwotton @mkscrg @vdukhovni @IreneKnapp Upgraded sqlite3 to 3.15.0. The PR is up here:
https://github.com/IreneKnapp/direct-sqlite/compare/sqlite-upgrade-3.15.0?expand=1
I'll let that soak in for a day and then prepare a merge & hackage release.
While at it, I also fixed the test suite which was failing on GHC 8.0.
3.15.0 will do I guess... The idea of using the system libraries by default (can the default be platform dependent?) may still merit further thought/discussion. I am not saying that I'm sure that's the right answer, but I do think it is worth considering.
Oh, and thanks for the upcoming update!
I just released direct-sqlite-2.3.18 on hackage.
The idea of using the system libraries by default (can the default be platform dependent?)
@vdukhovni Sure, it makes sense to discuss the pros and cons, of course.
The pros I see for the current "cbits is the default" are:
IMO the current default is a more reliable default than relying on what the system happens to have installed. At the risk of using made up statistics, I would imagine most users of this library are content with what's been packaged into cbits and can live with a version that's a even a few releases old (and probably don't even care what sqlite3 is embedded, as long as the package itself works). OTOH, those who need a specific version or want to override the systemlib variable, are probably more expert users and can deal with any potential problems that might arise.
Btw, an alternative (not mutually exclusive) to systemlib would be to provide a way for direct-sqlite to be built against user provided cbits. So if you have a project where you want a specific version of sqlite, you could git clone the C source into your own project and set things up with stack and cabal such that direct-sqlite builds with sqlite coming in from your local, out-of-package location. Not sure if that's easy to do with cabal, but listing this option here for completeness.
FWIW, SQLIte 3.15.2 is out: https://www.sqlite.org/changes.html and fixes a number of bugs, so at some point another upgrade may be in order...
@vdukhovni thanks! Will be updating shortly. Most likely tonight.
@vdukhovni Upgraded to sqlite 3.15.2 in direct-sqlite 2.3.19. I just released it to hackage.
Super! Thanks, much appreciated!
http://permalink.gmane.org/gmane.comp.db.sqlite.general/90549
exciting we should probably upgrade:)