IreneKnapp / direct-sqlite

MIT License
35 stars 54 forks source link

Custom functions, aggregates, and collations #53

Closed redneb closed 9 years ago

redneb commented 10 years ago

This is a relatively large pull request. It introduces support for defining custom SQL functions, aggregates, and collations. I tried to break it into as many commits as possible to make it easier to examine.

nurpax commented 10 years ago

Neat!

Any plans on adding tests for new features?

nurpax commented 10 years ago

I didn't review in detail yet but the style looks good and consistent.

redneb commented 10 years ago

Here's an example of a custom SQL function:

repeatString :: FuncContext -> FuncArgs -> IO ()
repeatString ctx args = do
    n <- funcArgInt64 args 0
    s <- funcArgText args 1
    funcResultText ctx $ mconcat $ replicate (fromIntegral n) s

test db = do
    createFunction db "repeat" (Just 2) True repeatString
    execPrint db "select repeat(3,'abc')"
    -- prints: abcabcabc

Example of a custom aggregate function and a custom collation:

-- like sqlite's built-in group_concat, except that it reverses the order
rConcatStep :: FuncContext -> FuncArgs -> [(Text, Text)] -> IO [(Text, Text)]
rConcatStep _ args l = do
    s <- funcArgText args 0
    sep <-
        if funcArgCount args > 1
            then funcArgText args 1
            else return ","
    return ((s, sep) : l)

rConcatFinal :: FuncContext -> [(Text, Text)] -> IO ()
rConcatFinal ctx ((s0, _) : l) = -- ignore the separator of the 1st element
    funcResultText ctx (mconcat (s0 : l'))
  where
    l' = map (\(s, sep) -> sep <> s) l
rConcatFinal _  [] = return ()

-- order by length first, then by lexicographical order
cmpLen :: Text -> Text -> Ordering
cmpLen s1 s2 = compare (T.length s1) (T.length s2) <> compare s1 s2

test db = do
    createAggregate db "group_rconcat" (Just 1) mempty rConcatStep rConcatFinal
    createAggregate db "group_rconcat" (Just 2) mempty rConcatStep rConcatFinal
    createCollation db "len" cmpLen
    exec db "create table animal(n)"
    exec db "insert into animal(n) values ('dog'),('mouse'),('ox'),('cat')"
    execPrint db "select group_rconcat(n) from animal"
    -- prints: cat,ox,mouse,dog
    execPrint db "select * from animal order by n collate len"
    -- prints: ox\ncat\ndog\nmouse\n
redneb commented 10 years ago

Ok, I'll add some tests.

redneb commented 10 years ago

Done

nurpax commented 10 years ago

Hopefully you're not in a big hurry to get this merged, we might be migrating maintainership of this project to another person (see https://github.com/IreneKnapp/direct-sqlite/issues/54). It makes sense to get that out of the way before merging in new code.

IreneKnapp commented 10 years ago

@redneb - As the outgoing maintainer, I want to thank you for this work! It's a really important bit of functionality that very much takes direct-sqlite in a direction I'd like it to go, although I suppose that's likely to be @nurpax's decision henceforth. :) I do appreciate your breaking it into small commits, and it's a shame that I don't have the time to review it before I step down; I hope the timing isn't too disappointing to you. Especially as a first-time contributor, I hope you'll feel that we're treating you right. :)

redneb commented 10 years ago

No worries, I'm not in a hurry.

nurpax commented 10 years ago

Thanks for adding some tests. I think it'd also be worthwhile to add tests for function/aggregate destruction. Something like 1. create fn, 2. delete fn, 3. verify that sqlite doesn't see fn anymore. There's a lot of tricky code here, so I don't think I will be able to review the FFI stuff for correctness. I hope you know what you're doing. :)

BTW, what is the intended use of these features? Is it for convenience or performance? If you need to call into Haskell from native per each result column & row, it probably costs quite a bit. At least I found in sqlite-simple that non-unsafe FFI calls are very expensive. (I'm not trying to shoot this feature down, just curious.)

redneb commented 10 years ago

I added some more tests for function/aggegate/collation deletion as well as for making sure that we handle correctly any exceptions raised by the haskell side of the implementation of a custom function.

As for the intended use, well, you might simply need a function that is not offered by sqlite and cannot be implemented (easily) otherwise. For example, if you are doing statical analysis, you might want to use an aggregate that calculates the standard deviation. The performance of custom functions might not be on par with that of built-in functions, but I think it is acceptable for most cases.

The implementation is indeed a bit complicated. The complexity stems mostly from the use of FunPtrs for callbacks. In particular, we have to try hard to ensure that each FunPtr used in the implementation of custom functions will be properly disposed when not needed any more. For that we need to create some other callbacks that sqlite will call for the cleaning up. And we need a way to pass information to those other callbacks, so that they can deallocate the right pointers.

nurpax commented 9 years ago

This is now in direct-sqlite-2.3.14 on Hackage.