Queries caching - Githubissues

kderme commented 3 years ago

I have a query like this

queryBValue :: MonadIO m => (ByteString, Word16) -> ReaderT SqlBackend m _
queryBValue (hash, index) = do
    select . from $ \ (a `InnerJoin` b) -> do
        on (a ^. AId ==. b ^. BId)
        where_ (b ^. BIndex ==. val index
                &&. a ^. AHash ==. val hash
                )
        pure (b ^. BId, b ^. BValue)

whichs runs many times (up to 2 millions in a relatively small time window) with different input.

Profiling shows that toRawSql takes a considerable amount of time. Combined with builderToText they seem to cost more than the actual execution of the query in rawQueryRes.

Is it possible to somehow cache the sql query text and reuse it with different [PersistValue]? Is this caching possible in esqueleto and if I want to do the caching on the user end, what functions should I use?

parsonsmatt commented 3 years ago

Huh, that's intriguing.

esqueleto probably can't reasonably cache queries. But, you can definitely do this on your end with renderQuerySelect. That emits a (Text, [PersistValue]) that should work with your database connection. Then substitute the [PersistValue] for whatever you need, and call rawSql from persistent to get the results out. This won't work with Entitys though - you'd need to dig into the implemnetation of rawSql and remove the ?? placeholder parsing.

So,

newtype QueryByValue m = QueryByValue
  { unQueryByValue :: ByteString -> Word16 -> SqlPersistT m _
  }

mkQueryByValue
  :: SqlPersistT m (QueryByValue m)
mkQueryByValue = do
  (txt, _) <- renderQuerySelect $ queryBValue ("asdf", 4)
  pure $ QueryByValue $ \bs w16 -> do
    rawQuery txt [toPersistValue bs, toPersistValue w16]

runQueryByValue :: ByteString -> Word16 -> QueryByValue m -> SqlPersistT m _
runQueryByValue bs w16 qbv = 
  unQueryByValue qbv bs w16

for an ad-hoc way to do it.

belevy commented 3 years ago

As Matt said we can't really handle this on the esqueleto side since SqlQuery is a monad and therefore supports when which means that arbitrary values can cause different queries to run.

Something you might also want to consider is using a prepared statement if your backend supports it. It is interesting to see the actual query generation being a cost center, that is something I would have expected should never need to be optimized. Do you have any more insights about which section of toRawSql is slow? Does using the new from syntax change any of the performance characteristics?

EDIT: having looked at the code I now really want to see the performance of the new syntax. collectOnClauses is likely causing issues here. With the new syntax we do not have to parse the on clauses to determine which join type they match with.

belevy commented 3 years ago

Tried coming up with a benchmark to see what the difference between the two is. I ran with 1, 2, and 3 joins using the old and new from syntax (no where clause) in an attempt to see performance difference.

@kderme Are you using only a single join in your actual query? The new syntax is always faster but it also has linear growth in number of joins where as the old style is quadratic.

kderme commented 3 years ago

I think I misinterpreted the results of profiling and the actual cost is lower than what I described. Probably I was getting an extreme case of https://github.com/mpickering/hs-speedscope/issues/6. So around 15% of time was spent in toRawSql, but this doesn't account for the total ecexution time as I had assumed.

Do you have any more insights about which section of toRawSql is slow?

mostly makeFrom, which I think is the part where the Join is handled. I have the flame graph for more details, hopefully somewhat readable. flame-graph

Are you using only a single join in your actual query?

Yes it's a single join.

having looked at the code I now really want to see the performance of the new syntax.

We'e using the Legacy api, but plan to change to the new one. Is this the new syntax you're referring to?

In any case, thank you for your input. Feel free to close.

belevy commented 3 years ago

We'e using the Legacy api, but plan to change to the new one. Is this the new syntax you're referring to?

Yes, both can live in the same query (just need to namespace the from and on functions) and the entire collectOnClauses portion will go away. As you can see from my very unscientific benchmark the new syntax is about 100% faster in the single join case and only grows linearly in the number of joins.

belevy commented 3 years ago

Closing since there is nothing to change. Performance of query generation appears to be on the order of 10 microseconds.

bitemyapp / esqueleto

Queries caching #293