fsprojects / FSharp.Azure.Storage

F# API for using Microsoft Azure Table Storage service
MIT License
75 stars 16 forks source link

Building a query with lists / seqs is ....hard. #40

Closed aggieben closed 4 years ago

aggieben commented 4 years ago

Description

The limitation of the query module to only the few boolean operators is very restrictive. I'd like to be able to query something with an expression like this:

<@ fun _ sysprops -> List.contains sysprops.PartitionKey listOfValues @>

My first thought was to try to build the expression with a function with the type 'a seq -> Expr<T -> Metadata -> bool> that would create expressions like sysprops.PartitionKey = value1 | sysprops.PartitionKey = value2 | ....

One might at this point suggest that I should choose a different partitioning strategy to make this easier, but it really doesn't fix the underlying issue: I want to query for a range of rows; which property is involved in the comparison isn't really the issue.

I would think this is very common., but trying to build an parameterized Expr<T -> Metadata -> bool> quite a chore (and impossible to do with splices, I think). Is there a recommended way to deal with queries like these?

daniel-chambers commented 4 years ago

Table storage itself is extremely restrictive in terms of the queries it allows you to write. "Contains" is not one of the operators that it allows, and transforming a Contains into a series of ORs is anti-pattern of how one should be using table storage, as it'll result in scans that don't use the indexes on the tables. So I'd say doing it is hard... on purpose, because one probably shouldn't be doing it. If one's requirements dictate flexible querying, table storage is the wrong tool for the job.

Check out this documentation from Microsoft about appropriate query patterns. In particular, they mention ORing RowKeys as an anti-pattern that results in an index scan, and I imagine doing it on a partition key would be even worse. :(

aggieben commented 4 years ago

@daniel-chambers thanks for the response. I think I eventually came to the same conclusion and figured out that it would be easier to just do a small number of point queries in parallel instead, rather than the OR expansion I first attempted.

daniel-chambers commented 4 years ago

Nice one @aggieben, sounds like you've landed on a good solution. 😃