Question : doing complex searches based on order

boltdb / bolt

An embedded key/value database for Go.

MIT License


Question : doing complex searches based on order #518

Closed joeblew99 closed 8 years ago

joeblew99 commented 8 years ago

Sorry if this is the wrong place to ask a usage question, but I am currently using rethinkdb but want to move my current code to boltdb.

How do I do things like: Select * from users where username = x sortby ascending? It seems like the only way is to foreach over the data in memory using an iterator, and then keep a counter? But even then I'm not sure.

The other thing is how to model a child/parent relationship. Say, a product catalogue of drinks with various types of drinks.

DavidVorick commented 8 years ago

Make sure you are aware that boltdb is a key-value database, and not a relational database. A lot of operations like the ones above will not be possible with boltdb. As it happens, the one you mentioned is actually possible, as long as you are using the username as the key. From the README:

Prefix scans

To iterate over a key prefix, you can combine Seek() and bytes.HasPrefix():

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    c := tx.Bucket([]byte("MyBucket")).Cursor()

    prefix := []byte("1234")
    for k, v := c.Seek(prefix); bytes.HasPrefix(k, prefix); k, v = c.Next() {
        fmt.Printf("key=%s, value=%s\n", k, v)
    }

    return nil
})

joeblew99 commented 8 years ago

Thanks, that makes it easy.

I guess for joins / cardinality I should check the docs. If anyone can point me to example code on github, that would be super.

On Mon, 22 Feb 2016, 15:43 David Vorick notifications@github.com wrote:


Range scans

Another common use case is scanning over a range such as a time range. If you use a sortable time encoding such as RFC3339 then you can query a specific date range like this:

db.View(func(tx *bolt.Tx) error {
    // Assume our events bucket exists and has RFC3339 encoded time keys.
    c := tx.Bucket([]byte("Events")).Cursor()

    // Our time range spans the 90's decade.
    min := []byte("1990-01-01T00:00:00Z")
    max := []byte("2000-01-01T00:00:00Z")

    // Iterate over the 90's.
    for k, v := c.Seek(min); k != nil && bytes.Compare(k, max) <= 0; k, v = c.Next() {
        fmt.Printf("%s: %s\n", k, v)
    }

    return nil
})
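
RFC3339 strings only compare correctly as bytes when they all use the same offset and precision, so it may help to normalize timestamps to UTC before using them as keys. The following is a minimal sketch using only the standard library; `normalizeKey` and `inRange` are hypothetical helper names, not part of Bolt, and `inRange` uses a half-open range rather than the inclusive upper bound in the README loop.

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

// normalizeKey (hypothetical helper) parses an RFC3339 timestamp with any
// offset and re-encodes it in UTC at second precision, so that all keys have
// the same width and compare bytewise in chronological order.
func normalizeKey(s string) ([]byte, error) {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return nil, err
	}
	return []byte(t.UTC().Format(time.RFC3339)), nil
}

// inRange (hypothetical helper) reports whether lo <= k < hi in byte order.
func inRange(k, lo, hi []byte) bool {
	return bytes.Compare(k, lo) >= 0 && bytes.Compare(k, hi) < 0
}

func main() {
	lo := []byte("1990-01-01T00:00:00Z")
	hi := []byte("2000-01-01T00:00:00Z")

	inputs := []string{
		"1995-06-15T12:00:00Z",
		"1999-12-31T20:00:00-05:00", // already the year 2000 in UTC
	}
	for _, s := range inputs {
		k, err := normalizeKey(s)
		if err != nil {
			fmt.Println("bad timestamp:", err)
			continue
		}
		fmt.Printf("%s -> %s in range: %v\n", s, k, inRange(k, lo, hi))
	}
}
```

Without the UTC step, the second timestamp would sort inside the 90's range as a raw string even though the instant it names falls in 2000.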


DavidVorick commented 8 years ago

I'm not 100% confident, but I do not think that you will be able to do joins or cardinality with boltdb. Those are operations that can be done by a relational database. Boltdb is not a relational database.

benbjohnson commented 8 years ago

@joeblew99 As @DavidVorick mentioned, there's no concept of indexes in Bolt. However, you can build your own which do effectively the same thing.

The easiest way is to add a bucket that serves as the index. Let's say you have an Accounts bucket with a one-to-many relationship to a Users bucket. Each of these would be keyed on their primary key (Account ID and User ID, respectively). If you wanted to be able to look up Users by account you can add another bucket (called UserAccounts or Users.AccountID or whatever) and use nested buckets. The key would be the AccountID and then the key in each nested bucket would be the UserID (with no value).

It ends up looking like this:

- Accounts (bucket)
  - 100: {Name:"ABC Corp"}
  - 200: {Name:"Widgets, Inc"}

- Users (bucket)
  - 1: {Name:"Susy", AccountID: 100}
  - 2: {Name:"John", AccountID: 100}
  - 3: {Name:"Abby", AccountID: 200}

- Users.AccountID (bucket)
  - 100 (bucket)
    - 1: 
    - 2: 
  - 200 (bucket)
    - 3:

Then if you want to retrieve users by account ID, you simply go to the Users.AccountID bucket and then into the nested bucket for the given Account ID, and you can read all the User IDs for that account.
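
One detail the layout above glosses over is how the numeric IDs become keys. Bolt keys are byte slices kept in byte-sorted order, so IDs written as decimal strings would iterate as "100", "20", "3" rather than numerically. A common approach (the Bolt README has a similar `itob` helper) is a fixed-width big-endian encoding. This is a minimal sketch using only the standard library; `btoi` and `keyLess` are hypothetical names added for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// itob encodes v as an 8-byte big-endian key, so that byte order
// matches numeric order.
func itob(v uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, v)
	return b
}

// btoi (hypothetical helper) decodes a key produced by itob.
// It assumes b is at least 8 bytes long.
func btoi(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

// keyLess (hypothetical helper) reports whether a sorts before b
// in byte order, which is the order a cursor would visit them in.
func keyLess(a, b []byte) bool {
	return bytes.Compare(a, b) < 0
}

func main() {
	// As decimal strings, "100" sorts before "20".
	fmt.Println(keyLess([]byte("100"), []byte("20")))

	// As big-endian integers, 20 sorts before 100.
	fmt.Println(keyLess(itob(20), itob(100)))

	// The encoding round-trips.
	fmt.Println(btoi(itob(200)))
}
```

With keys encoded this way, iterating a nested bucket such as the one for account 100 would yield its user IDs in ascending numeric order.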

To perform your search by username, you would need something similar. You would need a separate bucket to map username to user id:

- Accounts
  - 100: {Name:"ABC Corp"}
  - 200: {Name:"Widgets, Inc"}

- Users
  - 1: {Name:"Susy", AccountID: 100}
  - 2: {Name:"John", AccountID: 100}
  - 3: {Name:"Abby", AccountID: 200}

- Users.Name
  - "Abby": 3
  - "John": 2
  - "Susy": 1

However, if your data set is small it can be easier to simply brute force load the data and then perform filtering & sorting.

joeblew99 commented 8 years ago

That nails it. Thanks Ben.

Quite an easy concept. Take the primitives of what SQL does and do the same in golang. If the compute and data are on the same box it's still fast.

If you can point me to examples of code doing the sort, order, etc., that would be great. I understand if you don't have time. I had a look around github but did not have much luck :)

Now when you want to shard the boltdb physically (for perf or throughput reasons), the design pattern would be a map reduce type of thing, I assume? If yes, it looks to me like:

- Code up map reducers on top of the primitive sort, group, etc. code.
- Physically, you can either deploy the map reducers above the primitives OR put them at the exact same level as the primitives and allow all servers (with boltdb and golang code together) to talk to each other. A simple load balancer then picks the leader of that request for the map reduce.


joeblew99 commented 8 years ago

For inspiration of the map reduce in golang: https://blog.gopheracademy.com/advent-2015/glow-map-reduce-for-golang/

benbjohnson commented 8 years ago

@joeblew99 Yeah, essentially Go becomes your query language. :)

If you are brute forcing the sorting by loading the objects into memory and then sorting then you can simply use the built-in sort package. There are some examples in the package docs.
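
As a rough illustration of the brute-force approach, here is a minimal sketch that filters and sorts records already loaded into a slice. The `User` type and `usersByAccount` function are hypothetical, modeled on the example data earlier in the thread; decoding values out of Bolt is assumed to have happened already.

```go
package main

import (
	"fmt"
	"sort"
)

// User is a hypothetical record type matching the example layout.
type User struct {
	ID        int
	Name      string
	AccountID int
}

// usersByAccount keeps only the users in the given account and
// returns them sorted by Name in ascending order.
func usersByAccount(users []User, accountID int) []User {
	var out []User
	for _, u := range users {
		if u.AccountID == accountID {
			out = append(out, u)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	users := []User{
		{ID: 1, Name: "Susy", AccountID: 100},
		{ID: 2, Name: "John", AccountID: 100},
		{ID: 3, Name: "Abby", AccountID: 200},
	}
	for _, u := range usersByAccount(users, 100) {
		fmt.Println(u.ID, u.Name)
	}
}
```

Swapping the comparison in the `sort.Slice` callback is enough to sort descending or by a different field.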

As far as map/reduce, that's a much more complex topic (and outside of the scope of Bolt). Certain operations are map/reducible but some are not. There's quite a bit of documentation on the subject and it looks like that Glow package might be a good starting point.

yogihardi commented 7 years ago

Hi @joeblew99

Maybe it's too late to reply to this thread, since it's been more than a year.

This lib (https://github.com/ahmetb/go-linq) aims to let you run RDBMS-like queries over slices. Once you get the data out of BoltDB, you can play with it.