aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0

[Question] How to get available bins for a key? #282

Closed carter214 closed 4 years ago

carter214 commented 4 years ago

Hello,

I want to query Aerospike to get all bins for a key. GetHeader does not include the bin names, and I cannot find another method that provides them.

Here is the reason why I want this information. I am trying to prototype an implementation of Uber's Schemaless design in Aerospike.

Each bin is an ordered map with a timestamp as the key and a blob of data as the value. In my case each microservice will have its own bin. Due to the dynamic nature of our system (new services are added all the time), we do not know ahead of time the full list of bins, or which bins could apply to a given record.

For namespace test, set computer, and key system1, we have the following data.

bin     T10  T15  T20  T30
cpu     a         b    c
memory            d
ssd          e

I want to query to find what system1 looked like at any time T or the latest view (time at infinity). If a value is not defined at a specific T then the previous value should be returned.

If I know the bin names then I can use MapGetByKeyRelativeIndexRangeCountOp to get the values.

Example code

package main

import (
    "flag"
    "fmt"
    "log"

    as "github.com/aerospike/aerospike-client-go"
    // "github.com/thanhpk/randstr"
)

var client *as.Client

func main() {

    var err error

    var ip = flag.String("aerospike-ip", "localhost", "Aerospike IP to connect to")
    flag.Parse()

    cPolicy := as.NewClientPolicy()
    cPolicy.LimitConnectionsToQueueSize = false
    client, err = as.NewClientWithPolicy(cPolicy, *ip, 3000)
    if err != nil {
        // Without a connection, client is nil and every later call would panic.
        log.Fatalln(err.Error())
    }
    defer client.Close()
    writePolicy := as.NewWritePolicy(0, as.TTLDontExpire)
    mapPolicy := as.NewMapPolicy(as.MapOrder.KEY_ORDERED, as.MapWriteMode.CREATE_ONLY)

    key, _ := as.NewKey("test", "computer", "system1")
    client.Operate(writePolicy, key, as.MapPutOp(mapPolicy, "cpu", 10, "a"))
    client.Operate(writePolicy, key, as.MapPutOp(mapPolicy, "cpu", 20, "b"))
    client.Operate(writePolicy, key, as.MapPutOp(mapPolicy, "cpu", 30, "c"))
    client.Operate(writePolicy, key, as.MapPutOp(mapPolicy, "memory", 20, "d"))
    client.Operate(writePolicy, key, as.MapPutOp(mapPolicy, "ssd", 15, "e"))

    responseMap, err := client.Operate(writePolicy, key, as.MapGetByRankOp("cpu", -1, as.MapReturnType.KEY_VALUE))
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read Rank -1 %+v\n", responseMap.Bins)

    responseMap, err = client.Operate(writePolicy, key, as.MapGetByKeyRelativeIndexRangeCountOp("cpu", 2, -1, 1, as.MapReturnType.KEY_VALUE))
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read T2 %+v\n", responseMap.Bins)

    responseMap, err = client.Operate(writePolicy, key, as.MapGetByKeyRelativeIndexRangeCountOp("cpu", 12, -1, 1, as.MapReturnType.KEY_VALUE))
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read T12 %+v\n", responseMap.Bins)

    responseMap, err = client.Operate(writePolicy, key, as.MapGetByKeyRelativeIndexRangeCountOp("cpu", 23, -1, 1, as.MapReturnType.KEY_VALUE))
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read T23 %+v\n", responseMap.Bins)

    responseMap, err = client.Operate(writePolicy, key, as.MapGetByKeyRelativeIndexRangeCountOp("cpu", 35, -1, 1, as.MapReturnType.KEY_VALUE))
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read T35 %+v\n", responseMap.Bins)

    responseMap, err = client.Operate(writePolicy, key, as.MapGetByKeyRelativeIndexRangeCountOp("cpu", 20+1, -1, 1, as.MapReturnType.KEY_VALUE))
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read T20+1 %+v\n", responseMap.Bins)

    // I do not want to do this because it will return all data for a key.
    // That will be a very large amount of data when there are many map keys (timestamps) and each value is 2 KB.
    records, err := client.Get(nil, key)
    if err != nil {
        log.Println(err.Error())
    }
    fmt.Printf("Read Get %+v\n", records.Bins)
}

Any help is appreciated.

khaf commented 4 years ago

From what I understand, you are trying to implement a sort of time-series datastore via Aerospike.

You should keep in mind that the size of a record in Aerospike is limited (consult our documentation on the website for the current limit), and the record cannot grow over that size.

At the moment there is no API that provides the bin list. I can see two solutions:

  1. Wrap the Client object, and provide your own API around it. In the wrapped Put method, always keep a meta field like __binNames in which you keep the list of bin names in the record.
  2. Implement a UDF which returns the bin names. I prefer not to expand on this solution because it will be slower than the first solution.
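The first option can be sketched roughly as follows. This is only an illustration of the bookkeeping, not the client's API: the store interface, memStore, and trackingStore are hypothetical stand-ins, and an in-memory map takes the place of the Aerospike client so the example runs on its own. Only the __binNames field name comes from the suggestion above.

```go
package main

import (
	"fmt"
	"sort"
)

// binNamesField is the reserved meta bin that holds the list of bin names.
const binNamesField = "__binNames"

// store is a hypothetical minimal interface standing in for the real client.
type store interface {
	PutBin(key, bin string, value interface{}) error
	GetBin(key, bin string) (interface{}, error)
}

// memStore is an in-memory stand-in used only to make the sketch runnable.
type memStore map[string]map[string]interface{}

func (m memStore) PutBin(key, bin string, value interface{}) error {
	if m[key] == nil {
		m[key] = make(map[string]interface{})
	}
	m[key][bin] = value
	return nil
}

func (m memStore) GetBin(key, bin string) (interface{}, error) {
	return m[key][bin], nil
}

// trackingStore wraps a store and records every written bin name
// in binNamesField, kept sorted and free of duplicates.
type trackingStore struct{ inner store }

func (t trackingStore) PutBin(key, bin string, value interface{}) error {
	if err := t.inner.PutBin(key, bin, value); err != nil {
		return err
	}
	names, err := t.BinNames(key)
	if err != nil {
		return err
	}
	i := sort.SearchStrings(names, bin)
	if i < len(names) && names[i] == bin {
		return nil // already tracked
	}
	// Build a fresh slice so previously returned slices are not mutated.
	updated := make([]string, 0, len(names)+1)
	updated = append(updated, names[:i]...)
	updated = append(updated, bin)
	updated = append(updated, names[i:]...)
	return t.inner.PutBin(key, binNamesField, updated)
}

// BinNames returns the tracked bin names for key, or nil if none exist.
func (t trackingStore) BinNames(key string) ([]string, error) {
	v, err := t.inner.GetBin(key, binNamesField)
	if err != nil {
		return nil, err
	}
	names, _ := v.([]string)
	return names, nil
}

func main() {
	s := trackingStore{inner: memStore{}}
	for _, bin := range []string{"cpu", "memory", "ssd", "cpu"} {
		if err := s.PutBin("system1", bin, "blob"); err != nil {
			fmt.Println("put failed:", err)
			return
		}
	}
	names, _ := s.BinNames("system1")
	fmt.Println(names) // [cpu memory ssd]
}
```

In this sketch the data write and the meta-field update are two separate steps. With the real client, both would presumably be sent in a single Operate call, since Operate applies its operations to one record atomically.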

Let me know if I can help you any further.

carter214 commented 4 years ago

Thank you for your response. Solution 1 would solve my problem. It does add complexity to every client, though, because not all of our microservices will be written in Go.

Could I write a UDF to iterate over all bins in a record and run the MapGetByKeyRelativeIndexRangeCountOp on each of them? This would provide the data I actually need.

The bigger concern is the record size, as you mentioned. The record size limit makes Aerospike a non-starter for this project.

Thanks again. I will keep Aerospike in mind for other projects.

khaf commented 4 years ago

I'll go ahead and close this ticket. Don't hesitate to open a new one if you have other questions.