aerospike / aerospike-client-go

Aerospike Client Go
Apache License 2.0

"read-write scan" with "Update using write multi-ops; that is, operations on bins." feature #342

Closed Boklazhenko closed 3 years ago

Boklazhenko commented 3 years ago

Hi. According to the documentation (https://www.aerospike.com/docs/guide/scan.html), Aerospike supports read-write scans that modify records using multi-ops.

Can you help me with an example, please?

I need something like opGet plus opDelete, so that the same record is not scanned twice.

realmgic commented 3 years ago

I don't know whether you can read the record in a background scan. Since it runs in the background, I'm not sure where we would get the read information; I hope @khaf can shed some light on that. What you can do is modify the records. For example, the code below adds a new bin to all the records in the set (the commented-out alternatives touch or delete all records instead):

/*
 * Copyright 2014-2021 Aerospike, Inc.
 *
 * Portions may be licensed to Aerospike, Inc. under one or more contributor
 * license agreements.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License. You may obtain a copy of
 * the License at http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations under
 * the License.
 */

package main

import (
    "log"
    "time"

    as "github.com/aerospike/aerospike-client-go"
    shared "github.com/aerospike/aerospike-client-go/examples/shared"
)

func main() {
    runExample(shared.Client)

    log.Println("Example finished successfully.")
}

func runExample(client *as.Client) {
    log.Println("Background Scan, add bin: namespace=", *shared.Namespace, " set=", *shared.Set)

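    // Limit scan to recordsPerSecond.  This will take more time, but it will reduce
    // the load on the server.
    // NOTE: this scan policy is never passed to QueryExecute below, so the
    // limit has no effect on the background query as written.
    policy := as.NewScanPolicy()
    policy.RecordsPerSecond = 5000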

    begin := time.Now()

    queryPolicy := as.NewQueryPolicy()
    writePolicy := as.NewWritePolicy(0, as.TTLDontExpire)

    stm := as.NewStatement(*shared.Namespace, *shared.Set)

    binNewBin := as.NewBin("newBin", 42)
    tsk, err := client.QueryExecute(queryPolicy, writePolicy, stm, as.PutOp(binNewBin))

    // tsk, err := client.QueryExecute(queryPolicy, writePolicy, stm, as.TouchOp())

    // writePolicy.DurableDelete = true
    // tsk, err := client.QueryExecute(queryPolicy, writePolicy, stm, as.DeleteOp())
    shared.PanicOnError(err)

    for err := range tsk.OnComplete() {
        // deal with the error here
        if err != nil {
            shared.PanicOnError(err)
        }

    }

    log.Println("Elapsed time: ", time.Since(begin).Seconds(), " seconds")
}

khaf commented 3 years ago

Background scans do not return data. If I understand correctly, you don't want to read the same data twice. If you are using a newer Aerospike server, you can use Expressions to filter on the last-update time. The latest Go client uses partition scans, which can retry internally:

        spolicy := as.NewScanPolicy()
        spolicy.FilterExpression = as.ExpLess(as.ExpLastUpdate(), as.ExpIntVal(XXXXX))
        rs, err := client.ScanAll(spolicy, namespace, set)

You can then delete those records, if you want, by running a background scan with the same filter. The most correct approach would be to mark the records first via a background scan, read the marked records, and then delete the marked records. But the solution above comes close.

Boklazhenko commented 3 years ago

Thank you for the tips. It seems to work.