Open un000 opened 10 months ago
Do you know what causes the timeouts? Do you have an unstable cluster/network? I have a bit of trouble reproducing this issue. We have found a case in which, under some default configurations, adding a new node to the cluster could exhaust the max retries, but I presume that's not what you are observing here.
@khaf the cluster is stable. It is connected over a 10GB local network and there are no issues except long partition scans.
Can you also include your Statement code? And the ExpressionFilter?
@khaf sure
statement := aerospike.NewStatement(r.namespace, r.set)
statement.Filter = aerospike.NewEqualFilter("intbin", 5555)
And the ExpressionFilter? How many records are there in the set? Do you have an estimate of how many records are going to be returned? And is it an in-memory or flash namespace?
set:
disable-eviction: "false"
ns: "namespace"
index_populating: "false"
objects: "36306534"
stop-writes-count: "0"
set: "setname"
enable-index: "false"
sindexes: "2"
memory_data_bytes: "33855994789"
device_data_bytes: "31576275424"
truncate_lut: "0"
tombstones: "0"
Indexes:
*************************** 1. row ***************************
ns: "namespace"
bin: "stringbin"
indextype: "NONE"
set: "setname"
state: "RW"
indexname: "stringbin_idx"
path: "stringbin"
type: "STRING"
*************************** 2. row ***************************
ns: "namespace"
bin: "stringbin"
indextype: "NONE"
set: "setname"
state: "RW"
indexname: "intbin_idx"
path: "stringbin"
type: "NUMERIC"
3-node setup with multicast
service {
cluster-name cluster
user aerospike
group aerospike
paxos-single-replica-limit 1
proto-fd-max 15000
migrate-threads 6
}
namespace namespace {
memory-size 110G
replication-factor 2
default-ttl 0
nsup-period 120
storage-engine device {
cold-start-empty true
file /var/aerospike/a.p1.db
file /var/aerospike/a.p2.db
file /var/aerospike/a.p3.db
file /var/aerospike/a.p4.db
filesize 64G
data-in-memory true
write-block-size 128K
}
migrate-sleep 0
defrag-sleep 0
}
An estimated ~6 million of the ~60 million records in the set
ExpressionFilter isn't set
Thanks for the detailed info. I'm on it, may take a couple of days though.
Hi. Looks like I have a similar problem. 3-node cluster (aerospike-server:5.7.0.24) in k8s. Local network. 1+ billion records in the set. In-memory storage.
clientPolicy := aerospike.NewClientPolicy()
clientPolicy.Timeout = 10 * time.Second
clientPolicy.IdleTimeout = 20 * time.Second
sp := aerospike.NewScanPolicy()
sp.RecordQueueSize = 5000
sp.IncludeBinData = false
sp.MaxRetries = 10
sp.SleepBetweenRetries = time.Second
recordset, err := aeroClient.ScanAll()
Reading results in 10 threads. After processing 560 million records, I got this error:
ResultCode: NETWORK_ERROR, Iteration: 0, InDoubt: false, Node: A0 10.206.195.132:3000: network error.
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/connection.go:96 github.com/aerospike/aerospike-client-go/v6.errToAerospikeErr()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/connection.go:262 github.com/aerospike/aerospike-client-go/v6.(*Connection).Read()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/buffered_connection.go:92 github.com/aerospike/aerospike-client-go/v6.(*bufferedConn).readConn()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/buffered_connection.go:106 github.com/aerospike/aerospike-client-go/v6.(*bufferedConn).read()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:250 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).readBytes()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:202 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).parseKey()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:292 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).parseRecordResults()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:174 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).parseResult()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/command.go:2745 github.com/aerospike/aerospike-client-go/v6.(*baseCommand).executeAt()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/command.go:2570 github.com/aerospike/aerospike-client-go/v6.(*baseCommand).execute()
/go/pkg/mod/github.com/aerospike/aerospike-client-go/v6@v6.14.0/multi_command.go:415 github.com/aerospike/aerospike-client-go/v6.(*baseMultiCommand).execute()
I ran the app many times; it scans normally until about 560 million records and then always breaks at the same point, so the full scan never finishes. The cluster is stable and all nodes are alive. Try doing the scan at a different time, when the cluster is not under high load.
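For anyone trying to reproduce the "10 reader threads" setup without a cluster, here is a minimal standard-library sketch of that consumption pattern. It does not use the Aerospike client: the `result` type, `feed`, and `consume` are hypothetical names made up for illustration, and the channel only stands in for whatever stream of results the scan returns. The point is just to show workers draining a shared channel while keeping the first error and the processed count, so a mid-scan failure like the one above is reported with how far the scan got.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// result is a hypothetical stand-in for one streamed scan result:
// either a record key or an error reported by the stream.
type result struct {
	Key string
	Err error
}

// feed simulates a result stream of n items. If failAt is in [0, n),
// the item at that position is an error instead of a record.
func feed(n, failAt int) <-chan result {
	ch := make(chan result, n)
	for i := 0; i < n; i++ {
		if i == failAt {
			ch <- result{Err: errors.New("simulated network error")}
			continue
		}
		ch <- result{Key: fmt.Sprintf("key-%d", i)}
	}
	close(ch)
	return ch
}

// consume drains results with the given number of workers. It returns
// how many records were handled and the first error seen, if any.
func consume(results <-chan result, workers int, handle func(result)) (int64, error) {
	var (
		wg        sync.WaitGroup
		processed atomic.Int64
		once      sync.Once
		firstErr  error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range results {
				if r.Err != nil {
					// Remember only the first error, keep draining.
					once.Do(func() { firstErr = r.Err })
					continue
				}
				handle(r)
				processed.Add(1)
			}
		}()
	}
	wg.Wait()
	return processed.Load(), firstErr
}

func main() {
	n, err := consume(feed(1000, 560), 10, func(result) {})
	fmt.Printf("processed=%d err=%v\n", n, err)
}
```

Logging the processed count next to the error makes it easier to confirm whether the failure really happens at the same point on every run.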
+1 Same problem for us
We have encountered the same issue
I don't understand how to manage timeouts for Query. I set 20 retries with a sleep of 2 seconds and still get timeouts.
Per-record processing time is 150-250 ms with 300 goroutines. What should I change to increase the timeout on the Aerospike side? After 20 retries, roughly 60-70 seconds of running, the code fails.
AS: Aerospike Community Edition build 5.6.0.5 Client: v6.13.0
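As a rough sanity check on the "20 retries, fails after about 60-70 seconds" figure, the arithmetic can be sketched as below. This is a simplified model and not the client's actual retry logic: it assumes every attempt fails after a fixed per-attempt timeout and that the client sleeps between attempts, and it ignores any overall deadline. The `retryPolicy` type is hypothetical, and the 1-second per-attempt timeout is an assumed value that was not given in the thread; only the 20 retries and the 2-second sleep come from the comment above.

```go
package main

import (
	"fmt"
	"time"
)

// retryPolicy is a hypothetical container for the settings discussed
// above; it is not a type from the Aerospike client.
type retryPolicy struct {
	MaxRetries          int
	PerAttemptTimeout   time.Duration // assumed value, not from the thread
	SleepBetweenRetries time.Duration
}

// worstCase estimates the time until the final failure under the
// simplified model: (MaxRetries+1) attempts that each time out, with a
// sleep between consecutive attempts.
func (p retryPolicy) worstCase() time.Duration {
	attempts := time.Duration(p.MaxRetries + 1)
	sleeps := time.Duration(p.MaxRetries)
	return attempts*p.PerAttemptTimeout + sleeps*p.SleepBetweenRetries
}

// reported mirrors the settings from the comment above, plus the
// assumed 1-second per-attempt timeout.
var reported = retryPolicy{
	MaxRetries:          20,
	PerAttemptTimeout:   1 * time.Second,
	SleepBetweenRetries: 2 * time.Second,
}

func main() {
	// 21 attempts * 1s + 20 sleeps * 2s
	fmt.Println("estimated time to final failure:", reported.worstCase())
}
```

Under these assumptions the estimate is 61 seconds, which is in the reported range. If that model holds, raising the retry count or the sleep would only delay the failure, and the per-attempt timeout would be the setting to look at first.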