You're hinting that the wallet is not the cause; could you hint at what you do think is the cause? From the information I have here it looks like the prove_execute RPC (we should really change the text there) tries to load something into memory, perhaps a prover key, that is too much for the environment.
The largest prover keys are around 0.52GB, meaning loading one of them would consume around 50% of the memory available on the given VPS. Maybe the "solution" here is to just give it more memory? What do you think @herr-seppia?
The problem seems not to be related to the prover key: the prover key triggers the OOM, but it is not the cause of the leak.
I just launched a cluster (again with wip-vm11) with no rate limit at all, and the memory is continually increasing even with no txs sent.
It seems to me that the cause is related to microkelvin.
I launched a cluster with 10 nodes, and I'm starting to think it's something related to gRPC.

- Node0 is the BlockGenerator (so dusk-blockchain calls EST multiple times, as it's the only node in the consensus)
- Node1-7 are passive nodes (they only receive accepted blocks and call no EST at all)
- Node8 has no dusk-blockchain connected (no gRPC is served at all; it acts just as a network router)
- Node9 is contacted by the Explorer and the Wallet (same as nodes 1-7, but with additional clients)
| Node | %MEM |
| --- | --- |
| wip20220224-node-0 | 9.9 |
| wip20220224-node-1 | 7.4 |
| wip20220224-node-2 | 7.4 |
| wip20220224-node-3 | 7.4 |
| wip20220224-node-4 | 7.7 |
| wip20220224-node-5 | 7.3 |
| wip20220224-node-6 | 7.4 |
| wip20220224-node-7 | 7.4 |
| wip20220224-node-8 | 2.5 |
| wip20220224-node-9 | 36.0 |
After restarting and resyncing node-9, the memory usage is the following:

| Node | %MEM |
| --- | --- |
| wip20220224-node-9 | 8.0 |
According to the memory profiler, a leak was detected originating from STATIC_MAP, in github.com-1ecc6299db9ec823/canonical-0.6.6/src/id.rs:
```rust
impl Id {
    /// Creates a new Id from a type
    pub fn new<T>(t: &T) -> Self
    where
        T: Canon,
    {
        let len = t.encoded_len();
        let payload = if len > PAYLOAD_BYTES {
            Store::put(&t.encode_to_vec())
        } else {
            ...
        };
```
Additionally, for `Store::put(&t.encode_to_vec())` we have:

```rust
pub(crate) fn put(bytes: &[u8]) -> IdHash {
    // If length is less than that of a hash, this should have been inlined.
    debug_assert!(bytes.len() > core::mem::size_of::<IdHash>());
    let hash = Self::hash(bytes);
    STATIC_MAP.write().insert(hash, Vec::from(bytes));
    hash
}
```
NB: only `take_bytes` removes items from STATIC_MAP. If `pub(crate) fn take_bytes(id: &Id) -> Result<Vec<u8>, CanonError>` is not called for every stored item, we can experience OOM due to a constantly growing STATIC_MAP (a minimal sketch of this pattern follows).
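To illustrate the failure mode, here is a minimal, self-contained sketch of the same grow-only pattern; the names, hashing, and map type are stand-ins for illustration, not canonical's actual implementation. Every `put` stores a copy of the payload in a process-global map, and entries only leave the map through a matching `take`, so any caller that stores ids without eventually taking the bytes back keeps them resident for the lifetime of the process.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for STATIC_MAP: a process-global map from an id to
// the encoded payload bytes. Nothing evicts entries except `take`.
static STATIC_MAP: OnceLock<Mutex<HashMap<u64, Vec<u8>>>> = OnceLock::new();

fn map() -> &'static Mutex<HashMap<u64, Vec<u8>>> {
    STATIC_MAP.get_or_init(|| Mutex::new(HashMap::new()))
}

// Simplified analogue of `Store::put`: hash the bytes and keep a copy in the
// global map, returning the id.
fn put(bytes: &[u8]) -> u64 {
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    let id = hasher.finish();
    map().lock().unwrap().insert(id, bytes.to_vec());
    id
}

// Simplified analogue of `take_bytes`: the only way an entry ever leaves the map.
fn take(id: u64) -> Option<Vec<u8>> {
    map().lock().unwrap().remove(&id)
}

fn main() {
    // Every block processed stores new payloads...
    let ids: Vec<u64> = (0u32..1_000).map(|i| put(&i.to_le_bytes())).collect();

    // ...but if only some of them are ever taken back, the rest stay resident
    // for the lifetime of the process.
    for id in ids.iter().take(10) {
        let _ = take(*id);
    }
    println!("entries still resident: {}", map().lock().unwrap().len());
}
```

Running the sketch leaves 990 of the 1,000 payloads resident, which matches the slow, steady growth profile observed on the nodes above.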
The issue is allegedly fixed in canonical 0.7.0
See dusk-network/rusk#609 and dusk-network/rusk#606.
Memory keeps increasing, slowly but steadily.
| Node | %MEM | RSS |
| --- | --- | --- |
| wip-20220228-node-0 | 13.9 | 138536 |
| wip-20220228-node-1 | 7.5 | 75156 |
| wip-20220228-node-2 | 7.6 | 76548 |
| wip-20220228-node-4 | 7.5 | 74704 |
| wip-20220228-node-5 | 7.5 | 75156 |
| wip-20220228-node-6 | 7.3 | 73288 |
| wip-20220228-node-7 | 7.5 | 74648 |
| wip-20220228-node-8 | 6.3 | 63656 |
| wip-20220228-node-9 | 26.0 | 258908 |
Upon running a cluster with no rate limit at all, with 1GB of memory per VPS, after an initial increase the memory stabilizes around block height ~15,000 at a constant ~350MB (at the time of writing, the cluster is at 70,000 blocks).
Closing in favor of dusk-network/dusk-blockchain#1317
**Describe the bug**
The memory usage during transaction creation grows with the block height. This results in an OOM (initially thought to occur only at high block height, but later discovered to be present even at lower block heights).

**To Reproduce**
Run a local cluster with no rate limiter and broadcast txs.

**Logs/Screenshot**

**Platform**
Running both rusk and dusk-blockchain on branch wip-vm11.

**Additional context**
Block height was at 26761. Rusk has been killed by the OOMKiller on a VPS with 1GB of memory. Calling getBalance multiple times at that block height doesn't kill the process (the memory grows and comes back to the initial level after the operation has been performed).

Update from "OOM when sending tx at high block height" [TBD]