Closed dnck closed 5 years ago
This seems to be related to https://github.com/facebook/rocksdb/issues/164 and https://github.com/facebook/rocksdb/issues/4089
Stopping the node and rebooting the computer resolves the error. Still not sure about the cause, though.
Hm, may I see the script for the "value-less" spammer? Based on some of the exceptions, it seems that value transactions are being spammed. It could be that the spammers spend from the same seed, resulting in a double spend and inconsistent tips (the fix would then be to use a different seed for each spammer).
In the case of spamming value transfers from the same seed, the consistent tips get referenced by inconsistent value transfers. The old tips are then removed from the tips list, and there are no new consistent tips that newly arriving txs can reference, probably resulting in the mentioned errors. In a production environment, nodes will constantly be receiving consistent transactions, resulting in tips that may be referenced. If we are spamming value transactions from the same seed on 10 machines, that is not a realistic "spam" test case, as no user would double spend on purpose unless issuing an attack.
What could be done to fix it now (if it is related to double spends) is to not remove a consistent tip from the tips list until it is referenced by a consistent transaction (this might create a lot of side issues, though). But, as already mentioned, in production the node will be receiving "normal" txs, so it would not halt, as there are always new consistent tips.
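As a sketch of that retention rule (hypothetical data structures, not the actual node code): a consistent tip would only leave the tip list once a *consistent* transaction references it, while inconsistent transactions neither become tips nor evict anything.

```javascript
// Hypothetical sketch of the proposed tip-retention rule.
// A "transaction" here is just { hash, references, consistent } -- an assumed
// shape for illustration, not the node's real data model.
class TipList {
  constructor() {
    this.tips = new Set();
  }

  addTransaction(tx) {
    if (!tx.consistent) {
      // Inconsistent txs (e.g. double spends) are ignored: they do not become
      // tips and, crucially, do not remove the consistent tips they reference.
      return;
    }
    // Only a consistent transaction evicts the tips it references...
    for (const ref of tx.references) {
      this.tips.delete(ref);
    }
    // ...and becomes a new consistent tip itself.
    this.tips.add(tx.hash);
  }
}
```

Under this rule, a burst of double spends leaves the old consistent tips in place, so newly arriving txs always have something to reference.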
Odd thing is that when I spammed value-less transactions, even under a much larger load (50 spam instances at a 100 ms rate), I never ran into this issue.
Check this one and the references therein: https://stackoverflow.com/questions/45510290/rocksdb-too-many-sst-files-of-very-small-size I believe the problem is that RocksDB opens too many files; it can be fixed by fine-tuning the config.
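For reference, that kind of tuning is usually done via RocksDB's options. The fragment below is only an illustrative sketch: the option names (`max_open_files`, `write_buffer_size`, `target_file_size_base`) are real RocksDB options, but the values, and whether this node exposes an OPTIONS file at all, are assumptions.

```ini
[DBOptions]
  # Cap the number of file handles RocksDB keeps open (-1 means unlimited)
  max_open_files=5000

[CFOptions "default"]
  # Larger memtables and SST size targets mean fewer, bigger files on disk
  write_buffer_size=67108864
  target_file_size_base=67108864
```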
Dan, could you share the configuration of the two nodes?
After all, it might rather be related to the issue you/we previously had. Are the nodes you are running in the same directory, with different db path configs?
> Hm, may I see the script for the "value-less"-spammer

valueless_spammer.js:
```js
const HELIX = require("@helixnetwork/core");
const CONVERTER = require("@helixnetwork/converter");

const HOST = "http://192.168.2.98:16000";
const SND = "0000000000000000000000000000000000000000000000000000000000000000";
const RCV = "0000000000000000000000000000000000000000000000000000000000000000";
const TOKENS = 0;
const PERIODICITY = 1000; // send msg every PERIODICITY ms
const DEPTH = 3;
const MINWEIGHTMAGNITUDE = 2;

var COMPOSER = HELIX.composeAPI({ provider: HOST });
var stored_tx_bytes;
var transfer_object = {
  address: RCV,
  value: TOKENS,
  message: CONVERTER.asciiToHBytes("Hello"),
  tag: CONVERTER.asciiToHBytes("world!")
};

send_tx = function (error, response) {
  COMPOSER
    .prepareTransfers(SND, [transfer_object])
    .then(function (tx_bytes) {
      stored_tx_bytes = tx_bytes;
      return COMPOSER.sendHBytes(stored_tx_bytes, DEPTH, MINWEIGHTMAGNITUDE);
    })
    .then(results => console.log(JSON.stringify(results)))
    .catch(err => { console.log(err); });
};

setInterval(send_tx, PERIODICITY);
```
Follower config.ini

```ini
[SBX]
API_HOST = 192.168.2.98
PORT = 16000
UDP_RECEIVER_PORT = 16100
TCP_RECEIVER_PORT = 16200
DEBUG = true
GRAPH_ENABLED = false
NEIGHBORS = udp://127.0.0.1:15100
HXI_DIR = hxi
HEADLESS = true
DB_PATH = db
ZMQ_ENABLED = false
ZMQ_ENABLE_IPC = false
ZMQ_ENABLE_TCP = false
LOCAL_SNAPSHOTS_DEPTH = 5
MS_DELAY = 0
SPAM_DELAY = 0
```
Leader config.ini

```ini
[SBX]
API_HOST = localhost
PORT = 15000
UDP_RECEIVER_PORT = 15100
TCP_RECEIVER_PORT = 15200
DEBUG = true
GRAPH_ENABLED = false
NEIGHBORS = udp://127.0.0.1:16100
HXI_DIR = hxi
HEADLESS = true
DB_PATH = db
ZMQ_ENABLED = true
ZMQ_ENABLE_IPC = false
ZMQ_ENABLE_TCP = true
ZMQ_PORT = 6550
LOCAL_SNAPSHOTS_DEPTH = 5
MS_DELAY = 5
SPAM_DELAY = 0
```
> Are the nodes you are running, in the same directory with different db path configs??
Yes
> I believe the problem is that rocksDBs opens up too many files, it can be fixed by fine-tuning the config
I was going to follow these suggestions: https://ro-che.info/articles/2017-03-26-increase-open-files-limit and see if they solve the problem. I am on Ubuntu with systemd, not headless, so that might be the issue.
> Odd thing is, that when I spammed value-less, even under way larger load (50 spam instances with 100ms rate), I never ran into this issue.
What kind of OS was the fullnode running on? AWS? Or PC?
> Are the nodes you are running, in the same directory with different db path configs??
>
> Yes
I think that's the issue, tbh. I am not sure what the expected behavior is here, but I usually run different instances from different directories for testing.
> I think thats the issue, tbh. I am not sure what is expected behavior here, but I usually run different instances from different directories for testing.
Oh, I am sorry. I misread what you wrote. I am running the nodes in different directories, with different db paths.
> Odd thing is, that when I spammed value-less, even under way larger load (50 spam instances with 100ms rate), I never ran into this issue.
>
> What kind of OS was the fullnode running on? AWS? Or PC?
We did it locally on Ubuntu and on AWS/Docker. I don't think that's the issue.
> I think thats the issue, tbh. I am not sure what is expected behavior here, but I usually run different instances from different directories for testing.
>
> Oh, I am sorry. I misread what you wrote. I am running the nodes in different directories, with different db paths.
Okay
Could you please share the db paths that you specified for each instance?
Although, I am doing something that perhaps might be worth mentioning. I use a Python script that compiles (using Maven, with all the tests and whatnot) one fullnode in a first directory A. Then the script copies the compiled target directory into a separate location on the path, outside of directory A.
> Could you please share the db paths that you specified for each instance?
LEADER:
/home/hlx-dev/helix/testnet/fork0/testnet-1.0/db
/home/hlx-dev/helix/testnet/fork0/testnet-1.0/spent-addresses-db

FOLLOWER:
/home/hlx-dev/helix/testnet/fork1/testnet-1.0/db
/home/hlx-dev/helix/testnet/fork1/testnet-1.0/spent-addresses-db
> Check this one and references therein: https://stackoverflow.com/questions/45510290/rocksdb-too-many-sst-files-of-very-small-size I believe the problem is that rocksDBs opens up too many files, it can be fixed by fine-tuning the config
I think that will just fix the specific symptom, without solving the root issue (whatever is causing too many files to be opened). It is also worth noting that limitations on `max_open_files` and other RocksDB configs are imposed in order to optimize performance.
I am still curious why Dario and I aren't running into this issue when stress testing. I will take some time and try to reproduce.
Thank you!
> Could you please share the db paths that you specified for each instance?
>
> LEADER /home/hlx-dev/helix/testnet/fork0/testnet-1.0/db /home/hlx-dev/helix/testnet/fork0/testnet-1.0/spent-addresses-db
>
> FOLLOWER /home/hlx-dev/helix/testnet/fork1/testnet-1.0/db /home/hlx-dev/helix/testnet/fork1/testnet-1.0/spent-addresses-db
Looks fine. But maybe, just for testing purposes, you could try removing the line `DB_PATH = db` and let it use the default paths.
Btw, were you able to reproduce this error on an IOTA node (with their lib)? That would be quite interesting actually!
> Looks fine, but maybe, just for testing purposes, you could try removing this line: `DB_PATH = db` and let it use the default paths.
Will do.
> without solving the root issue (causing too many files to open)
Right, I have just now edited my `/etc/systemd/user.conf` and `/etc/systemd/system.conf` to include `DefaultLimitNOFILE=20000`.
But this seems like it will only hide the problem with the Java program, if there is one.
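For reference, the edit described above (per the linked article) amounts to something like this; 20000 is just the value chosen here, not a recommended setting:

```ini
# In both /etc/systemd/system.conf and /etc/systemd/user.conf
[Manager]
DefaultLimitNOFILE=20000
```

A re-login (or reboot) is needed before the new limit shows up in `ulimit -n`.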
> Btw, were you able to reproduce this error on an IOTA node (with their lib)? That would be quite interesting actually!
I will try that today.
Expected Behavior
I expect the fullnode to be able to handle 10 transaction requests per second issued by javascript clients.
Current Behavior
After about 3000 transactions, the fullnode stops responding to the client requests.
Failure Information (for bugs)
There are two error messages in the log. One error comes from the Node class, and the other comes from the LatestSolidMilestoneTrackerImpl class. It appears related to a "too many open files" error. Below, I provide the output of ulimit from my OS, which has a default open-files limit of 1024.
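To confirm that the limit is actually being approached, one can compare the soft limit against the number of descriptors a process holds. The commands below are a sketch; `$$` (the current shell) stands in for the fullnode's PID:

```shell
# Soft limit on open file descriptors for the current session (often 1024 by default)
ulimit -n

# Number of descriptors currently open by a process; replace $$ with the node's PID
ls /proc/$$/fd | wc -l
```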
Steps to Reproduce
Run `node valueless_spammer.js` in each window.

Context
Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.
Failure Logs
The two errors from the log are given below.