Miners are not permanently deleted from activeWorkers

Alanz2223 commented 7 years ago

While logging whether miners were being deleted after disconnection or not I encountered a very strange problem. Within the enumerateWorkerStats function that is called every 15 secs, once it deletes inactive miners on line #L592 The next time that same function is executed from the timer, the miner reference/object will still be there. So its never deleted at all. You can see this for yourself by logging the activeWorkers[poolID] variable on each call and see that the worker is still there. The weird thing is that if you log activeWorkers[poolID] right after the delete statement it seems that the property was successfully deleted because that miner doesnt get logged...

heres a dump from stdout

whenever the miner client disconnects

miners Miner disconnected via standard close

then after about 120 seconds when the enumerateWorker function detects that the worker is no longer active this gets logged

workers Worker: 1 currently has 0 miners connected at 0 h/s with an average diff of NaN +13s DELETED WORKER workers Worker: 2 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 3 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 4 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 5 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 6 currently has 0 miners connected at 0 h/s with an average diff of NaN +1ms workers Worker: 7 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 8 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms The proxy currently has 0 miners connected at 0 h/s with an average diff of NaN

workers Worker: 1 currently has 0 miners connected at 0 h/s with an average diff of NaN +15s DELETED WORKER workers Worker: 2 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 3 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 4 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 5 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 6 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 7 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms workers Worker: 8 currently has 0 miners connected at 0 h/s with an average diff of NaN +0ms The proxy currently has 0 miners connected at 0 h/s with an average diff of NaN

Here is the line

https://github.com/Snipa22/xmr-node-proxy/blob/master/proxy.js#L592

my node --version v6.11.2

Alanz2223 commented 7 years ago

bump

pulketo commented 7 years ago

too short time for a bump... don't you think?

grunjol commented 7 years ago

I commented the same thing to @Snipa22 via private chat earlier this week, but he was having a rough time fixing the pool.

The activeWorkers (stats) items are deleted if there is no update in 120 secs (in the master process), but the Miner objects in activeMiners array (in the child processes) are not.

If you have stable client connections (in number and reliability) then you won't see any problem, but if you have lot of connections (i tested with 20K~40K concurrent connections) with a very high connection/disconnection rate (in my case 10 per second), you will see a lot of inactive MIners dangling around. The problem is Miner objects contain references of connections (ie, the socket, ids, stats) via closures.

I took 2 heap snapshots on the same child with 30 min of separation and compared using the chrome devtool profiler

In my case, the memory was not a problem per se (the server has 16GB) but eventually when the proxy had 6GB of RAM, the GC interrupted so frequently and took most of the CPU time with the STW, that degraded (and finally halted) the processes.

To fix the problem I checked the miners, deleting them if no activity was logged in the last 5 minutes (I made it modifying the activeMiners loop where you update the difficulty that is executed every 45 secs)

After the change, the memory and cpu usage are stable (tested 4 days so far) I hope it helps

slayerulan commented 7 years ago

@grunjol can you send a pull request regarding that?

Alanz2223 commented 7 years ago

@grunjol @slayerulan Sorry for the late reply but I solved the issue a couple of hours in by simply storing the miner's id on the socket once it connects, then once a close/disconnect event is emitted I simply delete the activeMiner's reference to that miner... The reason why the master miner's id is kept around is because there is an active interval on each worker that passes the stats up to the master process so you have to delete the miner reference directly on the miner.

I agree, if you have a stable mining pool this shouldnt be an issue but in cases where you have a high connect/disconnect number it could cause issues.

Snipa22 commented 6 years ago

Should be patched in commit: 70b8cdfae4d9d43d96d64f9ce5ebdd5612218829

Please pull, npm install, and reboot your proxies to test.

dancudds commented 6 years ago

I have miners that can disconnect quite frequently (i.e. at least every hour) and get a new IP. That means I get a lot of workers hanging around and socket timeouts in the logs. Any way to clean up the logs?

Snipa22 / xmr-node-proxy

Miners are not permanently deleted from activeWorkers #54