Fulcrum stopped processing mempool txs (Windows Server 2012)

pkoutsogiannis commented 7 months ago

We are using Fulcrum 1.9.7 (Release f27fc28)

We encountered the following issue 2 times in the past month:

Fullcrum stopped processing mempool txs without any log entry. We issued a stop command but fulcrum hang and we had to kill the process and restart it.

[2023-12-01 11:11:35.940] 51632 mempool txs involving 323803 addresses [2023-12-01 11:12:45.967] 51897 mempool txs involving 324605 addresses [2023-12-01 11:13:55.989] 52183 mempool txs involving 325474 addresses [2023-12-01 11:15:05.989] 52451 mempool txs involving 326368 addresses [2023-12-01 11:16:16.037] 52718 mempool txs involving 327421 addresses [2023-12-01 11:17:26.076] 53005 mempool txs involving 328511 addresses [2023-12-01 13:03:37.850] <AdminSrv 127.0.0.1:8000> New TCP Client.3419140 127.0.0.1:55881, 1 client total [2023-12-01 13:03:37.959] Received 'stop' command from admin RPC, shutting down ... [2023-12-01 13:03:37.959] Shutdown requested [2023-12-01 13:03:37.959] Stopping Stats HTTP Servers ... [2023-12-01 13:03:37.959] Stopping Controller ...

(we had to kill the process after 5 minutes)

The conf file:

datadir = d:\fulcrum_data bitcoind = 127.0.0.1:8332 rpcuser = redacted rpcpassword = redacted tcp = 10.190.89.8:50001 peering = false announce = false public_tcp_port = 50001 admin = 8000 stats = 8081 db_mem = 1024

cculianu commented 7 months ago

We are using Fulcrum 1.9.7 (Release https://github.com/cculianu/Fulcrum/commit/f27fc28fa25f950bb4ada4361e05972fe183dd0c) We encountered the following issue 2 times in the past month:

Fulcrum 1.9.7 has only been out for ~1 week. There was indeed a hang bug back in version 1.9.4 or so.

I see from the log this hang happened today -- but were you for sure on 1.9.7?

pkoutsogiannis commented 7 months ago

The first occurrence was with 1.9.6 last month and this is why we upgraded to 1.9.7

The log is from today.

cculianu commented 7 months ago

Darn. Ok.. I will investigate. I added some optimizations to make mempool synch much faster but they had a bunch of bugs. I thought I squashed them all but apparently maybe not. Will investigate.

cculianu commented 7 months ago

In the meantime you could just go back to Fulcrum 1.9.3 I guess or.. hang in there.

pkoutsogiannis commented 7 months ago

We are now running fulcrum with -d so that we can catch any helpful information for you.

cculianu commented 7 months ago

We are now running fulcrum with -d so that we can catch any helpful information for you.

Yes, this is extremely helpful. Thank you.

pkoutsogiannis commented 7 months ago

I forgot to mention that we are using the windows binary on windows server 2016.

Keep up the good work.

cculianu commented 7 months ago

I forgot to mention that we are using the windows binary on windows server 2016.

Keep up the good work.

Ahhh! That is helpful information! Thank you., I pray this is a windows-specific problem (but it may not be).

Question: Were you running Fulcrum previous to 1.9.4 (1.9.3, etc) for any extended periods and if so did you ever noticed this problem then?

pkoutsogiannis commented 7 months ago

It started after upgrading from 1.9.3 to 1.9.6

pkoutsogiannis commented 7 months ago

We had 1.9.3 running for an extended period indeed without this issue.

pkoutsogiannis commented 7 months ago

We had 1.9.3 running for at least a month on the windows 2016 machine.

We also have an 1.9.7 instance running on a windows 11 machine and is still error free. We had also a 1.9.6 running there without issues as well. The only difference is the windows version and that we have fast-sync=4098 and db_max_open_files=500 set.

cculianu commented 7 months ago

The only difference is the windows version and that we have fast-sync=4098 and db_max_open_files=500 set.

Yeah that shouldn't matter. I am curious if the Windows 11 machine ever has problems or not. Keep me updated. I will thoroughly review the code.

FWIW I actually have a windows laptiop here (windows 10) that's been running BTC Fulcrum for a week now with no hang (and before that 1.9.6 with no hang). I will continue to monitor the situation and also look for bugs in my code.

:/

Do let me know what happens I'll investigate this further in the meantime.

pkoutsogiannis commented 7 months ago

Note: The windows 11 machine is much faster than the windows 2016 machine, I am mentioning this just in case of some race condition.

cculianu commented 7 months ago

What are the specs on the slow machine? And.. is bitcoind running locally on both machines or is one connecting to the bitcoind process on the other?

pkoutsogiannis commented 7 months ago

There are 2 separate and unrelated machines running bitcoin and fulcrum locally on the same machine respectively.

Windows 2016:

cpu: intel xeon e5-2620 2,10GHz memory: 64G disk: 2TB ssd

bitcoind config:

txindex=1 server=1 listen=0 rpcbind=127.0.0.1 rpcallowip=127.0.0.1 rpcuser = redacted rpcpassword = redacted rpcworkqueue=1000 zmqpubhashblock=tcp://127.0.0.1:8433

Windows 11:

cpu: amd ryzen 5 5560U memory: 16G disk: 2TB ssd (Samsung 990 PRO NVMe M.2 SSD, 2TB, PCIe 4.0)

bitcoind config:

txindex=1 server=1 listen=0 rpcbind=127.0.0.1 rpcallowip=127.0.0.1 rpcuser = redacted rpcpassword = redacted rpcworkqueue=1000 zmqpubhashblock=tcp://127.0.0.1:8433

pkoutsogiannis commented 7 months ago

https://cpu.userbenchmark.com/Compare/Intel-Xeon-E5-2620-0-vs-AMD-Ryzen-5-5600U/m5895vsm1461589

cculianu commented 7 months ago

You know in my experience setting the rpcworkqueue=1000 on bitcoind is asking for trouble. If bitcoind can't keep up with requests, it's best for it to error-out early. Having a queue of 1000 requests lined up, may lead to ridiculous timeouts. You are better off having bitcoind saturate its rpcworkqueue early. There is a reason why Core has this defaulting to 16... I am not sure what docs you read that recommended this be raised.. can you tell me where you read that you should raise this?

Question: Are you hitting bitcoind directly to do any processing outside of Fulcrum? For example: are you doing expensive calls to bitcoind (such as mining, scantxoutset, etc) outside of Fulcrum via bitcoind's RPC?

pkoutsogiannis commented 7 months ago

The rpcworkqueue was set to 1000 for no actual reason. We found that as a recommendation from someone on the team few months ago.

Both bitcoind are used solely by fulcrum only. Fulcrum on windows 2016 (the one which hang) is not even used by any client since it serves as a backup service. It just sits there idle.

pkoutsogiannis commented 7 months ago

Shall we change the rpcworkqueue back to 16 and restart fulcrum in debug mode again?

cculianu commented 7 months ago

Well I actually don't think that was the problem -- since anyway Fulcrum should have been able to exit in a timely manner. It shouldn't hang like that either way. And if you say RPC is only used by Fulcrum.. anyway Fulcrum doesn't make "expensive" calls that eat a ton of time (such as mining or scantxoutset).

Your choice .. can leave it as-is.. or set it to default just to see if "that fixed it". Up to you.

pkoutsogiannis commented 7 months ago

Since there are no other rpc calls except fulcrum I will leave it running as it is and will update you if it hangs again with the debug log.

cculianu commented 6 months ago

Is no news good news? Has it been running smoothly all this time?

pkoutsogiannis commented 6 months ago

I am monitoring it everyday and till now there was no incident.

pkoutsogiannis commented 6 months ago

We got bad news. Unfortunately it stopped processing mempool txs. Also, after issuing a stop command, it got stuck in joining thread log line and I had to kill the process.

[2024-01-19 05:37:21.127] (Debug) 54798 mempool txs involving 289259 addresses (exclusive lock held for 2.030 msec) [2024-01-19 05:37:23.375] (Debug) getrawmempool: got reply with 54829 items, 0 ignored, 0 dropped, 31 new (reply took: 154.112 msec, processing took: 58.417 msec) [2024-01-19 05:37:23.375] (Debug) Thread started [2024-01-19 05:37:23.391] (Debug) downloaded 31 txs (failed: 0, ignored: 0), elapsed so far: 0.221 secs [2024-01-19 05:37:23.391] (Debug) Precached 30/33 inputs in 7.265 msec, of which 1.305 msec was spent processing, thread exiting. [2024-01-19 05:37:23.391] (Debug) 54829 mempool txs involving 289272 addresses (exclusive lock held for 1.259 msec) [2024-01-19 05:37:25.610] (Debug) getrawmempool: got reply with 54866 items, 0 ignored, 0 dropped, 37 new (reply took: 154.447 msec, processing took: 59.890 msec) [2024-01-19 05:37:25.610] (Debug) Thread started [2024-01-19 05:37:25.626] (Debug) downloaded 37 txs (failed: 0, ignored: 0), elapsed so far: 0.223 secs [2024-01-19 05:37:25.626] (Debug) Precached 5/37 inputs in 8.223 msec, of which 0.171 msec was spent processing, thread exiting. [2024-01-19 05:37:25.626] (Debug) 54866 mempool txs involving 289295 addresses (exclusive lock held for 1.005 msec) [2024-01-19 05:37:27.861] (Debug) getrawmempool: got reply with 54902 items, 0 ignored, 0 dropped, 36 new (reply took: 154.117 msec, processing took: 58.803 msec) [2024-01-19 05:37:27.861] (Debug) Thread started [2024-01-19 05:37:27.876] (Debug) downloaded 36 txs (failed: 0, ignored: 0), elapsed so far: 0.222 secs [2024-01-19 05:37:27.876] (Debug) Precached 19/51 inputs in 11.076 msec, of which 5.321 msec was spent processing, thread exiting. [2024-01-19 05:37:27.892] (Debug) 54902 mempool txs involving 289315 addresses [2024-01-19 05:37:30.112] (Debug) getrawmempool: got reply with 54972 items, 0 ignored, 0 dropped, 70 new (reply took: 155.114 msec, processing took: 59.566 msec) [2024-01-19 05:37:30.112] (Debug) Thread started [2024-01-19 05:37:30.127] (Debug) downloaded 70 txs (failed: 0, ignored: 0), elapsed so far: 0.228 secs [2024-01-19 05:37:30.127] (Debug) Precached 11/79 inputs in 12.933 msec, of which 3.781 msec was spent processing, thread exiting. [2024-01-19 05:37:30.143] (Debug) 54972 mempool txs involving 289350 addresses (exclusive lock held for 0.663 msec) [2024-01-19 05:37:32.368] (Debug) getrawmempool: got reply with 55042 items, 0 ignored, 0 dropped, 70 new (reply took: 155.610 msec, processing took: 59.486 msec) [2024-01-19 05:37:32.368] (Debug) Thread started [2024-01-19 05:37:32.383] (Debug) downloaded 70 txs (failed: 0, ignored: 0), elapsed so far: 0.228 secs [2024-01-19 05:37:32.383] (Debug) Precached 16/79 inputs in 12.287 msec, of which 2.329 msec was spent processing, thread exiting. [2024-01-19 05:37:32.399] (Debug) 55042 mempool txs involving 289382 addresses (exclusive lock held for 0.568 msec) [2024-01-19 05:38:30.649] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 05:48:20.928] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 05:48:45.662] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 05:56:54.910] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:03:31.190] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:04:50.111] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:06:11.095] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:09:30.563] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:17:58.624] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:21:13.498] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:22:37.826] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:27:51.403] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:43:26.227] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 06:48:44.695] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 07:04:55.910] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 07:34:55.918] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ... [2024-01-19 07:36:16.606] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 07:42:56.948] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 07:44:59.385] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 07:58:01.772] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 08:08:30.442] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 08:22:17.314] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 08:22:55.220] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 08:25:58.985] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 08:55:59.009] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ... [2024-01-19 09:09:57.787] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:16:59.895] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:22:32.987] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:27:51.829] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:28:18.939] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:45:58.794] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:48:46.512] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:58:24.682] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 09:58:46.619] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 10:12:07.944] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 10:27:17.019] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 10:30:35.049] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 10:58:49.089] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 11:02:24.635] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 11:04:13.323] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 11:26:58.364] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 11:37:34.065] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:04:43.246] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:21:38.289] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:32:47.942] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:34:44.333] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:42:07.050] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:57:28.015] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 12:57:54.858] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 13:15:32.854] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 13:23:07.431] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 13:35:54.506] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 13:40:07.927] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 13:41:39.051] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 13:49:55.143] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 14:11:10.169] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 14:14:24.747] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 14:28:02.665] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 14:33:37.024] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 14:35:20.195] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 15:05:20.204] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ... [2024-01-19 15:08:05.203] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 15:30:07.745] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 16:00:07.753] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ... [2024-01-19 16:08:08.158] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 16:18:23.015] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 16:20:43.545] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 16:45:05.727] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 16:52:12.491] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 17:05:32.191] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 17:11:31.658] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45 [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) Got connection from: 127.0.0.1:61727 [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) on_connected 30481655 [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> New TCP Client.30481655 127.0.0.1:61727, 1 client total [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) TCP Client.30481655 (id: 30481655) 127.0.0.1:61727 socket disconnected [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) TCP Client.30481655 (id: 30481655) 127.0.0.1:61727 lost connection [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) killClient (id: 30481655) [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) do_disconnect (abort) 30481655 [2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) Client 30481655 destructing [2024-01-19 17:19:58.969] Received 'stop' command from admin RPC, shutting down ... [2024-01-19 17:19:58.969] Shutdown requested [2024-01-19 17:19:58.969] (Debug) void App::cleanup() [2024-01-19 17:19:58.969] Stopping Stats HTTP Servers ... [2024-01-19 17:19:58.969] (Debug) HttpSrv 127.0.0.1:8081 thread is running, joining thread [2024-01-19 17:19:58.969] (Debug) HttpSrv 127.0.0.1:8081 cleaned up 5 signal/slot connections [2024-01-19 17:19:58.969] (Debug) ~AbstractTcpServer [2024-01-19 17:19:58.969] Stopping Controller ... [2024-01-19 17:19:58.969] (Debug) Controller thread is running, joining thread

cculianu commented 6 months ago

So there must be some issue at least on windows. You used the provided windows binary correct ?

I’ll have to investigate this when I get some free time.

pkoutsogiannis commented 6 months ago

So there must be some issue at least on windows. You used the provided windows binary correct ?

Correct.

pkoutsogiannis commented 6 months ago

I have reverted back to 1.9.3 and I will monitor this as well.

Kudos for the excellent work.

cculianu commented 6 months ago

Yeah if 1.9.3 never hangs I can just undo the optimization I added for a threaded prefetcher of coins. It only shaves a few seconds off the synchmempool on large mempools (60k txns+).. but if it means there is some instability with it for whatever reason it's gone. Do let me know how 1.9.3 works out.

pkoutsogiannis commented 6 months ago

Fulcrum (1.9.3) hang and we had to kill the process after it did not stop after issuing a stop command. Maybe the problem is with the specific os (windows Server 2012 R2) since the other instance running on Windows 11 never hang sofar.

[2024-01-23 02:47:42.621] Block height 826926, downloading new blocks ... [2024-01-23 02:47:43.220] Processed 1 new block with 2723 txs (6938 inputs, 3854 outputs, 6974 addresses), verified ok. [2024-01-23 02:47:43.222] Block height 826926, up-to-date [2024-01-23 02:47:53.249] 37937 mempool txs involving 270164 addresses [2024-01-23 02:49:03.268] 38151 mempool txs involving 271709 addresses [2024-01-23 02:50:13.326] 38328 mempool txs involving 272184 addresses [2024-01-23 02:51:23.313] 38527 mempool txs involving 272953 addresses [2024-01-23 02:52:33.317] 38696 mempool txs involving 273690 addresses [2024-01-23 02:53:43.316] 38849 mempool txs involving 274557 addresses [2024-01-23 02:54:03.597] Block height 826927, downloading new blocks ... [2024-01-23 02:54:04.463] Processed 1 new block with 1088 txs (7525 inputs, 4586 outputs, 9123 addresses), verified ok. [2024-01-23 02:54:04.465] Block height 826927, up-to-date [2024-01-23 02:54:44.472] 38372 mempool txs involving 268470 addresses [2024-01-23 13:18:24.078] <AdminSrv 127.0.0.1:8000> New TCP Client.2081457 127.0.0.1:61892, 1 client total [2024-01-23 13:18:24.187] Received 'stop' command from admin RPC, shutting down ... [2024-01-23 13:18:24.187] Shutdown requested [2024-01-23 13:18:24.187] Stopping Stats HTTP Servers ... [2024-01-23 13:18:24.187] Stopping Controller ...

pkoutsogiannis commented 6 months ago

The instance (1.9.7) running on Windows 11 that never hang is up and running since Dec 6th 2023.

cculianu commented 6 months ago

And just to be clear — the one that hung was 1.9.3 right? So it definitely isn’t my new mempool changes.

Ok in a way this is good news but in another way it’s bad since if Fulcrum is triggering some OS specific issues that’s incredibly hard to troubleshoot.

Good to know it’s not my recent changes though. That’s a relief!

cculianu commented 6 months ago

Is there any way you can install a service pack or somehow update the Windows Server 2012 box? Who knows maybe that magically fixes it?

pkoutsogiannis commented 6 months ago

I have all service packs already installed on windows server 2012. I will continue monitoring the windows 11 instance though to ensure that the problem was os specific.

Keep up that the good work!

cculianu commented 6 months ago

Thanks man. This was a relief though to learn that it's not specific to 1.9.5+, but some other unknown issue. Oh -- there is a new 1.9.8 FYI -- the major change is it calculates fees more accurately for BTC.

I am starting to suspect the hang somehow may happen within rocksdb. One thing I could do is make a custom build of the Windows binary that uses the latest RocksDB 8.10.0 -- that's one option here (but that would require me to spend 3-4 hours mucking about the docker builder to build it, and I am not sure I have that much free time this week for that).

cculianu / Fulcrum

Fulcrum stopped processing mempool txs (Windows Server 2012) #217