ThomasBlock opened this issue 2 years ago
Hi, timeouts are often temporary and recoverable, but failed_to_instantiate_rocksdb_iterator
requires a restart. Could you give some more details about your setup (which disks and filesystems do you use, how many MB/s does the node download)?
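If it helps with the MB/s question, a quick way to measure the sustained download rate from outside the node is to sample the kernel's per-interface receive counters twice. This is just a sketch; the interface name is a placeholder you would need to replace.

```python
# Rough sketch: report the receive rate of one network interface by sampling
# /proc/net/dev twice. Replace IFACE with the NIC the node actually uses.
import time

IFACE = "eth0"  # placeholder

def rx_bytes(iface: str) -> int:
    with open("/proc/net/dev") as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name.strip() == iface:
                return int(rest.split()[0])  # first column = bytes received
    raise ValueError(f"interface {iface!r} not found")

before = rx_bytes(IFACE)
time.sleep(10)
after = rx_bytes(IFACE)
print(f"{(after - before) / 10 / 1e6:.1f} MB/s received")
```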
Thank you for your fast reply! Yes, it recovers, but now, ten hours later, there is a similar error (LOCK: No locks available):
Aug 29 17:27:50 lauch start[2166672]: 2022-08-29T17:27:50.275401+02:00 [error] gen_server.erl:949 Generic server ar_chunk_storage terminating. Reason: {{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,[{file,"/root/arweave/apps/arweave/src/ar_chunk_storage.erl"},{line,140}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}. Last message: {put,90771323920630,<<236,228,47,223,15,180,216...
Aug 29 17:27:50 lauch start[2166672]: 2022-08-29T17:27:50.278265+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_chunk_storage:init/1, pid: <0.25695.198>, registered_name: ar_chunk_storage, error: {{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,[{file,"/root/arweave/apps/arweave/src/ar_chunk_storage.erl"},{line,140}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancest...
Aug 29 17:27:50 lauch start[2166672]: 2022-08-29T17:27:50.279380+02:00 [error] supervisor.erl:736 Supervisor: {local,ar_sup}. Context: child_terminated. Reason: {{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,[{file,"/root/arweave/apps/arweave/src/ar_chunk_storage.erl"},{line,140}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}. Offender: id=ar_chunk_storage,pid=<0.25695.198>.
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.620953+02:00 [error] : Error in process <0.11227.269> on node 'arweave@127.0.0.1' with exit value:, {badarg,[{rocksdb,iterator_move,[#Ref<0.3972060598.3831889989.231906>,{seek,<<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,20,148,255,214,122>>}],[]},{ar_kv,get_next_by_prefix,4,[{file,"/root/arweave/apps/arweave/src/ar_kv.erl"},{line,77}]},{ar_data_sync,get_chunk,7,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1016}]},{ar_http_iface_middleware,handle_get_ch...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.621162+02:00 [error] : Ranch listener ar_http_iface_listener, connection process <0.13087.269>, stream 1 had its request process <0.21672.268> exit with reason badarg and stacktrace [{rocksdb,iterator_move,[#Ref<0.3972060598.3831889989.231906>,{seek,<<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,20,148,255,214,122>>}],[]},{ar_kv,get_next_by_prefix,4,[{file,"/root/arweave/apps/arweave/src/ar_kv.erl"},{line,77}]},{ar_data_sync,get_chunk,7,[{file,"/root/arweave/apps/arweave/sr...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.620886+02:00 [error] gen_server.erl:949 Generic server ar_data_sync terminating. Reason: {{case_clause,{'EXIT',{{{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,[{...}|...]},{gen_server,try_handle_call,4,[...]},{gen_server,handle_msg,6,...},{proc_lib,init_p_do_apply,...}]},{gen_server,call,[ar_chunk_storage,{put,90771305308406,...}]}}}},[{ar_data_sync,store_chunk,3,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,2294}]},{ar_data_sync,handle_info,2,[{file,"...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.628795+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_data_sync:init/1, pid: <0.1209.0>, registered_name: ar_data_sync, error: {{case_clause,{'EXIT',{{{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,[{file,"/root/arweave/apps/arweave/src/ar_chunk_storage.erl"},{line,140}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.630469+02:00 [error] supervisor.erl:736 Supervisor: {local,ar_sup}. Context: child_terminated. Reason: {{case_clause,{'EXIT',{{{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,...},{gen_server,try_handle_call,...},{gen_server,...},{...}]},{gen_server,call,[ar_chunk_storage,{...}]}}}},[{ar_data_sync,store_chunk,3,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,2294}]},{ar_data_sync,handle_info,2,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.637107+02:00 [error] supervisor.erl:846 Supervisor: {local,ar_sup}. Context: start_error. Reason: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,232}]},{gen...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.637217+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_data_sync:init/1, pid: <0.9600.269>, registered_name: [], error: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_da...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.642980+02:00 [error] supervisor.erl:846 Supervisor: {local,ar_sup}. Context: start_error. Reason: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,232}]},{gen...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.643069+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_data_sync:init/1, pid: <0.19074.269>, registered_name: [], error: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_d...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.648916+02:00 [error] supervisor.erl:846 Supervisor: {local,ar_sup}. Context: start_error. Reason: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,232}]},{gen...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.649022+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_data_sync:init/1, pid: <0.4430.269>, registered_name: [], error: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_da...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.654200+02:00 [error] supervisor.erl:846 Supervisor: {local,ar_sup}. Context: start_error. Reason: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,232}]},{gen...
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.654566+02:00 [error] supervisor.erl:805 Supervisor: {local,ar_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=ar_data_sync,pid={restarting,<0.1209.0>}.
Aug 29 17:27:51 lauch start[2166672]: 2022-08-29T17:27:51.654293+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_data_sync:init/1, pid: <0.5702.269>, registered_name: [], error: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661771585 acquiring thread 140643696051776: /media/nvm6/weave/rocksdb/ar_data_sync_db/LOCK: No locks available"}}},[{ar_data_sync,init_kv,0,[{file,"/root/arweave/apps/arweave/src/ar_data_sync.erl"},{line,1256}]},{ar_data_sync,init,1,[{file,"/root/arweave/apps/arweave/src/ar_da...
Aug 29 17:27:52 lauch start[2166672]: 2022-08-29T17:27:52.765634+02:00 [error] proc_lib.erl:525 crasher: initial call: gun:proc_lib_hack/5, pid: <0.2654.269>, registered_name: [], error: {undef,[{erlang,get_stacktrace,[],[]},{gun,proc_lib_hack,5,[{file,"/root/arweave/_build/default/lib/gun/src/gun.erl"},{line,654}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [gun_sup,<0.750.0>], message_queue_len: 0, messages: [], links: [<0.751.0>], dictionary: [], trap_exit: false, status: running, heap_size: 610, stack_...
Aug 29 17:27:52 lauch start[2166672]: 2022-08-29T17:27:52.767259+02:00 [error] supervisor.erl:751 Supervisor: {local,gun_sup}. Context: child_terminated. Reason: {undef,[{erlang,get_stacktrace,[],[]},{gun,proc_lib_hack,5,[{file,"/root/arweave/_build/default/lib/gun/src/gun.erl"},{line,654}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}. Offender: id=gun,pid=<0.2654.269>.
Aug 29 17:28:32 lauch start[2166672]: 2022-08-29T17:28:32.146339+02:00 [error] supervisor.erl:905 Supervisor: {local,ar_sup}. Context: shutdown_error. Reason: killed. Offender: id=ar_bridge_sup,pid=<0.1162.0>.
Aug 29 17:28:32 lauch start[2166672]: 2022-08-29T17:28:32.146624+02:00 [error] gen_server.erl:949 Generic server ar_bridge terminating. Reason: killed. Last message: {'EXIT',<0.1162.0>,killed}. State: {state,{90988,{{{467,1005406},{221,3,133,83,1984},{<<"{\"new_block\":{\"usd_to_ar_rat"...>>,{block,<<92,13,121,6,...>>,<<214,159,162,...>>,1661771818,1661770320,115792089211204725463925141277326392909849545169910489760771025478983365302860,...},30008951316497}},{{{409,1005406},{125,73,124,48,1984},{<<"{\"new_block\":{\"usd_t"...>>,{block,<<"\\...
Aug 29 17:28:32 lauch start[2166672]: 2022-08-29T17:28:32.269405+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_bridge:init/1, pid: <0.1163.0>, registered_name: ar_bridge, exit: {killed,[{gen_server,decode_msg,9,[{file,"gen_server.erl"},{line,481}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [ar_bridge_sup,ar_sup,<0.800.0>], message_queue_len: 0, messages: [], links: [], dictionary: [], trap_exit: true, status: running, heap_size: 833026, stack_size: 29, reductions: 61419816; neighbours: []
Aug 29 17:28:32 lauch start[2166672]: 2022-08-29T17:28:32.533476+02:00 [error] gen_server.erl:949 Generic server ar_block_propagation_worker8 terminating. Reason: killed. Last message: {'EXIT',<0.1162.0>,killed}. State: {state}.
And before that, there were hundreds of these new error messages today:
2022-08-29T11:48:19.847061+02:00 [warning] /root/arweave/apps/arweave/src/ar_kv.erl:71 event: failed_to_instantiate_rocksdb_iterator, offset: 89488764477687
2022-08-29T11:48:19.847355+02:00 [warning] /root/arweave/apps/arweave/src/ar_kv.erl:71 event: failed_to_instantiate_rocksdb_iterator, offset: 22596763372121
This node is my strongest, as I want to download the full weave: dual EPYC CPUs, 600 GB RAM, a 3 TB enterprise NVMe for the data folder (ext4), Ubuntu 22. Only chunk_storage is symlinked to an NFS server (HDD RAID6, ext4). I have 100% of blocks and 34 TB of weave, with 10000 peers. The download never exceeds 40 MB/s, so I get 1-2 TB/day. That is not much, considering the system sometimes uses 20 cores and 300 GB RAM and has a gigabit connection. Any ideas on how to increase this are highly welcome.
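To put my own numbers side by side, a quick back-of-the-envelope check (just arithmetic on the figures above):

```python
# 40 MB/s is the observed peak; 1-2 TB/day is what actually arrives.
peak_tb_per_day = 40e6 * 86400 / 1e12                  # ~3.5 TB/day if sustained
avg_mb_s = [tb * 1e12 / 86400 / 1e6 for tb in (1, 2)]  # ~12-23 MB/s average
print(f"40 MB/s sustained would be {peak_tb_per_day:.1f} TB/day; "
      f"1-2 TB/day is an average of {avg_mb_s[0]:.0f}-{avg_mb_s[1]:.0f} MB/s")
```

So the sustained average is only about a third to a half of the observed peak.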
On the other hand, I guess I should not put too much stress on the HDDs? I often see these messages, which I did not see with an SSD chunk_storage:
Aug 11 20:04:28 lauch nice[1086191]: 2022-08-11T20:04:28.454430+02:00 [error] /root/arweave/apps/arweave/src/ar_data_sync.erl:2312 event: failed_to_store_chunk, reason: timeout, absolute_chunk_offset: 49337222602998, data_root: <<"VJezFBIS1L524K4ECf77JY5wpTTqH3pLjCmmbnQKwyw">>
Aug 11 20:05:14 lauch nice[1086191]: 2022-08-11T20:05:14.838504+02:00 [error] /root/arweave/apps/arweave/src/ar_data_sync.erl:2312 event: failed_to_store_chunk, reason: timeout, absolute_chunk_offset: 49338902946038, data_root: <<"-7hgn-CktYE6MVrRXdofPJ0bmYHWSAykyHlV73QkC0U">>
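If it is useful, here is a rough way to check whether the NFS mount itself stalls: time a few synced writes of chunk-sized data directly on that mount. This is only a sketch; the path is a placeholder, 256 KiB is roughly the size of one data chunk, and the 5 s mark is just Erlang's default gen_server:call timeout, which may or may not be the limit behind these failed_to_store_chunk timeouts.

```python
# Hypothetical latency probe for the chunk_storage mount (path is a placeholder).
import os
import time

PATH = "/path/to/chunk_storage/.latency_probe"  # placeholder -- point at the NFS mount
CHUNK = os.urandom(256 * 1024)                  # roughly one 256 KiB data chunk

worst = 0.0
for _ in range(20):
    start = time.monotonic()
    with open(PATH, "wb") as f:
        f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())                    # force the write out to the server
    worst = max(worst, time.monotonic() - start)
os.remove(PATH)
print(f"worst write+fsync latency over 20 tries: {worst:.2f} s "
      f"(anything approaching 5 s would match the timeouts above)")
```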
My config:
"max_connections": 4000,
"sync_jobs": 600,
"header_sync_jobs": 1,
"packing_rate": 100
Mining is disabled; fs.file-max = 4194304 and vm.nr_hugepages = 1250.
Update on this: one more failed_to_instantiate_rocksdb_iterator / "IO error: lock hold by current process" error, but Arweave restarted itself and is now working fine again. Here are some logs:
2022-08-30T07:40:05.404606+02:00 [error] gen_server.erl:949 Generic server ar_chunk_storage terminating. Reason: {{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,...},{gen_server,...}|...]}. Last message: {put,46375135650038,<<214,238,143,17,216,197,211,240,150,138,...>>}. State: {state,#{89030393856000 => <<"8903"...>>,92283076608000 => <<...>>,...}}. Client ar_data_sync stacktrace: [{gen,do_call,4,[{...}|...]},{gen_server,call,2,[...]},{ar_chunk_storage,put,2,...},{ar_data_sync,write_not_bla...
2022-08-30T07:40:05.415574+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_chunk_storage:init/1, pid: <0.1208.0>, registered_name: ar_chunk_storage, error: {{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,[...]},{gen_server,try_handle_call,4,...},{gen_server,handle_msg,...},{proc_lib,...}]}, ancestors: [ar_sup|...], message_queue_len: ..., messages: [{'$gen_call',...}], links: [<0.924.0>|...], dictionary: [{...}|...], trap_exit: true, status: running, heap_size: 121536, stack_size: ...
2022-08-30T07:40:05.425486+02:00 [error] supervisor.erl:736 Supervisor: {local,ar_sup}. Context: child_terminated. Reason: {{badmatch,{error,timeout}},[{ar_chunk_storage,handle_call,3,...},{gen_server,try_handle_call,...}|...]}. Offender: id=ar_chunk_storage,pid=<0.1208.0>.
2022-08-30T07:40:06.206464+02:00 [info] /root/arweave/apps/arweave/src/ar_data_sync.erl:931 event: terminate, reason: {{case_clause, {'EXIT', {{{badmatch,{error,timeout}}, [{ar_chunk_storage,handle_call,3, [{file,"/root/arweave/apps/arweave/src/ar_chunk_storage.erl"}, {line,140}]}, {gen_server,try_handle_call,4, [{file,"gen_server.erl"},{line,721}]}, {gen_server,handle_msg,6,[{file,"gen_server.erl"},...
2022-08-30T07:40:06.577643+02:00 [warning] /root/arweave/apps/arweave/src/ar_kv.erl:71 event: failed_to_instantiate_rocksdb_iterator, offset: 91035148263671
2022-08-30T07:40:06.589557+02:00 [warning] /root/arweave/apps/arweave/src/ar_kv.erl:71 event: failed_to_instantiate_rocksdb_iterator, offset: 91034420814071
2022-08-30T07:40:09.929928+02:00 [error] : Ranch listener ar_http_iface_listener, connection process <0.3240.243>, stream 1 had its request process <0.18675.243> exit with reason badarg and stacktrace [{rocksdb,get,[#Ref<0.2575953317.208797767.249678>,#Ref<0.2575953317.208797767.249683>,<<"{àð1"...>>,[]],[]},{ar_data_sync,get_tx_data,5,[{file,...},{...}]},{ar_storage,read_tx_data_from_kv_storage,1,[{...}|...]},{ar_storage,read_tx2,1,[...]},{ar_http_iface_middleware,handle_get_tx,3,...},{ar_http_iface_mi...
2022-08-30T07:40:09.929800+02:00 [error] : Error in process <0.10035.243> on node 'arweave@127.0.0.1' with exit value:, {badarg,[{rocksdb,get,[#Ref<0.2575953317.208797767.249678>,#Ref<0.2575953317.208797767.249683>,<<"{àð13"...>>,[]],[]},{ar_data_sync,get_tx_data,5,[{file,[...]},{line,...}]},{ar_storage,read_tx_data_from_kv_storage,1,[{file,...},{...}]},{ar_storage,read_tx2,1,[{...}|...]},{ar_http_iface_middleware,handle_get_tx,3,[...]},{ar_http_iface_middleware,handle,3,...},{ar_http_iface_middleware,'...
2022-08-30T07:40:10.887284+02:00 [error] supervisor.erl:846 Supervisor: {local,ar_sup}. Context: start_error. Reason: {{badmatch,{error,{db_open,...}}},[{ar_data_sync,init_kv,0,...},{ar_data_sync,init,...},{gen_server,...}|...]}. Offender: id=ar_data_sync,pid=<0.1211.0>.
2022-08-30T07:40:10.887490+02:00 [error] proc_lib.erl:525 crasher: initial call: ar_data_sync:init/1, pid: <0.24763.242>, registered_name: [], error: {{badmatch,{error,{db_open,"IO error: lock hold by current process, acquire time 1661823973 acquiring thread 139912577357376: /media/nvm6/weave/rocksdb/ar_data_sync_data_root_i"...}}},[...]}, ancestors: [ar_sup|...], message_queue_len: ..., messages: [{...}|...], links: [<0.924.0>], dictionary: [], trap_exit: true, status: running, heap_size: 28690, stack_...
Hi Lev, thank you for your continuous work here. As I use Arweave more and more, I discover more and more errors. I check the old issues, but my errors all seem to be "new". Are you interested in all of them, or should we accept the fact that there are problems from time to time?
This crash leaves Arweave in a "half open" state - the process is still running but not responding to HTTP requests or doing anything else. Luckily it can be fixed with a restart.
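Until the underlying cause is fixed, a crude workaround is an external watchdog that polls the node's HTTP API and restarts the service when it stops answering. The sketch below assumes the default port 1984, the /info endpoint, and a systemd unit named arweave; all three are assumptions to adjust for your setup.

```python
# Hypothetical watchdog for the "half open" state described above.
import subprocess
import time
import urllib.request

URL = "http://127.0.0.1:1984/info"   # default port and endpoint -- adjust if needed
FAILURES_BEFORE_RESTART = 5          # ~5 minutes of silence before restarting

failures = 0
while True:
    try:
        with urllib.request.urlopen(URL, timeout=30) as resp:
            resp.read()
        failures = 0                  # node answered, reset the counter
    except Exception:
        failures += 1
        if failures >= FAILURES_BEFORE_RESTART:
            # Assumed systemd unit name; replace with whatever runs ./bin/start.
            subprocess.run(["systemctl", "restart", "arweave"], check=False)
            failures = 0
    time.sleep(60)
```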