NebraLtd / hm-miner

Helium Miner Container
https://nebra.io/hnt
MIT License

Helium miner container crashing every few hours #64

Closed ZEROF closed 2 years ago

ZEROF commented 2 years ago

Hello,

After you pushed the fix for the diagnostics container, the miner software started crashing every few hours. This may be unrelated, so I will let you check that on your side. This is the error:

[Image: helium_crash]

Miner software version: 2021.12.14.0-5

After this error I dumped the logs.

Here you can find the logs (will be deleted after 1 week): https://bin.0xfc.de/?774e921d42c2f85c#4t5G5zc7DMnEUxSKyGZ31jSEVse6RuRPkQxyoaLFzyAn

shawaj commented 2 years ago

> Here you can find the logs (will be deleted after 1 week): https://bin.0xfc.de/?774e921d42c2f85c#4t5G5zc7DMnEUxSKyGZ31jSEVse6RuRPkQxyoaLFzyAn

@ZEROF can you please paste the logs here instead? I can't access them there.

shawaj commented 2 years ago

Also, @ZEROF, can you please send a ticket to support@nebra.com with a picture of the label on your device, and reply back with the ticket number so we can take a further look?

ZEROF commented 2 years ago

Hi @shawaj ,

Ticket on the way. You should probably log in to the HS and check, because the miner container crashed even with the new firmware pushed today. I will not post the logs here because the file is 21 MB, but my ticket gives the path to look in once you are logged in over SSH.

This is the ticket to follow:

325612 Helium miner container crashing every few hours

iankaufmann commented 2 years ago

I have also noticed many, many crashes daily of the helium-miner container, and also the diagnostics page fails to load most of the time.

Here is one of the crash.log files from the helium-miner container, if that helps at all.

I think this started happening with 2021.12.14.0-5.

2022-01-02 07:13:56 =ERROR REPORT====
** Generic server miner_lora terminating 
** Last message in was {udp,#Port<0.39>,{172,17,0,5},56611,<<2,112,233,0,0,0,0,0,0,0,0,0,123,34,114,120,112,107,34,58,91,123,34,116,109,115,116,34,58,50,51,55,53,57,54,56,49,50,44,34,99,104,97,110,34,58,48,44,34,114,102,99,104,34,58,48,44,34,102,114,101,113,34,58,57,48,51,46,57,48,48,48,48,48,44,34,115,116,97,116,34,58,49,44,34,109,111,100,117,34,58,34,76,79,82,65,34,44,34,100,97,116,114,34,58,34,83,70,57,66,87,49,50,53,34,44,34,99,111,100,114,34,58,34,52,47,53,34,44,34,108,115,110,114,34,58,45,56,46,53,44,34,114,115,115,105,34,58,45,49,49,50,44,34,115,105,122,101,34,58,53,50,44,34,100,97,116,97,34,58,34,81,68,68,97,65,65,72,55,87,75,78,80,65,71,81,56,65,70,119,100,84,118,76,75,120,71,120,121,102,81,97,107,82,84,51,56,50,100,119,83,66,88,88,107,72,72,116,89,84,51,113,69,97,121,56,66,51,69,97,67,115,68,66,106,107,66,116,66,76,103,61,61,34,125,93,125>>}
** When Server state == {state,#Port<0.39>,#{0 => {gateway,0,{172,17,0,5},38948,0,29,0,#{<<"ackr">> => 100.0,<<"alti">> => 0,<<"dwnb">> => 0,<<"lati">> => 0.0,<<"long">> => 0.0,<<"rxfw">> => 0,<<"rxnb">> => 0,<<"rxok">> => 0,<<"time">> => <<"2022-01-02 07:13:31 GMT">>,<<"txnb">> => 0},[],5000000}},#{},#Fun<miner_keys.1.24379633>,<<0,58,46,111,169,175,197,112,230,148,232,113,201,25,71,96,195,36,4,134,205,142,192,212,152,171,185,246,163,210,202,210,209>>,{undefined,undefined},{0.0,0.0},true,region_us915,[903.9,904.1,904.3,904.5,904.7,904.9,905.1,905.3],{{dwell,400,20000},[]},undefined,undefined,{blockchain,"/var/data",#Ref<0.3025941025.1995571217.244896>,#Ref<0.3025941025.1995571217.244897>,#Ref<0.3025941025.1995571217.244898>,#Ref<0.3025941025.1995571217.244899>,#Ref<0.3025941025.1995571217.244903>,#Ref<0.3025941025.1995571217.244905>,#Ref<0.3025941025.1995571217.244900>,#Ref<0.3025941025.1995571217.244901>,#Ref<0.3025941025.1995571217.244902>,#Ref<0.3025941025.1995571217.244904>,{ledger_v1,"/var/data",#Ref<0.3025941025.1995571217.244906>,#Ref<0.3025941025.1995571217.244896>,#Ref<0.3025941025.1995571217.244898>,#Ref<0.3025941025.1995571217.244899>,#Ref<0.3025941025.1995571217.244903>,#Ref<0.3025941025.1995571202.245723>,active,{sub_ledger_v1,#Ref<0.3025941025.1995571217.244907>,#Ref<0.3025941025.1995571217.244908>,#Ref<0.3025941025.1995571217.244918>,#Ref<0.3025941025.1995571217.244909>,#Ref<0.3025941025.1995571217.244910>,#Ref<0.3025941025.1995571217.244911>,#Ref<0.3025941025.1995571217.244912>,#Ref<0.3025941025.1995571217.244913>,#Ref<0.3025941025.1995571217.244914>,#Ref<0.3025941025.1995571217.244915>,#Ref<0.3025941025.1995571217.244916>,#Ref<0.3025941025.1995571217.244917>,#Ref<0.3025941025.1995571217.244919>,undefined,undefined},{sub_ledger_v1,#Ref<0.3025941025.1995571217.244920>,#Ref<0.3025941025.1995571217.244921>,#Ref<0.3025941025.1995571217.244931>,#Ref<0.3025941025.1995571217.244922>,#Ref<0.3025941025.1995571217.244923>,#Ref<0.3025941025.1995571217.244924>,
#Ref<0.3025941025.1995571217.244925>,#Ref<0.3025941025.1995571217.244926>,#Ref<0.3025941025.1995571217.244927>,#Ref<0.3025941025.1995571217.244928>,#Ref<0.3025941025.1995571217.244929>,#Ref<0.3025941025.1995571217.244930>,#Ref<0.3025941025.1995571217.244932>,undefined,undefined},undefined,[],undefined}}}
** Reason for termination ==
** {badarg,[{rocksdb,get,[#Ref<0.3025941025.1995571217.244906>,#Ref<0.3025941025.1995571217.244907>,<<"$var_poc_version">>,[]],[]},{blockchain_ledger_v1,config,2,[{file,"blockchain_ledger_v1.erl"},{line,1335}]},{miner_lora,handle_packets,4,[{file,"miner_lora.erl"},{line,606}]},{miner_lora,handle_json_data,4,[{file,"miner_lora.erl"},{line,557}]},{miner_lora,handle_info,2,[{file,"miner_lora.erl"},{line,400}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
2022-01-02 07:13:56 =CRASH REPORT====
  crasher:
    initial call: miner_lora:init/1
    pid: <0.1642.0>
    registered_name: miner_lora
    exception error: bad argument: [{rocksdb,get,[#Ref<0.3025941025.1995571217.244906>,#Ref<0.3025941025.1995571217.244907>,<<"$var_poc_version">>,[]],[]},{blockchain_ledger_v1,config,2,[{file,"blockchain_ledger_v1.erl"},{line,1335}]},{miner_lora,handle_packets,4,[{file,"miner_lora.erl"},{line,606}]},{miner_lora,handle_json_data,4,[{file,"miner_lora.erl"},{line,557}]},{miner_lora,handle_info,2,[{file,"miner_lora.erl"},{line,400}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]
    ancestors: [miner_restart_sup,miner_sup,<0.1498.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1615.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 28
    reductions: 707184
  neighbours:
2022-01-02 07:13:56 =SUPERVISOR REPORT====
     Supervisor: {local,miner_restart_sup}
     Context:    child_terminated
     Reason:     {badarg,[{rocksdb,get,[#Ref<0.3025941025.1995571217.244906>,#Ref<0.3025941025.1995571217.244907>,<<"$var_poc_version">>,[]],[]},{blockchain_ledger_v1,config,2,[{file,"blockchain_ledger_v1.erl"},{line,1335}]},{miner_lora,handle_packets,4,[{file,"miner_lora.erl"},{line,606}]},{miner_lora,handle_json_data,4,[{file,"miner_lora.erl"},{line,557}]},{miner_lora,handle_info,2,[{file,"miner_lora.erl"},{line,400}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
     Offender:   [{pid,<0.1642.0>},{id,miner_lora},{mfargs,{miner_lora,start_link,[#{ecdh_fun => #Fun<miner_keys.0.24379633>,radio_udp_bind_ip => {0,0,0,0},radio_udp_bind_port => 1680,radio_udp_send_ip => {0,0,0,0},radio_udp_send_port => 31341,region_override => undefined,sig_fun => #Fun<miner_keys.1.24379633>}]}},{restart_type,permanent},{significant,false},{shutdown,15000},{child_type,worker}]

2022-01-02 07:30:28 =ERROR REPORT====
** Generic server miner_ecc_worker terminating 
** Last message in was {sign,<<10,33,0,58,46,111,169,175,197,112,230,148,232,113,201,25,71,96,195,36,4,134,205,142,192,212,152,171,185,246,163,210,202,210,209,18,8,4,162,211,34,221,6,80,84,26,33,0,90,28,189,25,194,193,225,213,75,231,123,61,17,82,19,91,91,73,159,50,175,103,255,167,236,15,67,248,214,45,254,112,26,33,0,119,62,117,134,181,220,248,45,39,86,194,120,43,25,132,25,148,103,151,217,236,219,80,166,55,65,187,135,155,12,233,45,26,33,0,8,153,45,128,132,103,187,50,209,110,18,44,55,175,78,73,160,55,162,203,116,138,25,135,243,169,230,179,186,57,237,55,26,33,0,146,236,7,226,168,158,246,175,27,198,50,97,64,10,152,76,70,234,165,220,38,187,220,136,126,175,205,166,14,168,194,207,26,33,0,59,167,116,53,227,95,52,145,145,174,32,114,240,95,29,10,32,132,176,243,64,48,47,211,232,215,218,68,125,226,206,125,40,134,199,201,205,225,47,58,32,45,174,143,185,95,52,62,207,106,76,253,55,223,211,240,155,174,42,223,247,94,49,57,47,232,147,168,215,110,117,16,146,66,14,10,6,104,101,105,103,104,116,18,4,8,155,252,70,66,29,10,19,108,97,115,116,95,98,108,111,99,107,95,97,100,100,95,116,105,109,101,18,6,8,207,168,197,142,6,66,18,10,12,114,101,108,101,97,115,101,95,105,110,102,111,18,2,26,0>>}
** When Server state == {state,<0.1503.0>,0}
** Reason for termination ==
** {{case_clause,{ok,<<84,233,243,108,244,197,79,247,98,89,101,198,225,104,216,243,52,117,75,113,14,135,146,106,185,104,188,237,185,210,21,77>>}},[{ecc508,sign,3,[{file,"ecc508.erl"},{line,650}]},{miner_ecc_worker,txn,3,[{file,"miner_ecc_worker.erl"},{line,76}]},{miner_ecc_worker,handle_call,3,[{file,"miner_ecc_worker.erl"},{line,48}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
** Client libp2p_peerbook_blockchain_swarm stacktrace
** [{gen,do_call,4,[{file,"gen.erl"},{line,233}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,243}]},{miner_keys,'-keys/1-fun-1-',1,[{file,"miner_keys.erl"},{line,117}]},{libp2p_peer,sign_peer,2,[{file,"libp2p_peer.erl"},{line,426}]},{libp2p_peerbook,update_this_peer,1,[{file,"libp2p_peerbook.erl"},{line,600}]},{libp2p_peerbook,handle_cast,2,[{file,"libp2p_peerbook.erl"},{line,492}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]}]
2022-01-02 07:30:28 =CRASH REPORT====
  crasher:
    initial call: miner_ecc_worker:init/1
    pid: <0.1502.0>
    registered_name: miner_ecc_worker
    exception error: {{case_clause,{ok,<<84,233,243,108,244,197,79,247,98,89,101,198,225,104,216,243,52,117,75,113,14,135,146,106,185,104,188,237,185,210,21,77>>}},[{ecc508,sign,3,[{file,"ecc508.erl"},{line,650}]},{miner_ecc_worker,txn,3,[{file,"miner_ecc_worker.erl"},{line,76}]},{miner_ecc_worker,handle_call,3,[{file,"miner_ecc_worker.erl"},{line,48}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
    ancestors: [miner_critical_sup,miner_sup,<0.1498.0>]
    message_queue_len: 1
    messages: [{'$gen_call',{<0.1945.0>,[alias|#Ref<0.3025941025.1995505667.258876>]},{sign,<<10,242,2,10,165,2,10,33,0,58,46,111,169,175,197,112,230,148,232,113,201,25,71,96,195,36,4,134,205,142,192,212,152,171,185,246,163,210,202,210,209,18,8,4,162,211,34,221,6,80,84,26,33,0,119,62,117,134,181,220,248,45,39,86,194,120,43,25,132,25,148,103,151,217,236,219,80,166,55,65,187,135,155,12,233,45,26,33,0,8,153,45,128,132,103,187,50,209,110,18,44,55,175,78,73,160,55,162,203,116,138,25,135,243,169,230,179,186,57,237,55,26,33,0,146,236,7,226,168,158,246,175,27,198,50,97,64,10,152,76,70,234,165,220,38,187,220,136,126,175,205,166,14,168,194,207,26,33,0,59,167,116,53,227,95,52,145,145,174,32,114,240,95,29,10,32,132,176,243,64,48,47,211,232,215,218,68,125,226,206,125,40,175,247,184,205,225,47,58,32,45,174,143,185,95,52,62,207,106,76,253,55,223,211,240,155,174,42,223,247,94,49,57,47,232,147,168,215,110,117,16,146,66,29,10,19,108,97,115,116,95,98,108,111,99,107,95,97,100,100,95,116,105,109,101,18,6,8,217,248,195,142,6,66,18,10,12,114,101,108,101,97,115,101,95,105,110,102,111,18,2,26,0,66,14,10,6,104,101,105,103,104,116,18,4,8,240,249,70,18,72,48,70,2,33,0,159,229,63,1,4,10,32,211,25,217,53,56,134,85,72,73,78,52,28,193,90,5,145,211,148,139,245,254,100,213,92,67,2,33,0,249,62,58,80,23,119,11,107,87,157,158,76,55,154,130,89,21,4,10,190,33,117,45,245,22,88,200,186,242,156,27,147,18,8,4,44,238,156,97,6,8,106,26,20,187,228,175,43,219,164,104,181,96,35,105,192,24,90,85,192,217,173,99,157>>}}]
    links: [<0.1501.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 28
    reductions: 116564
  neighbours:
2022-01-02 07:30:28 =SUPERVISOR REPORT====
     Supervisor: {local,miner_critical_sup}
     Context:    child_terminated
     Reason:     {{case_clause,{ok,<<84,233,243,108,244,197,79,247,98,89,101,198,225,104,216,243,52,117,75,113,14,135,146,106,185,104,188,237,185,210,21,77>>}},[{ecc508,sign,3,[{file,"ecc508.erl"},{line,650}]},{miner_ecc_worker,txn,3,[{file,"miner_ecc_worker.erl"},{line,76}]},{miner_ecc_worker,handle_call,3,[{file,"miner_ecc_worker.erl"},{line,48}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,721}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,750}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
     Offender:   [{pid,<0.1502.0>},{id,miner_ecc_worker},{mfargs,{miner_ecc_worker,start_link,[0,"i2c-1",96]}},{restart_type,permanent},{significant,false},{shutdown,15000},{child_type,worker}]

2022-01-02 07:30:28 =SUPERVISOR REPORT====
     Supervisor: {local,miner_critical_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.1502.0>},{id,miner_ecc_worker},{mfargs,{miner_ecc_worker,start_link,[0,"i2c-1",96]}},{restart_type,permanent},{significant,false},{shutdown,15000},{child_type,worker}]

2022-01-02 07:30:28 =SUPERVISOR REPORT====
     Supervisor: {local,miner_restart_sup}
     Context:    child_terminated
     Reason:     shutdown
     Offender:   [{pid,<0.1617.0>},{id,miner},{mfargs,{miner,start_link,[]}},{restart_type,permanent},{significant,false},{shutdown,15000},{child_type,worker}]

2022-01-02 07:30:28 =SUPERVISOR REPORT====
     Supervisor: {local,miner_sup}
     Context:    child_terminated
     Reason:     shutdown
     Offender:   [{pid,<0.1501.0>},{id,miner_critical_sup},{mfargs,{miner_critical_sup,start_link,[{ecc_compact,{{'ECPoint',<<4,58,46,111,169,175,197,112,230,148,232,113,201,25,71,96,195,36,4,134,205,142,192,212,152,171,185,246,163,210,202,210,209,55,130,207,10,173,93,53,78,79,91,66,144,188,187,109,101,104,6,226,42,204,189,203,155,164,198,233,36,86,255,42,23>>},{namedCurve,{1,2,840,10045,3,1,7}}}},#Fun<miner_keys.1.24379633>,#Fun<miner_keys.0.24379633>,[#{id => miner_ecc_worker,modules => [miner_ecc_worker],restart => permanent,shutdown => 15000,start => {miner_ecc_worker,start_link,[0,"i2c-1",96]},type => worker}]]}},{restart_type,permanent},{significant,false},{shutdown,infinity},{child_type,supervisor}]

2022-01-02 07:30:28 =SUPERVISOR REPORT====
     Supervisor: {local,miner_sup}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.1501.0>},{id,miner_critical_sup},{mfargs,{miner_critical_sup,start_link,[{ecc_compact,{{'ECPoint',<<4,58,46,111,169,175,197,112,230,148,232,113,201,25,71,96,195,36,4,134,205,142,192,212,152,171,185,246,163,210,202,210,209,55,130,207,10,173,93,53,78,79,91,66,144,188,187,109,101,104,6,226,42,204,189,203,155,164,198,233,36,86,255,42,23>>},{namedCurve,{1,2,840,10045,3,1,7}}}},#Fun<miner_keys.1.24379633>,#Fun<miner_keys.0.24379633>,[#{id => miner_ecc_worker,modules => [miner_ecc_worker],restart => permanent,shutdown => 15000,start => {miner_ecc_worker,start_link,[0,"i2c-1",96]},type => worker}]]}},{restart_type,permanent},{significant,false},{shutdown,infinity},{child_type,supervisor}]

2022-01-02 07:30:28 =SUPERVISOR REPORT====
     Supervisor: {local,miner_restart_sup}
     Context:    start_error
     Reason:     noproc
     Offender:   [{pid,undefined},{id,miner},{mfargs,{miner,start_link,[]}},{restart_type,permanent},{significant,false},{shutdown,15000},{child_type,worker}]

2022-01-02 07:30:28 =CRASH REPORT====
  crasher:
    initial call: miner:init/1
    pid: <0.1951.0>
    registered_name: []
    exception exit: {noproc,[{gen,do_for_proc,2,[{file,"gen.erl"},{line,371}]},{gen_event,rpc,2,[{file,"gen_event.erl"},{line,290}]},{miner,init,1,[{file,"miner.erl"},{line,333}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,423}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
    ancestors: [miner_restart_sup,miner_sup,<0.1498.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1615.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 28
    reductions: 305
  neighbours:
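
For anyone who wants to summarise a crash.log like the one above without reading it line by line, here is a minimal Python sketch. It assumes the log uses the same `=CRASH REPORT====` headers and `initial call:` lines shown in this excerpt. The function name `tally_crashes` and the shortened `sample` text are illustrative only and not part of any Nebra or Helium tooling.

```python
import re
from collections import Counter

# Header of an Erlang crash report as it appears in the excerpt above,
# e.g. "2022-01-02 07:13:56 =CRASH REPORT====".
CRASH_HEADER = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} =CRASH REPORT====")
# The "initial call:" line inside a report, e.g. "initial call: miner_lora:init/1".
INITIAL_CALL = re.compile(r"^\s*initial call:\s*(\S+)")


def tally_crashes(log_text):
    """Count crash reports per crashing process, keyed by its initial call."""
    counts = Counter()
    in_crash_report = False
    for line in log_text.splitlines():
        if CRASH_HEADER.match(line):
            in_crash_report = True
            continue
        if in_crash_report:
            match = INITIAL_CALL.match(line)
            if match:
                counts[match.group(1)] += 1
                in_crash_report = False
    return counts


# Shortened, hypothetical sample built from lines in the log above.
sample = """\
2022-01-02 07:13:56 =CRASH REPORT====
  crasher:
    initial call: miner_lora:init/1
    pid: <0.1642.0>
2022-01-02 07:30:28 =CRASH REPORT====
  crasher:
    initial call: miner_ecc_worker:init/1
    pid: <0.1502.0>
"""

for name, count in tally_crashes(sample).most_common():
    print(name, count)
```

Running this over a full crash.log would show which process crashes most often, which may help separate the `miner_lora` failures from the `miner_ecc_worker` ones.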

PureTek-Innovations commented 2 years ago

I'm pretty sure that my miner is doing the same on firmware 2021.12.29.2. How do I access the logs? I'm sorry, I don't even know how to open an SSH connection to it.

Jonzky commented 2 years ago

Just to echo what others have said, I'm noticing frequent crashes on my outdoor unit and I'm seeing similar reports from other people.

I'm seeing ECC - False fairly often, and when I look at /initFile.txt I get these sorts of errors while it's failing:

"ECC": "gateway_mfr test finished with error, {\"result\": \"fail\", \"tests\": [{\"output\": \"timeout/retry error\", \"result\": \"fail\", \"test\": \"serial\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"zone_locked(data)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"zone_locked(config)\"}, {\"output\": \"timeout/retry error\", \"result\": \"fail\", \"test\": \"slot_config(0..=15, ecc)\"}, {\"output\": \"timeout/retry error\", \"result\": \"fail\", \"test\": \"key_config(0..=15, ecc)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"miner_key(0)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"sign(0)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"ecdh(0)\"}]}"

"ECC": "gateway_mfr test finished with error, {\"result\": \"fail\", \"tests\": [{\"output\": \"ok\", \"result\": \"pass\", \"test\": \"serial\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"zone_locked(data)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"zone_locked(config)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"slot_config(0..=15, ecc)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"key_config(0..=15, ecc)\"}, {\"output\": \"ok\", \"result\": \"pass\", \"test\": \"miner_key(0)\"}, {\"output\": \"signature error\n\nCaused by:\n 0: signature error\n 1: timeout/retry error\", \"result\": \"fail\", \"test\": \"sign(0)\"}, {\"output\": \"invalid ecdh shared secret\", \"result\": \"fail\", \"test\": \"ecdh(0)\"}]}

The main issue for me is that when it crashes it starts to fall behind, and it can take a while to catch up to the current height again. I have also noticed that it's crashing and throwing the ECC error significantly more often since I moved it from indoors (with no witnesses or traffic) to outdoors, where there is a significant number of hotspots.
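
The gateway_mfr results quoted in this comment are hard to read in their raw form. As a rough aid, here is a Python sketch that lists only the failing tests. It assumes the message has already been unescaped (so `\"` has become `"`) and that the JSON object starts at the first `{`. The function name `failing_tests` and the shortened `message` are illustrative, not part of gateway_mfr or hm-diag.

```python
import json


def failing_tests(message):
    """Return (test_name, output) pairs for each failed test in the message."""
    start = message.find("{")
    if start == -1:
        return []
    # strict=False tolerates raw newlines inside string values, which the
    # "signature error" output contains once it has been unescaped.
    report = json.loads(message[start:], strict=False)
    return [
        (test["test"], test["output"])
        for test in report.get("tests", [])
        if test.get("result") == "fail"
    ]


# Shortened, hypothetical message based on the first result quoted above.
message = (
    'gateway_mfr test finished with error, {"result": "fail", "tests": ['
    '{"output": "timeout/retry error", "result": "fail", "test": "serial"}, '
    '{"output": "ok", "result": "pass", "test": "zone_locked(data)"}, '
    '{"output": "ok", "result": "pass", "test": "sign(0)"}]}'
)

for name, output in failing_tests(message):
    print(f"{name}: {output}")
```

Comparing the failing test names across several runs might show whether the same ECC operations time out each time or whether the failures move around.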

Jonzky commented 2 years ago

It looks like, since wget was installed in the diagnostics container, it has been performing a HealthCheck (/initFile.txt) twice every 120 seconds. https://github.com/NebraLtd/hm-diag/blob/dfa4e11e2606f85579d901dd5a30a34f91b594f3/Dockerfile#L36

I would assume that the ECC check in that diagnostic (or all of the checks it does) is resource intensive or takes a few seconds? Either way, I think that when the miner makes a call that requires the ECC, it's not ready to respond properly, and that crashes the miner. It makes sense, since I am having crashes every 10-20 minutes and I'm in a dense RF environment, which probably uses the ECC frequently.

I removed wget from the diagnostics container and have not had a crash in the 1-2 hours since I did it.

iankaufmann commented 2 years ago

> It looks like, since wget was installed in the diagnostics container, it has been performing a HealthCheck (/initFile.txt) twice every 120 seconds. https://github.com/NebraLtd/hm-diag/blob/dfa4e11e2606f85579d901dd5a30a34f91b594f3/Dockerfile#L36
>
> I would assume that the ECC check in that diagnostic (or all of the checks it does) is resource intensive or takes a few seconds? Either way, I think that when the miner makes a call that requires the ECC, it's not ready to respond properly, and that crashes the miner. It makes sense, since I am having crashes every 10-20 minutes and I'm in a dense RF environment, which probably uses the ECC frequently.
>
> I removed wget from the diagnostics container and have not had a crash in the 1-2 hours since I did it.

How ironic for a health check to be what is killing it...

Now I kind of regret pointing it out. I just saw errors happening due to it failing (and the container never entering a healthy state), but it would make sense that slamming whatever is going on within the health check over and over could bog things down.

Each update for the last several weeks has only made the issues I'm having more frequent.

shawaj commented 2 years ago

@Jonzky @iankaufmann no, the health check is not killing the miner. The health check does not touch anything to do with the miner container.

The issue was on the Helium side: https://engineering.helium.com/2022/01/04/blockchain-release-libp2p-and-data-transfer-fixes.html

PureTek-Innovations commented 2 years ago

Any news on this? My miner, a Nebra Outdoor Hotspot Gen 1 running firmware 2022.01.04.0, appears to be rebooting frequently.

PureTek-Innovations commented 2 years ago

@shawaj, why did you close this issue without resolving it first??

shawaj commented 2 years ago

It has been resolved