ElementsProject / lightning

Core Lightning — Lightning Network implementation focusing on spec compliance and performance
Other
2.85k stars 901 forks source link

autoclean not doing anything in 22.11 #5766

Closed daywalker90 closed 1 year ago

daywalker90 commented 1 year ago

I configured autoclean like this and waited ~20 hours after upgrading to 22.11 and it's still at 0:

"autoclean": {
      "succeededforwards": {
         "enabled": true,
         "age": 2851200,
         "cleaned": 0
      },
      "failedforwards": {
         "enabled": true,
         "age": 2851200,
         "cleaned": 0
      },
      "succeededpays": {
         "enabled": false,
         "cleaned": 0
      },
      "failedpays": {
         "enabled": true,
         "age": 3600,
         "cleaned": 0
      },
      "paidinvoices": {
         "enabled": false,
         "cleaned": 0
      },
      "expiredinvoices": {
         "enabled": true,
         "age": 3600,
         "cleaned": 0
      }
   }

In the log i found these lines containing "autoclean":

Dec 01 18:37:39 lightningd[2226843]: DEBUG   plugin-autoclean: listdatastore error autoclean/succeededforwards/num: Parsing '{datastore:[0:': token has no index 0: []
Dec 01 18:37:39 lightningd[2226843]: DEBUG   plugin-autoclean: listdatastore error autoclean/failedforwards/num: Parsing '{datastore:[0:': token has no index 0: []
Dec 01 18:37:39 lightningd[2226843]: DEBUG   plugin-autoclean: listdatastore error autoclean/succeededpays/num: Parsing '{datastore:[0:': token has no index 0: []
Dec 01 18:37:39 lightningd[2226843]: DEBUG   plugin-autoclean: listdatastore error autoclean/failedpays/num: Parsing '{datastore:[0:': token has no index 0: []
Dec 01 18:37:39 lightningd[2226843]: DEBUG   plugin-autoclean: listdatastore error autoclean/paidinvoices/num: Parsing '{datastore:[0:': token has no index 0: []
Dec 01 18:37:39 lightningd[2226843]: DEBUG   plugin-autoclean: listdatastore error autoclean/expiredinvoices/num: Parsing '{datastore:[0:': token has no index 0: []

and

Dec 01 18:37:38 lightningd[2226843]: DEBUG   lightningd: Looking for [autoclean,succeededforwards,num]
Dec 01 18:37:38 lightningd[2226843]: DEBUG   lightningd: Got [commando,rune_counter]
Dec 01 18:37:38 lightningd[2226843]: DEBUG   lightningd: Not interested
Dec 01 18:37:38 lightningd[2226843]: DEBUG   lightningd: Looking for [autoclean,failedforwards,num]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Got [commando,rune_counter]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Not interested
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Looking for [autoclean,succeededpays,num]
Dec 01 18:37:38 lightningd[2226843]: DEBUG   lightningd: Got [commando,rune_counter]
Dec 01 18:37:38 lightningd[2226843]: DEBUG   lightningd: Not interested
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Looking for [autoclean,failedpays,num]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Got [commando,rune_counter]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Not interested
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Looking for [autoclean,paidinvoices,num]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Got [commando,rune_counter]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Not interested
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Looking for [autoclean,expiredinvoices,num]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Got [commando,rune_counter]
Dec 01 18:37:39 lightningd[2226843]: DEBUG   lightningd: Not interested

The node is over 6 months old and it should delete 10s of thousands of succeeded forwards for example. Running the autoclean-once command doesn't return for hours and i can't notice any activity on the db so i cancelled it.

devastgh commented 1 year ago

Same issue here.

rustyrussell commented 1 year ago

Hmm, works for me, at least using autoclean-once (which, I've just realized, is not documented!)

$ lightning-cli autoclean-once failedforwards 100000000
{
   "autoclean": {
      "failedforwards": {
         "cleaned": 23,
         "uncleaned": 23285
      }
   }
}
rusty@ubuntu-1gb-sgp1-01:~$ lightning-cli autoclean-once failedforwards 100000000
{
   "autoclean": {
      "failedforwards": {
         "cleaned": 0,
         "uncleaned": 23241
      }
   }
}

If you turn off all autoclean settings in your config and run autoclean-once, do you get similar results?

daywalker90 commented 1 year ago

I tried it two times with very high values for seconds in hopes it returns faster:

lightning-cli autoclean-once succeededforwards 285120000
{
   "autoclean": {
      "succeededforwards": {
         "cleaned": 0,
         "uncleaned": 4253056
      }
   }
}

BUT in the logs:

Dec 03 11:25:35 lightningd[2960040]: UNUSUAL plugin-autoclean: Killing plugin: Failed to parse JSON response '??{"jsonrpc":"2.0","method":"log","params":{"level":"error","message":"Failed to parse RPC JSON response '\n\n{\"jsonrpc\":\"2.0\",\"id\":null,\"error\":{\"code\":-32600,\"message\":\"Invalid token in json input: '\\n\\na4_{F06\\u0019?'\"}}\n\n'"}}??'
Dec 03 11:25:35 lightningd[2960040]: **BROKEN** plugin-autoclean: Plugin marked as important, shutting down lightningd!

Another try:

lcli autoclean-once succeededforwards 28512000
lightning-cli: reading response: socket closed

Logs:

Dec 03 11:27:19 lightningd[2960360]: UNUSUAL plugin-autoclean: Killing plugin: Failed to parse JSON response '??{"jsonrpc":"2.0","method":"log","params":{"level":"error","message":"Failed to parse RPC JSON response '\n\n{\"jsonrpc\":\"2.0\",\"id\":null,\"error\":{\"code\":-32600,\"message\":\"Invalid token in json input: '\\n\\nm??\\u0003.\\b d??'\"}}\n\n'"}}??'
Dec 03 11:27:19 lightningd[2960360]: **BROKEN** plugin-autoclean: Plugin marked as important, shutting down lightningd!

Oh and i use postgresql 14.6 with a cluster setup if thats important.

cdecker commented 1 year ago

My best guess here is that the batching is somehow interfering with things, as the JSON stream is getting corrupted somehow.

rustyrussell commented 1 year ago

Well, something weird is happening. That looks horribly like use-after-free; especially since that msg it's complaining about parsing is something that lightningd itself produces.

I'm running a 1M forward test under valgrind now, but it's going to take a while...

rustyrussell commented 1 year ago

OK, I found something! Now to reproduce in a meaningful way...