ZenGround0 / onramp-contracts

Prototyping smart contracts piping data to the filecoin network

Figure out log notify timeout bug #7

Open ZenGround0 opened 4 months ago

ZenGround0 commented 4 months ago

Bad bug: the subscription channel times out every 90s when connected over the lotus ws:// api. It's a very generic golang network connection timeout -- seems unrelated to lotus or xchain. Looks like the socket is getting disconnected at a lower level?

Could be the Ubuntu OS. No good knowledge in the Slack threads.
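
For context, the failing path looks roughly like this minimal sketch (assuming go-ethereum's ethclient; the ws URL is the usual lotus local default and the contract address is taken from the log output below -- this is not the actual xchain code):

```go
package main

import (
	"context"
	"log"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Lotus serves its Eth JSON-RPC over websocket here by default.
	client, err := ethclient.Dial("ws://127.0.0.1:1234/rpc/v1")
	if err != nil {
		log.Fatalf("dial: %v", err)
	}

	// Watch the onramp contract for data ready events.
	query := ethereum.FilterQuery{
		Addresses: []common.Address{
			common.HexToAddress("0x2ae35f89e0b6DBd6cF804E20953a7D7B7ff2d9F7"),
		},
	}

	logs := make(chan types.Log)
	sub, err := client.SubscribeFilterLogs(context.Background(), query, logs)
	if err != nil {
		log.Fatalf("subscribe: %v", err)
	}

	for {
		select {
		case err := <-sub.Err():
			// The generic network timeout surfaces here roughly every 90s.
			log.Fatalf("subscription dropped: %v", err)
		case l := <-logs:
			log.Printf("Log Data: %v", l.Data)
		}
	}
}
```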

Strategies

  1. Try on macOS
  2. Run boost and see if it has the same problem, since it also does this subscription
  3. Try running lotus not on localhost but on some other server
  4. See if this happens on non-lotus eth APIs
  5. If all else fails, debug deep into the network stack (see the sketch after this list)
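
For strategy 5, a cheap first probe (a hypothetical helper, not in the repo) is to time how long the subscription survives and inspect the concrete error type. A *net.OpError would implicate the TCP/websocket layer rather than lotus itself:

```go
package main

import (
	"errors"
	"log"
	"net"
	"time"

	"github.com/ethereum/go-ethereum"
)

// watchSub reports how long a subscription lives and what kind of error
// kills it, to tell an application-level close apart from a low-level
// socket timeout.
func watchSub(sub ethereum.Subscription) {
	start := time.Now()
	err := <-sub.Err()
	log.Printf("subscription died after %s: %v", time.Since(start), err)

	var opErr *net.OpError
	if errors.As(err, &opErr) {
		// A *net.OpError points at the socket layer; Timeout()
		// confirms a deadline-style failure.
		log.Printf("net op=%q timeout=%v", opErr.Op, opErr.Timeout())
	}
}
```
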
ZenGround0 commented 4 months ago

Timeout takes exactly 60s on macOS. The timeout appears to be related to Lotus being my JSON-RPC provider: when I run against ganache I don't get timed out.

ZenGround0 commented 4 months ago

The nice thing is that before the timeout strikes, events can be heard:

```
2024/06/06 17:40:37 Listening for data ready events on 0x2ae35f89e0b6DBd6cF804E20953a7D7B7ff2d9F7
Log Data: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 160 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25 27 231 210 24 207 84 118 212 79 169 151 106 176 230 65 254 163 204 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39 1 129 226 3 146 32 32 145 57 111 148 131 86 171 195 64 153 208 53 9 222 161 20 98 136 34 199 22 163 194 255 84 34 146 54 35 57 171 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 102 97 107 101 98 117 102 46 99 111 109 47 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Event Parsed: [{[1 129 226 3 146 32 32 145 57 111 148 131 86 171 195 64 153 208 53 9 222 161 20 98 136 34 199 22 163 194 255 84 34 146 54 35 57 171 8] 576000 fakebuf.com/44444444444444444444444 0 0x191be7d218cf5476D44fA9976ab0E641Fea3cC22} 4]
```

This doesn't keep the timeout from happening afterwards, unfortunately.
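
For reference, the parsed struct above can be reproduced with a plain ABI unpack. The event shape below is reconstructed from the dump (bytes commP, uint64 size, string location, uint256 amount, address token, plus a uint64 id) -- it's an assumption, not the contract's actual ABI:

```go
package main

import (
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
)

// Guessed ABI, reverse-engineered from the "Log Data" dump above; the
// real event in the onramp contract may be named or shaped differently.
const dataReadyABI = `[{"type":"event","name":"DataReady","inputs":[
 {"name":"offer","type":"tuple","components":[
  {"name":"commP","type":"bytes"},
  {"name":"size","type":"uint64"},
  {"name":"location","type":"string"},
  {"name":"amount","type":"uint256"},
  {"name":"token","type":"address"}]},
 {"name":"id","type":"uint64"}]}]`

// Offer and DataReadyEvent mirror the tuple layout seen in "Event Parsed".
type Offer struct {
	CommP    []byte
	Size     uint64
	Location string
	Amount   *big.Int
	Token    common.Address
}

type DataReadyEvent struct {
	Offer Offer
	Id    uint64
}

func parseDataReady(l types.Log) (*DataReadyEvent, error) {
	parsed, err := abi.JSON(strings.NewReader(dataReadyABI))
	if err != nil {
		return nil, err
	}
	var ev DataReadyEvent
	// Both fields are non-indexed, so everything unpacks from l.Data.
	if err := parsed.UnpackIntoInterface(&ev, "DataReady", l.Data); err != nil {
		return nil, err
	}
	return &ev, nil
}
```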

ZenGround0 commented 4 months ago

We've now got a basic workaround -- just resubscribe every minute, once the subscription shuts down. This is a bit racy, so a better way to do this is to record the chain epoch at subscription shutdown and then, when starting up again, query the logs directly from that epoch to now. A sketch of that approach follows.
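
Here's a sketch of the epoch-recording variant, assuming go-ethereum's ethclient (FilterLogs for the backfill, SubscribeFilterLogs for the live tail); the handler and names are hypothetical:

```go
package main

import (
	"context"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// subscribeWithBackfill remembers the last block (epoch) it saw, and on
// every reconnect fetches anything emitted while it was down before
// resuming the live subscription.
func subscribeWithBackfill(ctx context.Context, client *ethclient.Client, query ethereum.FilterQuery, handle func(types.Log)) error {
	var lastSeen uint64

	for {
		if lastSeen > 0 {
			// Backfill the gap left by the dropped subscription.
			q := query
			q.FromBlock = new(big.Int).SetUint64(lastSeen + 1)
			missed, err := client.FilterLogs(ctx, q)
			if err != nil {
				return err
			}
			for _, l := range missed {
				lastSeen = l.BlockNumber
				handle(l)
			}
		}

		logs := make(chan types.Log)
		sub, err := client.SubscribeFilterLogs(ctx, query, logs)
		if err != nil {
			return err
		}

	live:
		for {
			select {
			case err := <-sub.Err():
				// The ~60-90s timeout lands here; loop around,
				// backfill, and resubscribe.
				log.Printf("subscription dropped, resubscribing: %v", err)
				break live
			case l := <-logs:
				lastSeen = l.BlockNumber
				handle(l)
			case <-ctx.Done():
				sub.Unsubscribe()
				return ctx.Err()
			}
		}
	}
}
```

There's still a narrow window between the backfill query and the new subscription coming up, so a handler that deduplicates on (BlockNumber, Index) would close the remaining gap.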

However, I'm really hoping that these issues don't show up at all on real networks, since ganache didn't have the problem. I suspect calibration network nodes won't even have it.