arj03 / ssb-browser-demo

A secure scuttlebutt client interface running in the browser
Other
58 stars 11 forks source link

getOOO doesn't seem to work #283

Closed KyleMaas closed 3 years ago

KyleMaas commented 3 years ago

For several weeks now, I've run into situations where a message is outside of my follow graph. I'll click the "Get Msg" button, and after a while it just times out and logs an error. I can never seem to actually get missing messages. That used to work, so I'm not sure when it broke, but it's been consistently failing for a while. I was hoping to have time to actually debug it myself. But I haven't had the time lately, so I figure I probably ought to at least report it. Then if either of us have time we can try to figure it out.

arj03 commented 3 years ago

If you get an error that means none of the peers you are connected to is able to get the message. I was testing this today on a thread and found that if it does get the message, then it didn't properly re-render so I fixed that in master.

KyleMaas commented 3 years ago

Cool. If there is an error, we should probably catch it and display it to the user in a more obvious way than just logging it to the console.

KyleMaas commented 3 years ago

Revisiting this, I can confirm that "Get msg" does work now if messages are replicable. But so far I haven't found one that is too far outside of the follow graph to get, so I haven't been able to trace down the error condition where it wouldn't render. Any pointers on where to look for this, since clearly I didn't take good enough notes in this bug on how that operates?

arj03 commented 3 years ago

All of the ooo handling should come from this module. It's not super easy to follow, basically it floods the network for someone that has the message.

KyleMaas commented 3 years ago

The only way I'm seeing for get() to be called and not run the callback with an error condition is if either:

  1. cb is undefined
  2. opts.timeout <= 0. Although it looks like if that's the case, the clearTimeout() call would likely also fail.

As far as I can tell, the UI side always shows an error if one is passed back. So maybe where this is called from triggers one of these conditions?

arj03 commented 3 years ago

Yeah, try checking where this is called from

KyleMaas commented 3 years ago

This:

https://github.com/arj03/ssb-browser-demo/blob/48c16486ba8f0832158837add0aa77db5ac2e0d6/ui/ssb-msg.js#L201

Calls this:

https://github.com/arj03/ssb-browser-demo/blob/48c16486ba8f0832158837add0aa77db5ac2e0d6/net.js#L170

Which calls this:

https://github.com/arj03/ssb-browser-core/blob/6d001542ce78564b561135ef5694090016f0736f/simple-ooo.js#L55

It sure doesn't look like it'd hit either of those conditions. Maybe conf.timeout <= 0?

Hm...that could probably also happen if conf.timeout is undefined, since it wouldn't be null.

KyleMaas commented 3 years ago

Nevermind. conf.timeout is unspecified at that point, and undefined == null so it does get a default of 5000.

However, I did notice that in the event of a timeout (which clears cb) followed by an error on the query, this could cause an error since it doesn't check that cb is still valid before trying to call it:

https://github.com/arj03/ssb-browser-core/blob/6d001542ce78564b561135ef5694090016f0736f/simple-ooo.js#L69

Doesn't explain this problem, but it is a problem we probably ought to avoid.

KyleMaas commented 3 years ago

Wonder if maybe this got fixed as a side effect of the multi-tab coordination stuff. Because in thinking about how to trigger this, I realized that if I just put it into offline mode I should be able to test it. But if I do that, I do get an error. It's not very descriptive since it just says it can't lock the database, but it at least does something. So...this bug might be unintentionally fixed now.

arj03 commented 3 years ago

Alright, how about we close this for now and reopen if it happens again?

KyleMaas commented 3 years ago

Works for me.