arj03 / ssb-browser-demo

A secure scuttlebutt client interface running in the browser
Other
58 stars 11 forks source link

Basic protection against concurrency issues #260

Closed KyleMaas closed 3 years ago

KyleMaas commented 3 years ago

Basic protection against concurrency issues. See #139. Uses a localStorage-based mutex which times out in case a tab dies on us. Adds a bunch of extra checks throughout the codebase to try to gracefully handle a locked SSB.

Also fixes:

KyleMaas commented 3 years ago

If you do make changes, there are a few important things about this:

  1. It is possible for a not-quite-fully-initialized SSB to be passed back to the caller. This pretty much always happens with the first getSSB() call that initializes the SSB object. This is why in most places I don't just check for whether the SSB object is returned, but also that the functionality it needs is included in it.
  2. For long-running processes (like Connections status polling) or anything with a callback, it is possible for the parent window to be closed between the time the callback was set up and when it runs. So the SSB object needs to be acquired anew (and error checking needs to happen) each time one of these runs.
  3. I know you wanted this done synchronously, but a lot of the places that need SSB could be vastly simplified with an asynchronous callback. As it is, there are a lot of setTimeout()s to just try again. I intentionally set this up so it was easy to convert over.
  4. The mutex system is localStorage-based. This is because it's very widely-supported across browsers, it uses the same browser API as the storage we're protecting in many cases, and per the localStorage spec browsers are required to implement mutexes in their implementations. That was tricky to get right, but there is a reason it delays startup by 2 seconds, and that's to check for pings from other tabs. Without that, it starts nearly immediately but does not properly protect against concurrent access.
arj03 commented 3 years ago

Looking at this sync stuff it's actually not that bad I think. Like if we convert:

        [ err, SSB ] = ssbSingleton.getSSB()
        if (!SSB || !SSB.net) {
          // This is async anyway - try again later.
          setTimeout(this.renderConnections, 3000)
          return
        }

into something like:

        SSB = ssbSingleton.getSSBOrTryAgain(this.renderConnections)
        if (!SSB) return

and:

function getSSBOrTryAgain(tryAgain) {
        [ err, SSB ] = ssbSingleton.getSSB()
        if (err || (!SSB || !SSB.db)) { // <- I guess db should be enough in all cases
          setTimeout(tryAgain, 500)
          return
        } else
          return SSB
}

That would make stuff a bit cleaner.

arj03 commented 3 years ago

I still need to properly test this, but wanted to leave my comment here, not going to create any changes to this, so go ahead and make that change if it looks good. I'll do some proper testing tomorrow.

KyleMaas commented 3 years ago

I like where you're going with that. It was a little different than what I was thinking, but I think it would work quite well. How about something like (writing this freehand and haven't tested it):

function getSSBEventually(timeout, isRelevantCB, ssbCheckCB, resultCB) {
  // If the caller no longer needs a result, return right away before processing anything.
  if (isRelevantCB && !isRelevantCB()) return;

  [ err, SSB ] = ssbSingleton.getSSB()

  // Do this here so that if we time out and return, SSB is set to null if it doesn't pass.
  // That way a simple if(SSB) is all it takes on the client end.
  SSB = (!err && ssbCheckCB(SSB) ? SSB : null)

  if (!SSB) {
    if (timeout != 0) {
      // Try again.
      setTimeout(function() {
        getSSBEventually((timeout > 0 ? Math.max(0, timeout - 500) : timeout), isRelevantCB, checkCB, resultCB)
      }, 500)
      return
    } else if (!err) {
      // We timed out but don't have an error, so we should set one before the callback below runs.
      err = "Could not lock database"
    }
  }
  resultCB(err, SSB)
}

The idea being that you would call it like this:

function renderSomethingOptionalButHelpful(err, SSB) {
  if (SSB) {
    // We know SSB fits what we need.
  } else {
    // We probably timed out, which means there is likely another window holding SSB.
    // But this is an optional feature, so we don't really care enough to keep retrying.
  }
}

// Pass 5000 so it times out with an error after 5 seconds.
ssbSingleton.getSSBEventually(5000, () => { return self.componentStillLoaded },
  (SSB) => { return SSB.getProfile }, this.renderSomethingOptionalButHelpful)

Or this:

function postMessage(err, SSB) {
  if (SSB) {
    // We know SSB fits what we need.
  } else {
    // We timed out.  Tell the user why.
    alert("Couldn't post.  Reason: " + err)
  }
}

// Pass 3000 so it times out with an error after 3 seconds so we can complain to the user.
ssbSingleton.getSSBEventually(3000, () => { return self.componentStillLoaded },
  (SSB) => { return SSB.db }, this.postMessage)

Or this:

function renderConnections(err, SSB) {
  // Do stuff with an SSB we know works because with an infinite timeout we are never called unless SSB fits.
}

// Pass -1 for a timeout so it keeps trying until it acquires a valid SSB.
ssbSingleton.getSSBEventually(-1, () => { return self.componentStillLoaded },
  (SSB) => { return SSB.net && SSB.db }, this.renderConnections)

And by having an "is SSB suitable for my use" callback, you could also have your check callback do stuff like checking if the profile index was loaded, rather than the way we do now where I had to use setTimeout() in the case where the database was loaded but the profile index was unavailable.

By having an "is this still relevant" callback, we can make sure it stops retrying if the Vue component which needed it gets closed. That way if we can't acquire an SSB, we're not leaking memory by holding open references and Vue components which are destroyed can be properly garbage collected.

What would you think about something like that?

(Edited to shorten line lengths to make it a little easier to read.) (Edited again because of a missing parameter on the retry.) (Edited again because infinite retry couldn't have worked. I really should probably test this sometime and see if there's anything else I missed.)

arj03 commented 3 years ago

Yeah that sort of makes sense. Just need to be careful not to convert everything to callbacks, which is what I liked about the getSSBOrTryAgain proposal and the way you did it currently.

arj03 commented 3 years ago

Another thing to consider is that this feels like it should live in ssb-browser-core. But lets just make it work here first and then we can port it over.

KyleMaas commented 3 years ago

Another thing to consider is that this feels like it should live in ssb-browser-core. But lets just make it work here first and then we can port it over.

Yeah, I was kind of thinking this, too. Part of the reason I tried to separate ssb-singleton out into its own module which doesn't rely on anything in ssb-browser-demo.

KyleMaas commented 3 years ago

I'm going to be afk for a while, so I figured I'd commit this so you can work on it if you want. This latest commit should still work. My changes to Profile are what are triggering a bunch of race conditions between different initialization steps of ssb-browser-core.

KyleMaas commented 3 years ago

Point being I held back the non-working Profile changes so I can debug those. I'd rather have this stable enough you can work on if you want.

KyleMaas commented 3 years ago

Turns out it was less of a race condition and more of a scope problem. If the async function was called, its use of SSB would override the global SSB object in use by ssb-browser-core and would set it to null. We still need to fix the problem of ssb-browser-core using an SSB global, but this at least works around that in the meantime.

KyleMaas commented 3 years ago

There. I like that a lot better. Made me a little nervous having setTimeout()s all over the place referencing all kinds of stuff which would prevent proper garbage collection and would keep running code in the background for stuff we no longer cared about.

This works pretty well for me now, even in a second tab opened via center click. Things it doesn't do:

  1. ~Catch right-click properly to prevent people from right-clicking a link and opening in a new tab. As it is right now, if a user did that, the second tab won't have an opener and will just footgun to prevent problems.~ (Done)
  2. ~Center-clicking to open new tabs works. Center-clicking from those tabs works. Closing the windows sequentially works. But closing a window in the middle of that sequence will footgun windows opened from that one due to them only crawling upwards for openers. Similarly, if you open two windows from a parent and then close the parent, the second child window will footgun because it was opened later than the first child window and will yield the mutex to it. I'd like to have a shared array of references to other open windows so you could open and close them in any order without any of them footgunning. But that can happen later.~ (Done)

That said, even without those two things, this is pretty functional. What do you think?

arj03 commented 3 years ago

I'll review the code now. Will do a little refactor as well

arj03 commented 3 years ago

Been reading the diff. Looks good. I only had those 2 comments. I pushed up a minor refactor to make the api a bit easier to use. I'm one of those people that use right click open more than middle click so would be really nice to have that fixed as well. I have been testing this and it seems to be working quite well with the locking etc.

arj03 commented 3 years ago

Also I should probably say, this is really nice and quite a tricky one to get right, so great work so far! :)

KyleMaas commented 3 years ago

Thanks! Concurrency's a fascinating subject and exceedingly easy to get wrong. I'm more used to doing multiprocessing/multithreading in environments where you can have shared memory (either in the same process or system-wide), actual mutexes and semaphores, filesystem locking, critical sections, PID files, communications sockets that all your target platforms support (unlike such things as Broadcast Channels), etc. And I'm more used to doing it in projects where it was designed for it early on - client/server protocols for communicating with a daemon which holds the locks, for example. Doing this kind of thing entirely on the client side is a bit of an adventure for me.

KyleMaas commented 3 years ago

I think I'm actually pretty happy with that.

arj03 commented 3 years ago

Nice, will give it a test :-)

arj03 commented 3 years ago

If I control click on a link it doesn't work. Is that correct? I tried opening a thread using that. It works if I right click and say open in new tab.

KyleMaas commented 3 years ago

Having used a mouse with a center button for years, I didn't know that was a thing. But I should be able to catch that.

KyleMaas commented 3 years ago

See how that works for 'ya.

arj03 commented 3 years ago

That works. I think this is ready to go. I'll just remove the console.log. Do you have anything else you think we should do first?

KyleMaas commented 3 years ago

(Incidentally, the name of this pull request is no longer accurate. This goes way further than I was intending for the initial concurrency stuff.)

KyleMaas commented 3 years ago

Nope. I'm good. Go for it.

arj03 commented 3 years ago

Basic protection :-)