FirebaseExtended / firepad

Collaborative Text Editor Powered by Firebase

Initialization and Load Time Considerations #212

Open isTravis opened 8 years ago

isTravis commented 8 years ago

Howdy! Amazing work - thank you so much for all of it. We've been using firepad for a few months now and it's been really great.

The one hiccup we do have concerns load times and the initialization process. Even the examples page has a pretty heavy load time for me https://firepad.io/examples/.

Apologies if I'm missing some documentation on this somewhere - I spent a bit looking around but couldn't find anyone else talking about it.

Thanks again!

mikelehen commented 8 years ago

Hey Travis,

Thanks for reaching out. Happy to hear that firepad has been (mostly) working out for you!

As for initialization / load time, I can give you some general info and we can go from there to hopefully isolate your particular issues.

In general, firepad does the following during initialization:

  1. Reads the latest recorded "checkpoint." This is a snapshot of the pad that should be no more than 100 revisions old.
  2. Reads the pad history since the checkpoint (less than 100 revisions).
  3. Replays the history on top of the checkpoint to get the very latest state. This should be pretty fast.
  4. Renders this up-to-date version.
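
In code, that sequence looks roughly like the sketch below, against the Firebase JS API of the time. The URL, the child paths, the checkpoint's field names, and the applyOperation / renderDocument helpers are placeholders for illustration, not Firepad's actual internals:

var padRef = new Firebase('https://<your-firebase>.firebaseio.com/pads/demo');

// 1. Read the latest checkpoint (a snapshot at most ~100 revisions old).
padRef.child('checkpoint').once('value', function (checkpointSnap) {
  var checkpoint = checkpointSnap.val();

  // 2. Read the revisions written after the checkpoint (fewer than 100).
  var historyQuery = padRef.child('history').orderByKey();
  if (checkpoint) { historyQuery = historyQuery.startAt(checkpoint.id); }

  historyQuery.once('value', function (historySnap) {
    // 3. Replay each revision on top of the checkpoint to reach the latest
    //    state (skipping the checkpoint's own revision is omitted for brevity).
    var doc = checkpoint ? applyOperation('', checkpoint.o) : '';
    historySnap.forEach(function (revSnap) {
      doc = applyOperation(doc, revSnap.val().o);
    });

    // 4. Render the up-to-date document in the editor.
    renderDocument(doc);
  });
});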

In terms of what could make initial rendering slow, I'd keep an eye on a few things:

  1. The size of the resulting document. If the pad is many pages long, I could see there being some performance penalty.
  2. The amount of rich-text formatting. Firepad manually implements a bunch of rich-text logic on top of CodeMirror (the underlying editor, which is not inherently meant to be a rich-text editor). The implementation is okay, but not super optimized and I suspect you could run into issues with long documents with a lot of custom formatting.
  3. The size of the history. In theory this shouldn't be a big concern, due to the checkpoint, however with extremely long histories, the Firebase backend may actually incur some delay in performing the query to get the history items since the last checkpoint. A good clue that this is the issue is if it's slow to load an inactive pad, but refreshing the page is fast even in an incognito tab with no cache (it'll be fast the second time due to caching of the history data in the Firebase backend).

For what it's worth though, none of these should really apply to the examples page. And indeed, I see it render all 3 pads in under half a second for me. So if you're seeing that load slowly, maybe this is more of a general connection-speed issue or something. :-/
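
If it is slow for you, one cheap way to narrow it down is to time how long Firepad takes to fire its 'ready' event and compare a cold load (incognito window) against a refresh. A rough harness, with the URL and element id as placeholders:

var start = performance.now();

var firebaseRef = new Firebase('https://<your-firebase>.firebaseio.com/pads/demo');
var codeMirror = CodeMirror(document.getElementById('firepad-container'), { lineWrapping: true });
var firepad = Firepad.fromCodeMirror(firebaseRef, codeMirror, { richTextToolbar: true, richTextShortcuts: true });

firepad.on('ready', function () {
  // If this number drops a lot on refresh, suspect the backend history query;
  // if it stays high either way, suspect document size / rich-text rendering.
  console.log('Firepad ready after ' + Math.round(performance.now() - start) + ' ms');
});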

Regarding pre-rendering server-side, that seems tricky (especially from a UX perspective to make it not too jarring when you swap in the "real" firepad) and I'm not sure how much it'll actually help, since presumably the server has to do the same work the client would have done (in terms of loading the checkpoint, the history since the checkpoint, and replaying the history). So I don't know if it'll be a ton faster unless you're also doing some sort of caching. But that seems tricky too. :-/

Hope this provides some help. Feel free to ping me (michael@firebase.com) if you have a consistent repro of a "slow" pad and want help doing a deeper investigation.

isTravis commented 8 years ago

Thanks so much for the in-depth response!

Great to hear that it rebuilds from a recent checkpoint. A couple follow-up questions:

I came to somewhat the same conclusions regarding server-side rendering. Glad to know I wasn't missing anything obvious, but bummed to hear there isn't some magical solution ha. Will post back if I come up with something.

I'll follow up with you if I can replicate the slowness reliably. Thanks again for the help!

mikelehen commented 8 years ago

Answers:

  1. Oh, yeah. If you have richTextToolbar disabled, then you can ignore my comments about rich-text.
  2. Yes, if that large Macbeth edit happened since the last checkpoint, it could definitely affect performance... In particular, you'll definitely incur the bandwidth cost to download that Macbeth paste even though it's not visible. Once enough subsequent edits happen to trigger another checkpoint, that cost should disappear, though.
  3. For the most part, browser caching shouldn't make much of a difference, but it could make a bit. In particular, the first time you connect to a given firebase, we may "redirect" you to the best backend server for that firebase. We cache that redirect in localStorage so it should only happen the first time. And of course, static resources like firebase.js and firepad.js, etc. will be cached by the browser. You can open an incognito tab or similar to compare the effects of no cache.
  4. Yeah, it's tricky. Personally, I'd probably use Firebase logging (see below).

You can turn on logging with timestamps with:

Firebase.enableLogging(function (msg) {
  console.log(new Date().toISOString() + ': ' + msg);
});

That's probably going to be tricky to make sense of if you aren't a Firebase developer :-), but you may be able to tease out some useful insights (and get an idea of how long it takes for things to "settle"). Or you can ping me (michael@firebase.com) with your log output and I can take a peek.

cben commented 8 years ago

One thing I was thinking about is that it could be very convenient if checkpoints were "eventually" up-to-date. Obviously they can't be up to date on busy documents, but I'd like them to be (with high probability) on dormant docs.

Currently firepad checkpoints when rev % 100 == 0, right? The simplest heuristic would be to also make a checkpoint when you made the last edit and there have been no newer edits for several minutes. That won't help if the user closes the window quickly after finishing writing. Or, maybe better, checkpoint when X minutes have passed since the last checkpoint (needs caution about clock skew). Another heuristic is checkpointing after load if the checkpoint was stale. To be surer, one would have a server doing periodic sweeps of all docs, but I think the above would already be a good improvement.

Benefits:

  1. It could allow faster loads on average (fewer round-trips), both by clients and servers.
  2. Client-side, I could fire a REST request for just the checkpoint before I'd even loaded Firebase or CodeMirror! Yes, it would be a read-only, unstyled "preview" and the transition would be a bit jarring, but if it comes early enough I think it'd be preferable to looking at a blank page (rough sketch below).
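
Roughly what I have in mind for that preview; the checkpoint path and the way its operation is turned into text are guesses about the data layout, so treat it as a sketch rather than a drop-in solution:

var previewEl = document.getElementById('pad-preview');

// Best-effort preview: fetch just the checkpoint over Firebase's REST API
// before firebase.js / CodeMirror are loaded. Path and field names assumed.
fetch('https://<your-firebase>.firebaseio.com/pads/demo/checkpoint.json')
  .then(function (res) { return res.json(); })
  .then(function (checkpoint) {
    if (!checkpoint || !previewEl) return;
    // A real implementation would decode the checkpoint operation properly;
    // here we just keep its string pieces (inserted text) as a crude preview.
    var text = (checkpoint.o || []).filter(function (piece) {
      return typeof piece === 'string';
    }).join('');
    previewEl.textContent = text;
  })
  .catch(function () { /* preview is optional; the real pad loads regardless */ });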

vincentwoo commented 8 years ago

Another simple-ish optimization would be to always load, on the initial load, the latest checkpoint AND the last 100 edits in parallel. Right now the latter blocks on the former.
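
Something along these lines, assuming the checkpoint/history layout and glossing over the OT details; the point is just that the two reads go out at the same time:

// Sketch: issue both reads in parallel instead of serially. The child paths
// and post-processing are assumptions; limitToLast(100) mirrors "last 100
// edits", since a checkpoint is taken every 100 revisions.
function readOnce(query) {
  return new Promise(function (resolve) {
    query.once('value', resolve);
  });
}

var padRef = new Firebase('https://<your-firebase>.firebaseio.com/pads/demo');

Promise.all([
  readOnce(padRef.child('checkpoint')),
  readOnce(padRef.child('history').orderByKey().limitToLast(100))
]).then(function (results) {
  var checkpoint = results[0].val();
  var recentHistory = results[1];
  // Drop any history entries at or before the checkpoint's revision, then
  // replay the remainder on top of it, as in the normal load path.
});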

mikelehen commented 8 years ago

These both seem like good ideas worth exploration! A couple notes:

  1. @cben Yeah, checkpoints are currently made whenever rev % 100 == 0.
  2. As Vincent points out, we load the checkpoint and then load the history since the checkpoint. So even if the checkpoint is 100% up-to-date, we're still going to load the (empty) history before displaying the firepad contents (in general, there's no way to tell whether the checkpoint is up-to-date). So there's a chance an up-to-date checkpoint won't help much, since we're still doing just as many round-trips.

    We could consider displaying the contents based on the checkpoint and applying the remaining history after the contents are already visible, but it seems like that could be a worse experience (both UX-wise and performance-wise) if the checkpoint wasn't up-to-date. :-/
  3. If instead you did Vincent's approach, you'll reduce the # of round-trips but increase the bandwidth in the average case (since you're always grabbing a full 100 history items), which again may or may not pay off. :-)

So if we go forward with either approach (PRs welcome), I'd try pretty hard to measure the time savings to make sure it's really paying off. :-)

kofifus commented 8 years ago

I would like to make some more observations about performance, as I am also struggling with poor loading times on slower machines.

I did some profiling and here are some results:

2016-02-23T21:09:56.341Z: got checkpoint
2016-02-23T21:09:56.415Z: end checkpoint
2016-02-23T21:09:56.817Z: start handleInitialRevisions
2016-02-23T21:09:56.909Z: end handleInitialRevisions
2016-02-23T21:09:56.910Z: start applyOperationToCodeMirror 99 operations
2016-02-23T21:10:04.113Z: end1 applyOperationToCodeMirror
2016-02-23T21:10:04.262Z: end2 applyOperationToCodeMirror
2016-02-23T21:10:05.311Z: firepad ready

end1 is taken just before the rtcm.codeMirror.refresh() call in applyOperationToCodeMirror, and end2 just after it.

This is a small, one-page document with 99 revisions since the last checkpoint. What you can see is that the main loop in applyOperationToCodeMirror is what actually takes almost all the time here, not loading the checkpoint or revisions!

On further profiling, it is obvious all this time is spent in the rtcm.insertText calls. Wrapping the internals of applyOperationToCodeMirror in a cm.operation() does not make a difference (probably since insertText already does this internally).
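
For reference, the kind of batching I mean looks roughly like this; cm.operation() defers CodeMirror's display updates until the callback returns, and the spans array is just a stand-in for the per-span inserts, not Firepad's actual code:

// Illustration only: many small edits inside cm.operation() repaint once.
var cm = CodeMirror(document.body, { lineWrapping: true });
var spans = [{ start: 0, text: 'hello ' }, { start: 6, text: 'world' }];

cm.operation(function () {
  spans.forEach(function (span) {
    cm.replaceRange(span.text, cm.posFromIndex(span.start));
  });
});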

On a fast machine this time almost disappears, so it seems that insertText is extremely CPU intensive.

I am looking for more pointers on this. Why is insertText so slow? I'm pretty sure none of the 99 operations involves a lot of text.

Also, can the 99 insertText calls be combined into a single one? Or is there any other way to improve performance? This would significantly improve the user experience on slower machines/tablets, etc.

Thx!

mikelehen commented 8 years ago

Sorry, I don't know offhand. The Chrome profiler is probably the best place to start to see where the time is being spent.

FWIW, it would be a strange coincidence if there were 99 revisions since the last checkpoint and also 99 insertText calls. Basically the 99 revisions will be combined with the checkpoint to give the current state of the document. This should be pretty fast. And then we'll do an insertText for each "rich-text span" in the document (adjacent spans of text with different rich-text formatting are separate "rich-text spans").

-Michael

kofifus commented 8 years ago

One thing I'm seeing is that, as the document builds, both RichTextCodeMirror.prototype.onCodeMirrorBeforeChange and RichTextCodeMirror.prototype.onAnnotationsChanged run for every revision since the last checkpoint.

I wonder if this can be optimized so that they run only once at the end?
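
Purely as an illustration of what I mean (the wrapper and handler below are hypothetical, not how Firepad is structured today), something like a "defer during bulk load" pattern:

// Hypothetical sketch: suspend an expensive listener while the initial
// revisions are replayed, then let it run a single time at the end.
function makeDeferrable(handler) {
  var suspended = false;
  var pending = false;
  return {
    call: function () {
      if (suspended) { pending = true; return; }
      handler();
    },
    suspend: function () { suspended = true; },
    resume: function () {
      suspended = false;
      if (pending) { pending = false; handler(); }
    }
  };
}

var annotationsChanged = makeDeferrable(function () {
  /* recompute annotations / styling here */
});

annotationsChanged.suspend();
// ... replay all revisions since the checkpoint, calling annotationsChanged.call() ...
annotationsChanged.resume(); // runs the handler once, at the end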