ReviewNB / support

Issues and feature requests for ReviewNB
https://reviewnb.com
59 stars 8 forks source link

Show content earlier: load diff of each file & cell asynchronously #38

Closed juhoautio closed 5 years ago

juhoautio commented 5 years ago

Feature request: ReviewNB to asynchronously process the diff for each file in the commit (or PR).

For big commits with multiple files it may take rather long that ReviewNB displays just the loading icon. This happens Especially if there are many notebook files that changed.

ReviewNB could display the comments & list of files ~immediately, and then continue loading the diff for each file asynchronously. This way users could start looking at the changes on the top of file list already before rest of the diff has been processed.

amit1rrr commented 5 years ago

Thanks for reporting & suggesting a solution @juhoautio. If you could share following data on the impacted PR/Commit that would be great,

juhoautio commented 5 years ago

Thanks. It seems I had the wrong idea about this being about the number of notebooks, sorry. The latency seems to mostly depend on the total amount of content.

In general it seems that for almost every PR we have to wait ~10s.

Below are times measured loading 3 real (internal) PRs. I loaded each of them twice to account for random fluctuation. Especially the content download may be affected by my connection speed. Which actually highlights that loading in chunks would improve the experience especially on a slower connection.

Case one "868"

Case "867"

Case "883"

It would be nice to display at least the discussion, commits & changed file list instantly. Then the actual notebook diffs could be loaded async? Instead of displaying the loading animation on the whole PR page, it could be shown more specifically on parts of the diff that haven't been loaded yet. Ultimately it would be possible to load the diff for each cell separately. That way user would almost never have to wait, because all content gets to be loaded before scrolls down or expands more files in the diff.

But maybe you also have some example PRs with real notebook content? Are our notebooks heavier than usual (wrt. "Total data transferred")?

This ~10 seconds delay is something that can be lived with, but clearly it could be improved. I wonder what kind of latencies ReviewNB users are experiencing in general.

amit1rrr commented 5 years ago

@juhoautio Thank you for the stats. Yes, the latency is proportional to notebook size (+ the size of change). Number of notebooks do not matter as much since diff for all files are processed in parallel.

Time taken to compute diff is relatively tiny. Maximum time is spent in

  1. Fetching file contents from GitHub,
  2. Serving the diff to the client browser.

Both these are network bound so I don't have many ideas on how to optimize it further (we are already doing simple things like gzip'ing the response).

It would be nice to display at least the discussion, commits & changed file list instantly. Then the actual notebook diffs could be loaded async?

All review comments have notebook cells (context) assigned to them and fetching those cells requires fetching entire notebook content (slow operation). Maybe we could show the comments quickly and then send the context cell whenever available. It would 'feel' zippier but I don't know how useful that is (seeing conversation thread without the actual notebook cell being talked about).

But maybe you also have some example PRs with real notebook content? Are our notebooks heavier than usual (wrt. "Total data transferred")?

In my testing, I see most notebooks (~80%) to be small to moderate (single digit MB's). And then remaining 20% are large to huge. I occasionally see some PRs/Commits crossing 30 second for ReviewNB page to load. Usually 80% of that latency is content download.

One option is to cache some responses so subsequent calls are quicker. But still the first load would be slow and it goes against our promise that we don't store any of your GitHub data.

Couple of things I'd recommend you check with your on-prem setup is,

juhoautio commented 5 years ago

Thanks for the detailed response.

I didn’t come to think about rendering of comment context, but what you said would make sense to me. I suppose it would also be possible to prioritize what to diff & return first depending on which cells have comments bound to them (determine by file name & row).

I agree that caching wouldn’t be an option.

I’m closing this issue because maybe I thought that reducing the latency would be more of a low-hanging fruit than it is.