Add experimental reporting for large projects via paging

danmayer commented 6 months ago

This should address issue #484 and allow for much larger web reports to be displayed even when on resources that limit request time to 30s per request. This change adds a paged json report and a poll method that will load the report page by page until the full report is available.

because of the way we page we can't detect and show all files that have no eager loading or runtime coverage... meaning if a file exists on your filesystem but is never loaded or referenced it won't show up in this view.
release a RC candidate to have some folks test whom have struggled to load larger projects

TODO: before main release

[x] add UI indication of loading
[x] remove console.log statements
[x] possibly remove all of the previous suggestion / attempt around a caching layer (do this in a different PR after confirming everything is working well)
[x] refactor duplication across hash redis
[x] remove web pager reporter and just make it an option for the standard one
[x] refactor the JS to use same colorbox / click logic for paging and not
[x] support link styles and extra data for paging links (the color highlighting of of the file rows is not yet implemented so they all show as grey, but should show real data)
[x] fix deep links direct to code via file hash
[x] fix that eager loading coverage data is not fully accurate, the runtime data should be
[x] back button / url hashes might get out of sync
[X] fix and remove all the todo / comments in the changes
[x] fix bug where clicking a loaded file triggers two ajax requests for the file data
[x] tighten up test coverage (in progress)

Instructions to use:

install the RC release: gem "coverband", "6.0.3.rc.1"
ensure your coverband.rb is using the hash redis store: config.store = Coverband::Adapters::HashRedisStore.new(Redis.new(url: ENV['REDIS_URL'] || 'redis://localhost:6379/0'))
in routes use the new paging reporter: mount Coverband::Reporters::WebPager.new, at: '/coverage'
load the /coverage endpoint if you want to see the paging loading you can look at the dev tools console log or network tab to see it paging in the data

danmayer commented 6 months ago

you can see things setup on this example project but it uses ENV vars to toggle between various options https://github.com/danmayer/coverband_rails_example/blob/main/config/coverband.rb

frsantos commented 6 months ago

@danmayer I tested it on local, but it didn't work, as we mount the route in a subdirectory:

mount Coverband::Reporters::WebPager.new, at: '/config/coverage'

It tries to download http://localhost:3000/coverage/report_json?page=1, which returns a 404 (Not Found)

Although when I moved the route to /coverage it worked properly.

When it works on custom routes, I'll need to test it on staging that has way more data.

danmayer commented 6 months ago

ahh ok thanks @frsantos I can fix the bug that should work when mounted at any arbitrary URL. Let me know how the staging tests go

danmayer commented 6 months ago

ok @frsantos the release 6.0.3.rc.2 has a fix for the relative mount path

frsantos commented 6 months ago

Hi @danmayer . It loads now, and I can see it paginating some thousands of files. But the rows are not clickable nor they have any colour for their coverage.

danmayer commented 6 months ago

as part of this change I bumped the storage version key.... You might need to run and collect data again it will start empty.

If you view some of the paged data via the network browser does it show pages coming in with coverage data?

danmayer commented 6 months ago

ah never mind, I see the problem when I switch to the most recent paging technique that fixed batch loading I broke all the links... I will fix soon.

danmayer commented 6 months ago

ok, I fixed the linking and file loading, but found another bug in the way it loads eager and runtime coverage, which will take a bit more time to fix.

SyntaxRules-avant commented 6 months ago

I attempted to try this out, but I could not because ruby 2.7+ is required. I'm embarrassed to say the large project I'm working on is still 2.6.

danmayer commented 6 months ago

sorry @SyntaxRules-avant we recently dropped support for 2.6, I kind of doubt I will bring that back... I know that many folks using coverband are on older legacy systems so we do try to allow for a very long deprecation cycle but it does get hard to support newer capabilities without eventually dropping older versions.

OK @frsantos the branch is pretty hacky at the moment, but I want to make sure it solves the problem before I make this more solid...

deep links don't work (linking directly to a single source code file coverage)
the eager loading coverage data is not fully accurate, the runtime data should be
the color highlighting of of the file rows is not yet implemented so they all show as grey, but should show real data
back button / url hashes might get out of sync
I broke some tests

All that being said, I think this should work and show that it can load your data much faster and many of the improvements will carry over even to the none paging version of the coverage report, so it should be a nice win when I finish cleaning it up.

Let me know if you can give it a try and gather some data to load the large project.

SyntaxRules-avant commented 6 months ago

@danmayer Its totally understandable that you upped the requirement to 2.7. I think I found a way forward for an upgrade this week. Hopefully I'll be back to this issue in a couple of weeks.

frsantos commented 6 months ago

Thanks @danmayer, it looks promising :rocket:

One small thing I noticed: when clicking on a file, the request for recovering the data is performed twice.

danmayer commented 6 months ago

cool, yeah the current JS is a bit of a hack as I wanted to make sure it would load on projects of your size before fully investing in this approach. It sounds like it was able to load much faster for you, I will try to clean this all up and work on a real release soon.

danmayer commented 5 months ago

@frsantos I think this might finally be wrapped up, before I merge it to main and get a full release do you want to try out the RC and give any other feedback? coverband (6.0.3.rc.4) .

Things should all page in reasonably quickly and no calls should take to long individually. Things like deep linking, filtering, sorting, cell styling should all be working again. If you see anything odd or not working let me know.

There is still no summary data, as the server can't calculate that without the full data set. It also can't list files that are never loaded (IE on the filesystem but never loaded into the Ruby VM).

danmayer commented 5 months ago

As I add in paging support, I am considering backing out your caching background layer to simplify things and put more effort into this paging approach.

@Drowze, you submitted that work. Would you be willing to try this RC release and see if it works well for your project? I think for long term support and growth it might be a better approach than trying to just cache everything in the background. If you do see issues on your project let me know as I think there are a few other improvements I could make... I just don't have any real world project that seems to have issues with the non-paging loading.

Drowze commented 5 months ago

Hey @danmayer! Thank you for the great work and quoting me here! I'm more than happy test it! 😄

I just tried it with the project I work on - to do that I simply needed to add config.paged_reporting = true. I also turned off get_coverage_cache by the way (not sure if needed).

For this trial, I simply ran RSpec with Coverband enabled on a few tests, then tried to load the report - and here's my first impressions:

this new RC raises an error on boot for me:

/Users/Drowze/.gem/ruby/3.2.3/gems/coverband-6.0.3.rc.4/lib/coverband/configuration.rb:183:in `store=': uninitialized constant Coverband::Adapters::WebServiceStore (NameError)

      raise "invalid configuration: coverband service shouldn't have redis url set" if ENV["COVERBAND_REDIS_URL"] && store.instance_of?(::Coverband::Adapters::WebServiceStore)

I didn't go deep to investigate the error, but once I unset my COVERBAND_REDIS_URL environment variable, the error was gone. Here's my relevant lines in the coverband config:

  redis = Redis.new(
    url: ENV.fetch("COVERBAND_REDIS_URL", "redis://localhost:6379/2"),
    db: ENV.fetch("COVERBAND_REDIS_DB", nil)
  )

  config.store = Coverband::Adapters::HashRedisStore.new(
    redis,
    redis_namespace: "backend_coverband_data",
  )

in this attempt, Coverband needed to make 15 requests to get all the files. I noticed my browser getting slow and less responsive as the further requests finished (and also getting a bit more time in between the requests). After the requests finished my browser got back to normal though (note: I'm on 32GB macbook). I guess that dynamically updating the DOM on a page that big is pretty taxing on the browser, but it wasn't a particularly annoying problem in my experiment (I can just sit back and wait for a minute or two). Here are the logs of the requests:

2024-04-12 16:56:05.748771 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage", :ip => "::1" }
2024-04-12 16:56:06.093269 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/dependencies.js", :ip => "::1" }
2024-04-12 16:56:06.094586 D [85291:puma srv tp 003] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/application.js", :ip => "::1" }
2024-04-12 16:56:06.095241 D [85291:puma srv tp 004] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/loading.gif", :ip => "::1" }
2024-04-12 16:56:06.097000 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/application.css", :ip => "::1" }
2024-04-12 16:56:06.221549 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/loading.gif", :ip => "::1" }
2024-04-12 16:56:06.255212 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/images/ui-bg_highlight-soft_75_cccccc_1x100.png", :ip => "::1" }
2024-04-12 16:56:06.259959 D [85291:puma srv tp 003] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/images/ui-bg_glass_75_e6e6e6_1x400.png", :ip => "::1" }
2024-04-12 16:56:06.262267 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/images/ui-icons_888888_256x240.png", :ip => "::1" }
2024-04-12 16:56:07.493031 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=1", :ip => "::1" }
2024-04-12 16:56:08.056829 D [85291:puma srv tp 003] Rack -- Started -- { :method => "GET", :path => "/debug/coverage/magnify.png", :ip => "::1" }
2024-04-12 16:56:08.252130 D [85291:puma srv tp 003] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=2", :ip => "::1" }
2024-04-12 16:56:09.250307 D [85291:puma srv tp 004] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=3", :ip => "::1" }
2024-04-12 16:56:10.264717 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=4", :ip => "::1" }
2024-04-12 16:56:11.554291 D [85291:puma srv tp 003] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=5", :ip => "::1" }
2024-04-12 16:56:12.947891 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=6", :ip => "::1" }
2024-04-12 16:56:14.448568 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=7", :ip => "::1" }
2024-04-12 16:56:16.161091 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=8", :ip => "::1" }
2024-04-12 16:56:18.115500 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=9", :ip => "::1" }
2024-04-12 16:56:20.066511 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=10", :ip => "::1" }
2024-04-12 16:56:22.340135 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=11", :ip => "::1" }
2024-04-12 16:56:24.933990 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=12", :ip => "::1" }
2024-04-12 16:56:27.693214 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=13", :ip => "::1" }
2024-04-12 16:56:30.663193 D [85291:puma srv tp 001] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=14", :ip => "::1" }
2024-04-12 16:56:34.036570 D [85291:puma srv tp 002] Rack -- Started -- { :method => "GET", :path => "/debug/coverage//report_json?page=15", :ip => "::1" }

It wasn't a particularly great UX to have the list of files reordered everytime a request finished (i.e. what I was reading on the screen would constantly change as the requests progressively finished). Some solutions for that I guess would be:
- leave reordering to happen only after the last request (so new files are only appended instead of reordering everytime)
- return the results already pre-sorted somehow (though if the user manually changed the page order before the requests finish, this problem would happen again...)
I noticed only the JSON responses are paginated - should I expect pagination on the web UI as well? I'd love to see that! I usually won't even look to those files that already have a green coverage

For a more real test: after Wednesday I think I'll be able to try this change in our company testing environment - then I'll try to leave it there for a couple days and see how the coverage page performs. I expect to post my results here by the end of next week - would you mind waiting for that before removing the caching layer?

But that's it for now. It feels like a big improvement, thanks @danmayer! 😄

danmayer commented 4 months ago

Thanks so much for all the feedback this was really helpful.

@Drowze on the first issue I am confused, it shouldn't be trying to use the coverband service adapter unless you explicitly selected that adapter or you have a service config file:

SERVICE_CONFIG = "./config/coverband_service.rb"
The coverband service was an alpha of a possible server to collect coverage that wasn't based on off the shelf servers like redis and memcached

I think it was checking the constant that shouldn't exist at all if one wasn't configuring... Anyways, I think I fixed that issue but I am confused how it would just now occur... As those code paths should have existed for awhile.

OK for the slow page, unresponsive UI, etc I think I am going to let it do all the paging behind the scenes then render the data when it is all done. This means the pages don't stall out by changing the dom as often.

The UI in theory can do paging but the filters and sorts then only work per page which makes it much harder to actually use those to find what you might be looking for.

I was able to cut another redis call off the paged load, but I have about 700ms for 250 files of data per page. I will clean this up a bit more, but I think it might be good enough to ship now.

Drowze commented 4 months ago

After having it collecting coverage data in our internal testing environment (we're using this fork which is just this branch when I last commented on it + a fix for the first issue I pointed), it seems to me that this change is fairly good and I think we'll adopt to it soon 😄

One very strange thing happened though: every report_json?page request seem to be taking almost 10 seconds to finish - I'm not sure yet on what could be the cause, but does not seem related to redis (as we do have Redis commands being traced in our backend). Here's a sample of our Datadog traces:

Screenshot 2024-04-22 at 17 18 59

Screenshot 2024-04-22 at 17 19 24

The sequence of events traced by our Datadog tracer is something like:

SMEMBERS (runtime)
HGETALL (runtime)
SMEMBERS (eager)
... long wait ... (we don't have any traces here)
HGETALL (eager)
SMEMBERS (runtime) (and this same pattern repeats on all paged requests)

I'd have to get a snapshot of the redis database to further debug it locally but that's what I have for now. Hopefully this piece of data can give you a clue on what could be causing this slow down 🤔

danmayer commented 4 months ago

odd on the extra time... not sure there each page on my machine was about 700ms. I want to add some tracking in my tests to ensure I never increase the number of redis queries per number of files more than expected, so I will try to get that added into the tests and see if there are any other optimizations I can make.

danmayer commented 4 months ago

The main release does have a fix that you pathed on your branch so you should be good to try out 6.1.0

danmayer commented 4 months ago

I know what is going on with that long wait time @Drowze I am working on a fix that should speed things up a good deal.

danmayer commented 4 months ago

ok @Drowze I just released 6.1.1 this fixes that slowdown issue... It has my local test app with 6500 files loading the paging data down from 700ms to 40ms per page. It all loads super snappy now

Drowze commented 4 months ago

Just tested the latest 6.1.1 and it is fantastically fast now @danmayer! Thanks a lot for the hard work on it! :D

Found one issue, but I'm loving it already. Here's my feedback for it:

The good:

I loved that the full page is only loaded when all requests finish - and there's some visual feedback to show the user that the page is loading asynchronously. Awesome 👏
The paged requests are all super snappy now! All requests are taking an average of 1.5s (which is super acceptable!). Here's the new trace reported to our Datadog:

The bad:

In our codebase (4962 files reported to Coverband), we needed 21 paged requests to fetch all the files - but still we had 49 requests (and all responses from page 22 to page 49 had the same seemingly empty response). To help with debugging this issue, I'm attaching here:
- the trace of a sample request between page 22 and page 49 (they all follow the same pattern)
- the timeline of requests (as reported to Datadog)
- the timeline of requests (as reported by my browser's dev tools)
- the response of a sample request between page 22 and page 49 (they're all the same)

Click to expand the attached screenshots

* The trace of a sample request between page 22 and page 49 (note there's a single redis call: `SMEMBERS` (runtime) ![Screenshot 2024-04-23 at 13 40 01](https://github.com/danmayer/coverband/assets/9031589/d4c1a6b5-4b89-4775-9c9e-968881d0f6b5) * Timeline of requests (as reported to Datadog) ![Screenshot 2024-04-23 at 13 32 47](https://github.com/danmayer/coverband/assets/9031589/32b398f8-9353-4e8f-b21a-b83337018c63) * Timeline of requests along with a a sample response of a request between page 22 and page 49 (as reported by my browser's dev tools) ![Screenshot 2024-04-23 at 13 43 51](https://github.com/danmayer/coverband/assets/9031589/acef6688-18df-4d91-b3f3-98de4a637def)

danmayer commented 4 months ago

interesting... definitely a bug and it should get faster once I sort out what is going on. I have some ideas of where the total file count and the number of files return could get out of sync

danmayer commented 4 months ago

blah... I think I know what to look at / fix but me and my family all just came down with something so it will likely be a few days until I get to it.

danmayer commented 4 months ago

ok, I found a few reasons the static count and the full coverage array can be different or off by a few... for example if the hash of a file has changed but we have yet to record any coverage for the new version of the file... Anyways, I found a good halting condition so it will only ever over fetch by one page which lets the front end know it has pulled all the data.

so try out 6.1.2.rc.1 and let me know if that fixes the issue for you @Drowze

Drowze commented 4 months ago

Hey @danmayer! Thanks again for the hard work put on this gem! 😄

I just tried the new 6.1.2.rc.1 locally^_[1] and I can confirm that the problem seem to be gone :) I see:

22 paged requests' responses that contain files data with them (i.e.: total_files > 0 and aaData is not empty)
- 1st to 21st responses' total_files is around ~230
- 22nd response's total_files is 33
there is a 23rd response that is empty (i.e.: total_files == 0 and aaData is empty) - but not requests after that. I suppose that the frontend stops after receiving an empty response - so this empty response may be expected.

Running it locally (i.e.: with negligible Redis server latency) is impressively fast! I haven't talked about this before, but the memory usage now feels much much more efficient - my browser doesn't choke at all trying to load this big page anymore (something that used to happen quite a bit before these changes... Also for curiosity, this is a problem present in the simplecov project as well, since it loads all the files on the first load instead of doing some ajax request when clicking a file).

By the way, with this new paging, I am now fully happy to ditch away the caching logic I implemented earlier - that will not be missed 😄

Two minor questions now:

Is there currently any way to configure the page size (which I see is 250 by default)? I think that 250 is a fine default, but perhaps in a case-by-case users may be able to tune it depending on the latency to their redis server (and how fast the server really is to respond the HGETALL queries). I see the hash_redis_store seem to support page_size configuration, but I'm not sure how to configure that.
The snapshot I worked with seemed to retrieve 4876 files - but I see one the responses that iTotalDisplayRecords is actually 5289. What's up with the difference between the two? Does that probably means that 413 files did not receive any hit?
- Also that raises another question: is there any supported way (doesn't have to be with the HTML page - could be a rake task or using Coverband API directly) to retrieve files that did not have any hit - i.e. 0% coverage?

[1]: I tested it locally by downloading a redis snapshot from our testing environment and importing it locally.

danmayer commented 4 months ago

yeah I could expose a configurable page size. I just tried a few options and decided on it as a good default. I think for now I will keep it private just to keep things simpler.

They difference would be for files we are aware of but have no runtime coverage, or for files that have changed but have yet to receive coverage on the latest version of the file. This was one of the trade offs of going with the paging that I didn't come up with a good way to handle the never hit files as we page through all the files that do have coverage...

I could likely think of something to cover that case it might need to be something like a one off endpoint that can calculate just that... or it builds up the cache over paging and sends a final page of data which is the uncovered files.

danmayer commented 4 months ago

thanks again so much for your testing. I will cut this release, and at some point clear up the caching...

Yeah much of the original view layer in coverband came from simplecov but the way things work has changed fairly significantly under the hood for awhile... I am not sure how much work it would take to get this ported into simplecov, but I am guessing it would be pretty significant.

danmayer / coverband

Add experimental reporting for large projects via paging #516