Enhancement Request: Speed up coverage loading

SyntaxRules-avant commented 1 year ago

Is your feature request related to a problem? Please describe. I'm trying coverband on a very large application. To load the coverage page takes 2-3 mins. This is a little frustrating, it would be better faster.

Some hosting providers, like heroku, limit any one request to less than 30 seconds. When my app is loaded on infrastructure like that the coverage page is unusable due to timeouts.

Describe the solution you'd like Possible ways to speed up the page load:

For large applications with many lines and/or files these requests should be split up into multiple requests. Each of these requests should be less than 30 seconds to avoid timeouts like mentioned above.

Describe alternatives you've considered

Using https://github.com/danmayer/coverband/issues/442, I could avoid having to use the web interface all-together.
If static html pages were re-introduced I could generate coverage once (outside of a request) and viewing the static pages would be quicker.
Generate coverage on the CLI, display generate reports on webpage
Would excluding half of the files make it faster? If so, I could exclude half of files, run it for a while and then try the other half of files. Its not ideal- but it is an alternative!

Context

I have enabled config.use_oneshot_lines_coverage = true in an effort to make things faster. I am using Ruby 2.6.6.

zenspider commented 11 months ago

@SyntaxRules-avant how large is large for you?

danmayer commented 11 months ago

thanks for bumping @zenspider I lost track of this one...

I have seen on some very large monoliths the coverage page slow down to load, but since we weren't on heroku or with a forced timeout I haven't worried to much about it...

I am guessing how we load the coverage data is a N+1 redis hit for the admin view... I would likely need to break it into a process to page through the data and load it in smaller batches... This would take a bit of a refactor but would solve the issue.

In this case since it is the admin reporting view config.use_oneshot_lines_coverage = true will have no impact on the admin performance.

SyntaxRules-avant commented 11 months ago

@zenspider large in this case is ~2 million lines of code. Hitting the coverband webpage takes ~3mins.

@danmayer I like the approach you have described. Decoupling the initial page load and loading in the data in smaller batches would do it.

danmayer commented 11 months ago

@SyntaxRules-avant do you have an estimate on the number of files. The N+1 is more per file than per LOC.

Have you seen code files you do not expect to see such as specs or libraries like vendor gems? Neither of which should track or display in the runtime code coverage, but I have seen issues where they are included due to odd packaging and deployment configurations.

frsantos commented 11 months ago

It is very slow for me too, and at some point it gets so slow that it times out, and we need to reset the stats.

In my case we are talking about "4491 files in total. 116074 relevant lines". The views page times out, so I don't know how many.

danmayer commented 11 months ago

OK, so 4491 files isn't that many... but I could see things timing out in 30s especially on a slower redis...

I did verify that this is currently and N+1 lookup as the core of getting the coverage data is

files_set = files_set(local_type)
@redis.pipelined { |pipeline|
          files_set.each do |key|
            pipeline.hgetall(key)
          end
        }

so I will need to re-work a number of things to allow it to page through all the data to display the results and to calculate the final coverage percentage stats across the batches.

SyntaxRules-avant commented 11 months ago

@danmayer Thanks for the extra details. The app I am running is "13,070 files in total. 446,533 relevant lines."

danmayer commented 9 months ago

note that two improvements landed in 6.0.2 that should help folks that were seeing slowness on reporting thanks to @makicamel folks can give it a try and let us know how it helps. Note one of them you have to opt into as it isn't on by default.

frsantos commented 9 months ago

@danmayer I updated our staging environment to 6.0.2, and added the get_coverage_cache: true option to the HashRedisStore adapter, but the page still times out.

danmayer commented 8 months ago

OK, folks I am finally making some progress on this... I have a small reproducible benchmark that shows the slowdown as additional files are added to a project. I will use it to drive some improvements and suggestions for folks.

danmayer commented 8 months ago

OK, found some interesting things... while the hashredis store is faster in some cases and removes a rare race condition on the traditional redis store... It is far less performant with a large number of files when building the web report...

have any of these projects tried the standard redis store vs the hash redis store? As the hash redis store has an N+1 redis query based on N number of files.

With a project that has 4500 Ruby files with coverage the traditional store is twice as fast reporting.

> COVERBAND_HASH_STORE=true COVERBAND_DISABLE_AUTO_START=true BENCHMARK_COVERBAND=true bundle exec rake coverband_benchmark
using hash store
  1.042208   0.148724   1.190932 (  3.672132)
> COVERBAND_DISABLE_AUTO_START=true BENCHMARK_COVERBAND=true bundle exec rake cover
band_benchmark
  0.729637   0.135307   0.864944 (  0.988047)

danmayer commented 8 months ago

OK, folks while there are likely a few options to improve the performance since the HashRedis algo stores and pulls data via a key per file, even with pipelining it can be slow for large projects. I was going to setup an optional reporter that pages through the data, this has some disadvantages like not getting the full project coverage summary, but it should also be an easy way to get things usable for large projects.

Here is a video showing how it would work, let me know if this approach would work for you all @frsantos @SyntaxRules-avant @kamal

https://github.com/danmayer/coverband/assets/24925/5df32cfa-7064-49eb-9352-c0276defd875

danmayer commented 8 months ago

Also if someone can give me some latency stats around their Redis calls that would be helpful. I am able to add arbitrary latency to my Redis benchmarks... but basing them off some real latency numbers would help me optimize this a bit more.

thanks

frsantos commented 8 months ago

Thanks @danmayer

OK, folks while there are likely a few options to improve the performance since the HashRedis algo stores and pulls data via a key per file, even with pipelining it can be slow for large projects. I was going to setup an optional reporter that pages through the data, this has some disadvantages like not getting the full project coverage summary, but it should also be an easy way to get things usable for large projects.

The coverage summary would be useful but I think we can live without it. However, I see two problems with the paged approach: sorting and searching. Are those going to be available without a performance hit? Without those, having a list of thousands of elements is not usable at all.

Also if someone can give me some latency stats around their Redis calls that would be helpful. I am able to add arbitrary latency to my Redis benchmarks... but basing them off some real latency numbers would help me optimize this a bit more.

Not sure what redis call we are talking about here, but for me a GET call could range from 0.7ms to 21ms, with an average of 2ms, and a MGET could range from 1ms to 70ms, with an average of 4.5ms

danmayer commented 8 months ago

OK, yeah It requires some extra work but I could make the search work across the paging. I don't think I can' do the normal table sorting as that requires the coverage data...

Alternatively, I could not change the search or filter in that they will work one everything loaded on the page, and I can have the summary and the file list start at 0 and grow as data is loaded.... so you could search and sort but it would only do so for the data that got loaded...

thanks the redis metrics will help me make my benchmarks match the situation you are seeing more closely.

frsantos commented 8 months ago

The main feature we use is file search for checking if a class or method has been really used. The sorting is a nice tool to detect code no longer used. The summary is bells and whistles, but does not really help us to detect dead code.

frsantos commented 8 months ago

Also, a big problem is not just the table, but the file detail. A 4 lines file takes many seconds to load, so even if the search can find it, it's very slow to show it. Right now it takes us 20s to show a 4 lines file.

SyntaxRules-avant commented 8 months ago

Paging would work fine for me as well. My use cases are similar- I'm mostly looking for dead files & functions. This seems to be a key benefit for larger code bases.

I am not using HashRedisStore, but my current version of coverband is 5.2.5. I'll work on getting it updated.

danmayer commented 8 months ago

yeah at the moment loading a single file will still load the entire coverage report and then grab the one file... so we could definitely improve the way we load single files...

interesting @SyntaxRules-avant in some sense the original one scales a bit better as it doesn't have an N+1 problem, but it can also become a very large coverage blob... I do think adding gzip compression and possibly packaging as msgpack vs json could help with the more traditional redis store... I can play around with that some more.

danmayer commented 8 months ago

ok @frsantos I have a big win on loading single files with the hash redis store, while the N+1 is problematic when building a full report... It made it easy to have the load file details do a single small redis lookup for a specific file... Benchmark running on a project with 4500 files.

# old implementation
COVERBAND_HASH_STORE=true BENCHMARK_COVERBAND=true bundle exec rake coverband_benchmark_single_file
  0.611703   0.049188   0.660891 (  1.928851)

# new single file lookup
COVERBAND_HASH_STORE=true BENCHMARK_COVERBAND=true bundle exec rake coverband_benchmark_single_file
  0.009707   0.000001   0.009708 (  0.062045)

This is on my paging branch so not merging / releasing it yet.

danmayer commented 8 months ago

OK, my branch now has data table paging via server side data with paging... I need to get it to properly pass the search term, and I have disabled the sorting feature... but I think this will get us to a good spot.

danmayer commented 8 months ago

OK, closer, but even with background paging, the dom and datatable gets overwhelmed when I page in over 2K records... so now with the basics working, I am thinking I can add server side search, and look at how to better optimize or leverage the UI paging.

danmayer commented 8 months ago

Oh nice fixed the perf issue with large row count... Looks good tested up to 7K files and it was still responding snappy. OK just need to clean up and cover a weird edge case with paging, and I should be able to get a release candidate out for folks to test.

danmayer commented 8 months ago

OK @frsantos and @SyntaxRules-avant I have a release candidate that you all can try out to see if it lets you load your projects

instructions on how to test it in the PR
https://github.com/danmayer/coverband/pull/516
release is 6.0.3.rc.1

danmayer commented 6 months ago

OK, folks can try this out it is in the released 6.1.0 main version but you still need to use the config.paged_reporting = true option to enable it... Skipping paged reporting is faster for me by far, but on projects where the time exceeds 30 seconds on a heroku the paging option should allow for projects of any size to load in the background of a tab. thanks so much for all the feedback, support, and questions. Let me know if folks see other issues.

danmayer commented 6 months ago

a few fixes and improvements have been released so 6.1.2 should work for folks.

danmayer / coverband

Enhancement Request: Speed up coverage loading #484