feat(forge): Add internal metrics capability

lucas-manuel commented 1 year ago

Component

Forge

Describe the feature you would like

When running invariant tests with an actor-based pattern, there is currently a lack of visibility for:

How many times a function is called
What code paths within the function are reached
Other general metrics (e.g., how many LPs deposit total)

Having a cheatcode that would allow storing this information with arbitrary keys would be useful to be able to render this info in a summary table at the end of a fuzzing campaign.

@gakonst mentioned [prometheus] metrics as a good reference for this.

Examples:

vm.counter("numberOfLps", numberOfLps++);

vm.counter(string.concat("loan_", vm.toString(loanNumber), "_payments"), payments[loan]);

Additional context

No response

gakonst commented 1 year ago

cc @onbjerg @mattsse I'm still a bit abstract on this but I was thinking of exposing an API via cheatcodes similar to https://docs.rs/metrics/latest/metrics/, and on the CLI we would collect all these metrics and custom log them in a table

gakonst commented 1 year ago

@lucas-manuel can you give an example of how your ideal reporting would look like? a table? something else? maybe there should be plots like this https://docs.rs/tui/latest/tui/widgets/struct.Chart.html for stuff on how it evolves over time (e.g. price as volatility happens)? makes me think that this may allow creating automatic simulation / stress test reports like Gauntlet Network does.

lucas-manuel commented 1 year ago

@gakonst Yeah personally I think the lowest hanging fruit would be to log in a table similar to --gas-report. I'd mainly be using this for debugging purposes so it'd be useful as a numerical log output.

Going forward though for the more sophisticated use cases we discussed it would be interesting to export to some sort of JSON that could be used to generate more visual reporting (could be added to CI as an artifact for example).

gakonst commented 1 year ago

@FrankieIsLost @transmissions11 had some thoughts on this which I'd love if they shared in the thread :)

transmissions11 commented 1 year ago

Great idea to give users programmable insight into their invariant campaigns, been a big advocate of this for a small while now.

IMO giving devs a better understanding of the coverage of their invariant tests and tools to effectively debug them is far more valuable than building smarter fuzzers after a certain point, because once a weakness is identified its not too hard to guide the fuzzer towards it, as opposed to a genius fuzzer thats a total black box from a dev's perspective, which offers little insight into how secure a piece of code is and whether a dev can be confident in the coverage of the run. Humans and fuzzers should work tandem!

In terms of actual design:

A cheatcode to enter data into basically a time series database a la prometheus seems like a great way to go
Ideally we export tables by default, but also provide the option for timestamped JSON, etc.
- Timestampped JSON export would enable auditors and such to generate charts like this (ref: Gearbox Fuzzing):
Would be nice to have metric types beyond just counter. For inspiration, prometheus offers 3 core types:
- Gauge
- Counter
- Histogram
For tracking things like number of lps, I think it would be easiest on devs if we added a postCall hook of sorts (imagine setUp but called after every random call in an invariant run), where devs can just bulk query and update all their storage variables
However, there are still situations where calling the cheatcode dynamically in a test would be preferred (like tracking details about a specific mapping key which is hard to identify retroactively), so good to have the option to call dynamically in tests and contracts as well
One other thing we might do to make it easier on devs to track a bunch of data on their invariant runs would be some sort of cheatcode or flag to auto assign a counter/gauge/histogram to a function, state variable, etc
- That way if you wanted to check how many times a loan was originated, etc, you wouldn't necessarily have to write a separate contract instrumented will calls to vm.counter()
- Maybe we should just pass the address called on and the calldata passed to the postCall hook and that would enable something like this? I think there's some interesting design space here
- Some will certainly still choose to instrument, or do echinda style wrappers, but I think its nice to provide the quick and dirty option for devs who dont require such granularity
Finally, for the use case where devs want to instrument their contracts with calls to these counters and cannot read storage vars from test files:
- We need to provide a way to get the current value of a guage/counter/etc
- API could look something like this:
  - vm.incrementCounter("numberOfLps", 1)
  - vm.setCounter("numberOfLps", vm.readCounter("numberOfLps") + 1)
- IMO this API is also just generally less of a PITA than maintaining your own storage variables.
- Furthermore, isn't test state wiped from time to time during invariant campaigns? This has the benefit of enabling persistent state throughout a long campaign.

WDYT?

horsefacts commented 1 year ago

In addition to these more complex metrics, it would be very helpful to see a breakdown of calls/reverts by target contract + selector. For example, something like:

╭───────────────────────────────────────────┬─────────╮─────────╮
│ test/actors/Swapper.sol:Swapper contract  ┆         │         │
╞═══════════════════════════════════════════╪═════════╡═════════╡
│ Function Name                             ┆ calls   │ reverts │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGooForGobblers                        ┆ 1024    │ 612     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGobblersForGoo                        ┆ 1024    │ 12      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤╌╌╌╌╌╌╌╌╌┤
│ swapGobblersForGobblers                   ┆ 1024    │ 64      │
╰───────────────────────────────────────────┴─────────╯─────────╯

This would be very helpful for writing and debugging new actor contracts.

lucas-manuel commented 1 year ago

@transmissions11

Love that Gearbox chart 👀

Yeah I agree that it would be better to design this without the need to persist storage within the contracts anywhere for this purpose. I like the suggestions of incrementCounter and setCounter. I also like the idea of a postCall hook, we've been hacking that together with modifiers inside the actors contracts but would definitely be better to have an official framework for it haha.

What are our next steps here?

I also completely agree with the summary table idea @horsefacts

odyslam commented 1 year ago

Second gakonst for using Prometheus as the engine/standard and then we can either:

output charts using some charting library
serve them via an endpoint that can be scraped by Prometheus and visualize them/analyze them in Grafana.

PaulRBerg commented 1 year ago

@lucas-manuel Until this issue gets implemented, do you know any way to output the call summary only at the end of the invariant tests run?

I saw that in your invariants-example repo, you have defined this function:

function invariant_call_summary() external view {
    console.log("\nCall Summary\n");
    ///
}

But, if I understand this correctly, this test function will be executed at the end of each run, which will slow down test runs.

Is there any way to output the call summary at the end of the invariant tests run? As ar as I know, there is no "after all" hook in Foundry (a function opposite to setUp), but maybe there is another way?

gakonst commented 1 year ago

But, if I understand this correctly, this test function will be executed at the end of each run, which will slow down test runs.

Yeah it'll slow it down, but a bit only I would assume. We should probably add that kind of function invariant_summary() hook as an "after all".

PaulRBerg commented 1 year ago

It depends on how many console.log statements there are .. we have quite a few and the run time is noticeably slower with the invariant summary function defined. Ended up renaming it to start with something other than invariant_ so that we can selectively activate it when we want to get a report.

hook as an "after all"

Would be super helpful.

grandizzy commented 5 months ago

considering @transmissions11 comment -

I think a nice and easy way to have such metrics is by using OpenTelemetry (OTLP) open source standard for logs, metrics, and traces as it already provides crates to facilitate such integration, see rust bindings and crates

A big pro of this is that we can easily integrate with forge and support not only Prometheus but other tools at the same time by

using tracing_opentelemetry crate to collect custom forge test events (metrics layer export OTLP counter and histogram metrics from specific test events)
use opentelemetry-otlp exporter to push metrics to a collector (that is scraped after by tools like Prometheus), or directly to receivers like Jaeger / Prometheus / others

Didn't put too much thoughts on UX but at a first call there'll be

a cheatcote vm.createTestEvent() to record test data (which use event! macro to publish a new forge test event)
the metrics subscriber which receives the event and convert to an opentelemetry metric
the periodic metrics reader which exports in bulk to OTLP endpoint through metrics exporter

lmk what you guys think about this approach, think a PoC with such can be done quite easy but wouldn't spend time on it if not of interest. thanks

grandizzy commented 5 months ago

I made a quick PoC, see https://github.com/grandizzy/foundry/tree/metrics/crates/metrics/examples#metrics-demo for reference (Code adds a metrics crate, there's no config yet, cheatcodes not in their own metrics group and better UX needed - for a quick view of code changes pls see https://github.com/grandizzy/foundry/commit/919e84b29dc757ad40dd773baf86193a9b2ef5a9) The PoC use OTEL collector to record metrics simultaneously in a file and by sending them to a local carbon endpoint, then showed in Grafana dashboard

metrics recorded are: number of times a selector was called, count of reverts (per revert type) during campaign, addresses fuzzed in campaign and number of times they were used (per selector) using vm.incrementMetrics(label) cheatcode
statistics are displayed in Grafana dashboard as

grafana

List of exporters that can be used by otel config file can be found here

I see three use cases / dimensions that can be accomplished by having such

Basic, understand how campaigns are performing
- case 1: dumping metrics in file for campaign review
- case 2: writing to a backend and visualize campaign statistics (carbon/Grafana, Prometheus, others, see exporters)
Actions performed based on collected test metrics:
- case 3: integrate in a CI pipeline and stop/restart campaign if it violates certain thresholds (number of unique values less than threshold, number of reverts greater than threshold, times a selector was hit less/greater than threshold). Ex: export to kafka, AWS Kinesis, others, see exporters
Adapt fuzzing campaigns based on collected metrics:
- if PR https://github.com/foundry-rs/foundry/pull/7428 accepted it will introduce the concept of test fixtures / data sets used in fuzzing, in first phase as inline forge-config: fixture config
- next step for fixtures would be to have cheatcodes for loading test data sets from file/URL
- then metrics can be used to provide custom input for fuzzing campaigns

For example: metrics are collected and exported to AWS Kinesis, or Apache Kafka, etc. then campaign metrics are processed and

saved in persistent layer (S3, etc) for historical analysis
campaign metrics are used to generate new datasets that will be used by further campaigns. Rules to apply could be to ensure values are unique across several campaigns or repetable to some extent or new datasets derived from already used values, etc.

any feedback appreciated, thank you

grandizzy commented 5 months ago

@Evalir we can continue discussion here, here a quick overview of what the changes to have such metrics would be https://github.com/foundry-rs/foundry/compare/master...grandizzy:foundry:metrics see also prev comments re sample and other use cases this could be used at. Wanted to get your thoughts first before polishing code and issuing a PR as it introduce some new deps (if not comfortable with due to supply chain attack events, it can be build / cfged on demand). If all good I can do the PR + update foundry book with how can be used / examples.

grandizzy commented 5 months ago

Adding on OpenTelemetry adoption / usage - Grafana just announced their open source collector (Grafana Alloy) https://grafana.com/blog/2024/04/09/grafana-alloy-opentelemetry-collector-with-prometheus-pipelines/

foundry-rs / foundry