aibtcdev / training-data

A curated collection of raw data for training language models

Adding smart contracts #3

Open whoabuddy opened 2 weeks ago

whoabuddy commented 2 weeks ago

Discussed in #1, but big enough to warrant its own issue.

The Boom team has a repository of all deployed Stacks smart contracts, but it is quite large (57k contracts, 250 GB+ and growing).

That repository also powers the source-of-clarity website, which seems like an interesting fit for a crew of AI agents for its own reasons.

There's a smaller repo of contracts used for clarity-wasm testing as well, which might be a better fit to use alongside the function docs.

moodmosaic commented 2 weeks ago

We could use some minimization techniques (e.g. from fuzzing); they might also apply in this context. We could, for example:

  1. Instrument a stacks-core binary (e.g. clarity-cli): This will help us with steps 2 and 3.
  2. Use AFL++ tools to minimize the contracts:
    • Use afl-cmin to reduce the number of test cases but keep the same coverage.
    • Use afl-tmin to make each contract smaller.
  3. Check Honggfuzz's minimization: See if it helps us minimize/reduce further.
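
To make step 2 a bit more concrete, here is a rough sketch of how the two AFL++ invocations could be assembled. It only builds the command lines and does not run anything. The binary path `./clarity-cli-afl`, the `check` subcommand as the entry point, and the directory names are all assumptions for illustration; the `-i`, `-o`, `--` and `@@` conventions are the standard ones for `afl-cmin` and `afl-tmin`.

```python
from pathlib import Path

# Assumption: an AFL++-instrumented build of clarity-cli at this hypothetical path,
# invoked as "check <file>". "@@" is AFL++'s placeholder for the input file.
TARGET = ["./clarity-cli-afl", "check", "@@"]


def cmin_command(corpus_dir, out_dir, target=TARGET):
    """Build an afl-cmin call: keep a subset of contracts with the same coverage."""
    return ["afl-cmin", "-i", str(corpus_dir), "-o", str(out_dir), "--", *target]


def tmin_command(contract, out_dir, target=TARGET):
    """Build an afl-tmin call: shrink a single contract file."""
    out_file = Path(out_dir) / Path(contract).name
    return ["afl-tmin", "-i", str(contract), "-o", str(out_file), "--", *target]


if __name__ == "__main__":
    # Hypothetical directory layout: contracts/ -> corpus_min/ -> contracts_min/
    print(" ".join(cmin_command("contracts", "corpus_min")))
    print(" ".join(tmin_command("corpus_min/token.clar", "contracts_min")))
```

One caveat: afl-tmin minimizes at the byte level, so the shrunken files would probably need to be re-checked before being used as training data.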