aibtcdev / training-data

A curated collection of raw data for training language models

Identify more training data sources #1

Closed by whoabuddy 1 month ago

whoabuddy commented 4 months ago

We currently have:

What else should we add?

whoabuddy commented 4 months ago

Add the smart contracts repo from boom (250 GB?).

@moodmosaic has a curated version from fuzzing.

moodmosaic commented 4 months ago

We could also add https://github.com/stacks-network/clarity-wasm/tree/main/clar2wasm/tests/contracts.

whoabuddy commented 2 months ago

Linking this issue as the idea overlaps:

whoabuddy commented 2 months ago

Style guides seem like something we want to document as well. Airbnb uses this one, for example.

whoabuddy commented 1 month ago

Anthropic has a great course on using their LLMs, and content like this would be super helpful too: https://github.com/anthropics/courses/tree/master

Might be worth breaking some of these out into smaller issues? What are the actionable next steps?

whoabuddy commented 1 month ago

Clarity book too!

whoabuddy commented 1 month ago

Migrated the items listed in this issue into separate issues as action items. Closing this one out!