Open Glitchy-Tozier opened 1 year ago
For anyone interested, I created a code corpus called granite-code-ngrams. It contains the following ngrams:
and a mixture (40% Python, 10% Rust, 20% JavaScript, 20% TypeScript, 10% CSS) called "code". I licensed it under MIT so you could also include it in this repo if needed.
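For anyone wanting to build a similar mix from their own per-language ngram counts, here is a rough sketch. It assumes the mixture is a weighted sum of each language's normalized ngram frequencies, which may not be how granite-code-ngrams was actually built. The names `WEIGHTS`, `normalize`, and `mix_ngrams` are made up for illustration.

```python
from collections import Counter

# Mixture weights as quoted in the post above.
WEIGHTS = {
    "python": 0.4,
    "rust": 0.1,
    "javascript": 0.2,
    "typescript": 0.2,
    "css": 0.1,
}


def normalize(counts):
    """Turn raw ngram counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ngram: count / total for ngram, count in counts.items()}


def mix_ngrams(per_language, weights=WEIGHTS):
    """Combine per-language ngram counts into one weighted frequency table.

    Assumption: each language is normalized first, so a large source
    corpus does not outweigh its assigned share of the mix.
    """
    mixed = Counter()
    for lang, weight in weights.items():
        freqs = normalize(per_language.get(lang, {}))
        for ngram, freq in freqs.items():
            mixed[ngram] += weight * freq
    return dict(mixed)
```

Calling `mix_ngrams({"python": {"de": 120, "in": 80}, "rust": {"fn": 50}})` gives a single frequency table in which each language contributes at most its assigned share. Languages missing from the input simply contribute nothing.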
Not sure where to find a corpus, but I think having access to ngrams for a mix of various programming languages would be pretty nice.