CarperAI / Code-Pile

This repository contains all the code for collecting large amounts of code from GitHub.
MIT License
105 stars 29 forks

Add LeetCode #38

Closed: faraday closed this 1 year ago

faraday commented 2 years ago

Addressing: https://github.com/CarperAI/Code-Pile/issues/7 https://github.com/CarperAI/Code-Pile/issues/8

Statistics about the LeetCode data:

Snapshot date (until): 2022-09-26
Questions: 2421 records (7.8 MB as JSONL, 1.2 MB bzipped)
Discussion topics (pseudo-solutions): 2351568 records (3.2 GB as JSONL, 405 MB bzipped)
Comments: 525802 records (430 MB as JSONL, 46 MB bzipped)
Comment replies: 293361 records (222 MB as JSONL, 26 MB bzipped)
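For anyone who wants to sanity-check the record counts above, here is a minimal sketch of streaming a bzip2-compressed JSONL file with only the Python standard library. The file name and the `id` field in the usage note are hypothetical; the actual file layout and record schema of the LeetCode dump are not stated in this thread.

```python
import bz2
import json
from typing import Iterator


def iter_jsonl_bz2(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a bzip2-compressed JSONL file."""
    # bz2.open in text mode decompresses lazily, so a multi-GB file
    # is never fully loaded into memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_records(path: str) -> int:
    """Count the JSON records in a bzip2-compressed JSONL file."""
    return sum(1 for _ in iter_jsonl_bz2(path))
```

Assuming a file named `questions.jsonl.bz2` (a made-up name), `count_records("questions.jsonl.bz2")` would return the number of records, and `iter_jsonl_bz2` can be looped over to inspect individual entries.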

faraday commented 2 years ago

I'll add a commit to represent the data in lm_dataformat.

PhungVanDuy commented 2 years ago

@faraday can you add me to your repo? I will change the URLs to S3 URLs.

faraday commented 2 years ago

@PhungVanDuy I just added you. Thanks

PhungVanDuy commented 2 years ago

Thank you so much. Please update the code if you have any new changes. We are going to review the PRs and merge soon. Thank you so much :)

ncoop57 commented 2 years ago

@faraday can you allow me to make changes by checking this option, please? https://github.blog/2016-09-07-improving-collaboration-with-forks/

ncoop57 commented 2 years ago

So far, it looks great!

faraday commented 2 years ago

@ncoop57 I just added you to my fork (I read the article and thought you needed me to add you).

ncoop57 commented 1 year ago

Thanks so much @faraday !