LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0
36.94k stars 3.22k forks

Using CodeReview StackExchange data with open-bugger #1396

Open caridorc-tergiliti opened 1 year ago

caridorc-tergiliti commented 1 year ago

StackExchange CodeReview contains many correct open-source code snippets that are simple and self-contained, especially if we filter by the beginner tag. We can add bugs to them with open-bugger and use the result for training. All content there is licensed under Creative Commons.
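To illustrate what "adding bugs" to a correct snippet could look like, here is a minimal sketch. It does not use open-bugger's actual API; the `FlipOneComparison` class and the `inject_bug` function are hypothetical stand-ins, and the sketch assumes the snippet is Python (3.9+ for `ast.unparse`). It turns one `<` into `<=`, a typical off-by-one mistake.

```python
import ast


class FlipOneComparison(ast.NodeTransformer):
    """Hypothetical mutator: change a single `<` into `<=`."""

    def __init__(self):
        self.done = False

    def visit_Compare(self, node):
        # Visit children first so nested comparisons are handled too.
        self.generic_visit(node)
        if not self.done and isinstance(node.ops[0], ast.Lt):
            node.ops[0] = ast.LtE()
            self.done = True  # mutate at most one comparison
        return node


def inject_bug(source: str) -> str:
    """Return `source` with one `<` replaced by `<=`, if any exists."""
    tree = ast.parse(source)
    mutated = FlipOneComparison().visit(tree)
    ast.fix_missing_locations(mutated)
    # ast.unparse normalizes formatting, so comments and layout are lost.
    return ast.unparse(mutated)


buggy = inject_bug("i = 0\nwhile i < 10:\n    i += 1\n")
print(buggy)
```

Pairing each original snippet with its mutated version would give (buggy code, fixed code) training examples; a real pipeline would presumably need a wider range of mutations than this single one.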

huu4ontocord commented 1 year ago

Assigned to you for now, but we need someone to take this on, as @caridorc-tergiliti doesn't have time to do this!

RiccardoRiglietti commented 1 year ago

Automating this looks hard, as we need a way to pull code out of questions that contain both code and text. I also suggest using beginner questions, which contain less and simpler code: https://codereview.stackexchange.com/questions/tagged/beginner
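As a rough sketch of the extraction step: Stack Exchange renders code blocks in a post body as `<pre><code>…</code></pre>`, so one possible approach is to collect only the text inside those tags and ignore the surrounding prose. The `CodeBlockExtractor` class and `extract_code_blocks` function below are hypothetical names, and the sketch assumes the post body is already available as an HTML string.

```python
from html.parser import HTMLParser


class CodeBlockExtractor(HTMLParser):
    """Collect the text of every <pre><code> block in an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._in_pre = False
        self._in_code = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
        elif tag == "code" and self._in_pre:
            # Inline <code> outside <pre> is prose, so it is skipped.
            self._in_code = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "code" and self._in_code:
            self._in_code = False
            self.blocks.append("".join(self._buffer))
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_code:
            self._buffer.append(data)


def extract_code_blocks(html_body: str) -> list:
    parser = CodeBlockExtractor()
    parser.feed(html_body)
    parser.close()
    return parser.blocks


body = (
    "<p>My function <code>f</code> feels clumsy:</p>"
    "<pre><code>def f(a):\n    return a &lt; 1\n</code></pre>"
)
print(extract_code_blocks(body))
```

This does not solve the harder part of the problem: a question may split one program across several blocks or include non-runnable fragments, so some filtering or manual review would still be needed.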

There are only around 7 thousand beginner questions, so this could also be done manually through crowdsourcing, which would also let people add to the prompt the error that occurs when trying to run the wrong code.
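The error message could in principle also be collected automatically rather than by hand. The sketch below assumes the buggy snippet is a standalone Python program; `capture_error` is a hypothetical helper that runs the snippet in a separate interpreter and returns the last line of its traceback. Running untrusted code like this would need proper sandboxing in practice.

```python
import subprocess
import sys


def capture_error(snippet: str, timeout: float = 5.0):
    """Run `snippet` in a subprocess and return its final error line, or None."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Infinite loops are a common outcome of injected bugs.
        return "TimeoutExpired"
    if result.returncode == 0:
        return None  # the snippet ran without raising
    lines = result.stderr.strip().splitlines()
    return lines[-1] if lines else f"exit code {result.returncode}"


error = capture_error("values = [1, 2, 3]\nprint(values[3])")
prompt = f"My code fails with: {error}\nCan you find the bug?"
print(prompt)
```

Bugs that produce wrong output without raising an exception would return `None` here, so those cases would still rely on a human description of the problem.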

huu4ontocord commented 1 year ago

Hi @caridorc-tergiliti, can you give us a status update? Is this too hard to do, per @RiccardoRiglietti? If so, we can change the scope or close this issue. Thank you!