dodona-edu / dodona

🧑‍💻 Learn to code for secondary and higher education
https://dodona.be

Automatically generate draft answers for student questions #5331

Open bmesuere opened 8 months ago

bmesuere commented 8 months ago

With the increasing capabilities of LLMs, it is only a matter of time before they become powerful and cheap enough to use inside Dodona. A first step might be to generate draft answers for student questions: when a student asks a question, an LLM generates a draft answer that a teaching assistant then reviews, edits, and sends.

This approach minimizes risk since each AI-generated answer undergoes human review and editing. Moreover, it's not time-sensitive. If the AI draft is inadequate or fails, the situation remains as it is currently. However, the potential time savings could be substantial.


Since this would be our first LLM integration, it will also involve some research.
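
To make the intended flow concrete, here is a minimal sketch of what the integration could look like (all names such as generateDraft and saveDraftForReview are hypothetical; nothing like this exists in Dodona yet):

async function handleNewQuestion(question) {
  try {
    // Ask an LLM for a draft answer; this can fail or produce a poor draft.
    const draft = await generateDraft(question);
    // Store the draft next to the question so the TA can review, edit, and send it.
    await saveDraftForReview(question.id, draft);
  } catch (error) {
    // If drafting fails, the TA answers from scratch, exactly as today.
    console.warn(`No draft generated for question ${question.id}:`, error);
  }
}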

bmesuere commented 8 months ago

Some old code I wrote to generate answers based on questions as a stand-alone script:

import OpenAI from "openai";

import { JSDOM } from 'jsdom';

const dodonaHeaders = new Headers({
  "Authorization": "" // Dodona API token
});

const openai = new OpenAI({
  apiKey: "" // OpenAI API key
});

const systemPrompt = "Your goal is to help a teaching assistant answer student questions for a university-level programming course. You will be provided with the problem description, the code of the student, and the question of the student. Your answer should consist of 2 parts. First, very briefly summarize what the student did wrong to the teaching assistant. Second, provide a short response to the question aimed at the student in the same language as the student's question.";

const questionId = 148513; // id of the question (annotation) to generate a draft for

async function fetchData(questionId) {
  // fetch question data from https://dodona.be/nl/annotations/<ID>.json
  let r = await fetch(`https://dodona.be/nl/annotations/${questionId}.json`, {headers: dodonaHeaders});
  const questionData = await r.json();
  const lineNr = questionData.line_nr;
  const question = questionData.annotation_text;
  const submissionUrl = questionData.submission_url;

  // fetch submission data
  r = await fetch(submissionUrl, { headers: dodonaHeaders });
  const submissionData = await r.json();
  const code = submissionData.code;
  const exerciseUrl = submissionData.exercise;

  // fetch exercise data
  r = await fetch(exerciseUrl, { headers: dodonaHeaders });
  const exerciseData = await r.json();
  const descriptionUrl = exerciseData.description_url;

  // fetch description
  r = await fetch(descriptionUrl, { headers: dodonaHeaders });
  const descriptionHtml = await r.text();
  const description = htmlToText(descriptionHtml);

  return {description, code, question, lineNr};
}

async function generateAnswer({description, code, question, lineNr}) {
  // Ask the model for a draft answer based on the description, code, and question
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {"role": "system", "content": systemPrompt},
      {"role": "user", "content": `Description: ${description}\nCode: ${code}\nQuestion on line ${lineNr}: ${question}`}
    ]
  });
  console.log(response.choices[0].message);
  return response.choices[0].message.content;
}

function htmlToText(html) {
  // Extract the plain text of the exercise description and drop leftover inline-script lines
  const dom = new JSDOM(html);
  const text = dom.window.document.body.textContent
    .split("\n")
    .map(line => line.trim())
    .filter(line => !line.includes("I18n"))
    .filter(line => !line.includes("dodona.ready"))
    .join("\n");
  // Everything after the "Links" section is Dodona page boilerplate
  return removeTextAfterSubstring(text, "Links").trim();
}

function removeTextAfterSubstring(str, substring) {
  const index = str.indexOf(substring);

  if (index === -1) {
    return str;  // substring not found
  }

  return str.substring(0, index);
}

const data = await fetchData(questionId);
console.log(data);
await generateAnswer(data);
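
Running this as a stand-alone script assumes Node 18+ (for the built-in fetch), the openai and jsdom packages, and a Dodona API token and OpenAI API key filled in above; the question id is hard-coded and has to be replaced per question.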
bmesuere commented 8 months ago

I tested the runtime performance of a few models on my Mac Studio (64 GB of memory):

| Model | Quantization | Memory usage | Inference speed |
| --- | --- | --- | --- |
| codellama-34b-instruct | Q5_K_M | 22.13 GB | 9.87 tok/s |
| codellama-34b-instruct | Q6_K | 25.63 GB | 9.58 tok/s |
| codellama-34b-instruct | Q8_0 | 33.06 GB | 9.32 tok/s |
| codellama-70b-instruct | Q4_K_M | 38.37 GB | 7.00 tok/s |
| codellama-70b-instruct | Q6_0 | 49.39 GB | crashed |
| mixtral-8x7b-instruct | Q5_K_M | 29.64 GB | 21.5 tok/s |

I could not validate the output of codellama-70b since it seems to use a different prompt format.
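
For context, the smaller CodeLlama instruct models (7B/13B/34B) use the Llama-2-style [INST] chat template, while the 70B instruct variant introduced its own format. A rough prompt builder for the 34B model might look like this (a sketch; the exact template depends on the model and runtime, so treat it as an assumption):

function buildInstructPrompt(system, user) {
  // Llama-2-style instruct template as used by codellama-34b-instruct
  return `[INST] <<SYS>>\n${system}\n<</SYS>>\n\n${user} [/INST]`;
}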

bmesuere commented 8 months ago

I played around with the various models this afternoon. Some early observations: