Feature Request: Support FastApply model format parsing

anrgct commented 1 week ago

Validations

[ ] I believe this is a way to improve. I'll try to join the Continue Discord for questions
[ ] I'm not able to find an open issue that requests the same enhancement

Problem

I'd like to propose adding support for parsing the FastApply model output format in Continue. The FastApply model (FastApply-1.5B-v1.0) is specifically fine-tuned for code merging tasks and shows improved accuracy compared to the base model.

Current Situation:

- Continue currently works well with standard LLM outputs
- FastApply model uses a specific output format: <updated-code>[Full-complete updated file]</updated-code>
- This format is not currently supported by Continue's parser

Benefits of Supporting FastApply:

- Improved accuracy in code merging tasks
- Better preservation of code structure, comments, and formatting
- Specialized model for fast apply functionality
- Lightweight model (1.5B) suitable for local execution

Technical Details: The model output follows this specific format:

<updated-code>
[Complete updated code content]
</updated-code>

Suggested Implementation: Add a parser option to handle the FastApply format by:

- Detecting the <updated-code> tags
- Extracting the content between these tags
- Processing it as the complete updated file
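
The three steps above can be sketched on a complete (non-streamed) response. This is only an illustration under assumptions: `extractUpdatedCode` is a hypothetical helper name, not an existing Continue function, and returning `null` when the tags are missing is one possible way to let a caller fall back to the existing parsing.

```typescript
// Hypothetical helper, not part of Continue: extracts the file body from a
// complete FastApply response of the form <updated-code>...</updated-code>.
const OPEN_TAG = "<updated-code>";
const CLOSE_TAG = "</updated-code>";

function extractUpdatedCode(response: string): string | null {
  // Step 1: detect the opening tag.
  const start = response.indexOf(OPEN_TAG);
  if (start === -1) {
    return null; // not FastApply output; the caller can fall back
  }
  const bodyStart = start + OPEN_TAG.length;

  // Step 2: extract up to the closing tag. If generation was cut off before
  // the closing tag was emitted, keep everything after the opening tag.
  const end = response.indexOf(CLOSE_TAG, bodyStart);
  const body =
    end === -1 ? response.slice(bodyStart) : response.slice(bodyStart, end);

  // Step 3: treat the body as the complete updated file, dropping only the
  // single newline directly after the opening tag and before the closing tag.
  return body.replace(/^\r?\n/, "").replace(/\r?\n$/, "");
}
```

Text that the model emits before or after the tags is ignored, which matches the idea of treating the tagged content as the whole file.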

Performance Reference:

Model size: 1.5B parameters
Local performance on M2 Max:
- q8_0: ~100 tokens/s
- q4_0: ~140 tokens/s
- fp16: ~70 tokens/s

Would you consider adding support for this format? This would enable users to leverage specialized models like FastApply while maintaining Continue's excellent user experience.

Solution

No response

sestinj commented 1 week ago

@anrgct This is super cool. I hadn't heard about this model before, but it would definitely be useful. Is there any chance you would be interested in making a PR, or giving it a first attempt? We'd definitely like to support this, and that would probably be the fastest way to make sure this gets priority.

I agree with your general solution here. Probably a templates file like edit.ts or chat.ts or autocomplete/templates.ts would be the thing to do, and then write a quick autodetect function.
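
A quick autodetect function could take roughly the following shape. Both function names are hypothetical and the matching rules are assumptions: one check looks at the configured model name, the other at the first lines of output.

```typescript
// Hypothetical sketch: neither function is existing Continue API.

// Detect by configured model name, e.g. "FastApply-1.5B-v1.0" or
// "fastapply-1.5b-v1.0@q8_0". The pattern is an assumption about naming.
function isFastApplyModel(modelName: string): boolean {
  return /fast[-_]?apply/i.test(modelName);
}

// Detect by output: true if any of the given lines contains the opening tag.
function looksLikeFastApplyOutput(firstLines: string[]): boolean {
  return firstLines.some((line) => line.includes("<updated-code>"));
}
```

Name-based detection needs no model output but depends on how the user named the model. Output-based detection is independent of naming but can only decide once the first lines have arrived.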

Have you checked with Ollama on whether they would support this? Entirely local apply would be great.

anrgct commented 1 week ago

I can run the GGUF model in LM Studio, and Ollama should work as well. I quickly made and tested the changes below and found the approach feasible, but I think it would be better to make filterCodeContent a configurable option that bypasses the original filtering steps. Could you please help with the modification? Thank you!

// core/edit/streamDiffLines.ts
  let lines = streamLines(completion);

-   lines = filterEnglishLinesAtStart(lines);
-   lines = filterCodeBlockLines(lines);
-   lines = stopAtLines(lines, () => {});
-   lines = skipLines(lines);
-   if (inept) {
-     // lines = fixCodeLlamaFirstLineIndentation(lines);
-     lines = filterEnglishLinesAtEnd(lines);
-   }

+  lines = filterCodeContent(lines);

// config.json
{
    "models": [
        {
            "title": "fastapply-1.5b-v1.0@q8_0",
            "model": "fastapply-1.5b-v1.0@q8_0",
            "apiBase": "http://192.168.8.110:5000/v1",
            "provider": "lmstudio",
            "contextLength": 16000,
            "completionOptions": {
                "maxTokens": 4000,
                "stop": [
                    "<|endoftext|>"
                ],
                "temperature": 0.01
            },
            "promptTemplates": {
                "edit": "<|im_start|>system\nYou are a coding assistant that helps merge code updates, ensuring every modification is fully integrated.\n<|im_end|>\n<|im_start|>user\nMerge all changes from the <update> snippet into the <code> below.\n- Preserve the code's structure, order, comments, and indentation exactly.\n- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.\n- Do not include any additional text, explanations, placeholders, ellipses, or code fences.\n\n<code>{{{codeToEdit}}}</code>\n\n<update>{{{userInput}}}</update>\n\nProvide the complete updated code.<|im_end|>\n<|im_start|>assistant"
            }
        }
    ]
}

// core/diff/util.ts
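// Yields only the lines found between <updated-code> and </updated-code>,
// dropping the tags themselves and any text outside them.
export async function* filterCodeContent(lines: LineStream): LineStream {
  let insideCodeTag = false;

  for await (const line of lines) {
    // Both tags on one line: yield only the text between them.
    if (line.includes("<updated-code>") && line.includes("</updated-code>")) {
      const [, extractedContent] = line.split("<updated-code>");
      const [content] = extractedContent.split("</updated-code>");
      yield content;
      continue;
    }

    // Opening tag: start capturing and keep any code that follows the tag.
    if (line.includes("<updated-code>")) {
      insideCodeTag = true;
      const partialContent = line.split("<updated-code>")[1];
      if (partialContent) {
        yield partialContent;
      }
      continue;
    }

    // Closing tag: keep any code that precedes the tag, then stop capturing.
    if (line.includes("</updated-code>")) {
      insideCodeTag = false;
      const partialContent = line.split("</updated-code>")[0];
      if (partialContent) {
        yield partialContent;
      }
      continue;
    }

    // Ordinary line: pass it through only while inside the tags.
    if (insideCodeTag) {
      yield line;
    }
  }
}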
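
One possible shape for the configurable variant requested above, as a sketch only: the `outputFormat` option and both helper names are hypothetical, not existing Continue settings or functions. The idea is that streamDiffLines would pick either the original filter chain or the single tag filter based on the model's config.

```typescript
type LineStream = AsyncGenerator<string>;
type LineFilter = (lines: LineStream) => LineStream;

// Hypothetical config shape; "outputFormat" is not an existing Continue option.
interface ApplyOptions {
  outputFormat?: "default" | "fastapply";
}

// Chooses the filters to run: only the tag filter for FastApply output,
// otherwise the existing default chain.
function selectFilters(
  options: ApplyOptions,
  defaultFilters: LineFilter[],
  fastApplyFilter: LineFilter,
): LineFilter[] {
  return options.outputFormat === "fastapply"
    ? [fastApplyFilter]
    : defaultFilters;
}

// Applies the filters in order, feeding each one the previous one's output.
function applyFilters(lines: LineStream, filters: LineFilter[]): LineStream {
  return filters.reduce((stream, filter) => filter(stream), lines);
}
```

With this shape, filterCodeContent would be passed as `fastApplyFilter`, and the existing filters (filterEnglishLinesAtStart, filterCodeBlockLines, and so on) would stay untouched as `defaultFilters`.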
