ProjectUnifree / unifree

MIT License

[Proposal] Investigate Claude #16

Open danieljharris opened 12 months ago

danieljharris commented 12 months ago

It might be worth investigating https://claude.ai/ for code migration. Because it accepts large prompts (around 75,000 words), it could potentially take in all of Unity's documentation along with the documentation of the target engine, and then make a better-informed code migration.

I started looking into this myself, but so far I've been unable to find a downloadable text/PDF version of Unity's full documentation. If anyone has ideas on how to get one, let me know.

Blade67 commented 12 months ago

It might be worth scraping Unity's docs and mapping classes and functions to their equivalents in other engines. Using a mapping would also allow us to stay with ChatGPT/OpenAI, although Claude.AI's pricing model is slightly cheaper on larger data sets.
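To make the mapping idea concrete, here is a minimal sketch of how a lookup table could be used to send the model only the equivalents that matter for a given source file, instead of the whole documentation. Everything in it is an assumption for illustration: the target engine (Godot), the three table entries, and the function names `relevant_mappings` and `build_prompt_hint` are hypothetical, and the equivalents are rough and unverified.

```python
import re

# Hypothetical Unity -> Godot name mapping. The entries are illustrative
# only; a real table would be generated by scraping both sets of docs.
UNITY_TO_GODOT = {
    "MonoBehaviour": "Node",
    "Rigidbody": "RigidBody3D",
    "Debug.Log": "print",
}


def relevant_mappings(source, mapping):
    """Return only the mapping entries whose Unity name appears in the source."""
    found = {}
    for unity_name, target_name in mapping.items():
        # Match whole identifiers so "Rigidbody" does not match "Rigidbody2D".
        pattern = r"(?<![\w.])" + re.escape(unity_name) + r"(?!\w)"
        if re.search(pattern, source):
            found[unity_name] = target_name
    return found


def build_prompt_hint(found):
    """Format the matched entries as a short hint to prepend to a prompt."""
    lines = [f"- {unity} -> {target}" for unity, target in sorted(found.items())]
    return "Known API equivalents:\n" + "\n".join(lines)
```

Because only the matched entries are included, the hint stays small regardless of how large the full table grows, which is what would make this workable with models that have smaller prompt limits.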

danieljharris commented 12 months ago

I found a downloadable offline version of Unity's documentation that should do the trick for the Unity side. There are 2,046 files, but I think that can be filtered down to only the "class-" ones, of which there are only 205. These would need to be combined into a single file for Claude to use. I have this PowerShell script which does this. Unfortunately, the output is over the text limit even with the HTML and link elements removed, so this might not be a good approach to code migration:

# Define the directory where the .html files are stored
$sourceDirectory = ".\"

# Define the output .txt file
$outputFile = ".\FilteredHtmlContents.txt"

# Remove the existing output file if it exists
if (Test-Path $outputFile) {
    Remove-Item $outputFile
}

# Regex pattern to match lines that start with <p> and contain text or a heading
$regexPattern = "^<p>.*(<h[1-6]>.*<\/h[1-6]>|[^<]+).*$"

# Loop through each .html file in the directory
Get-ChildItem -Path $sourceDirectory -Filter *.html | ForEach-Object {
    # Read the content of the .html file
    $content = Get-Content $_.FullName

    # Filter the lines based on the regex pattern
    $filteredContent = $content | Select-String -Pattern $regexPattern

    # Write the filtered content to the output .txt file, stripping HTML tags and ignoring specific patterns
    if ($filteredContent) {
        $filteredContent | ForEach-Object {
            $line = $_.Line

            # Ignore lines that only contain a link or an image
            if ($line -match "<p><a [^>]+><\/a><\/p>" -or $line -match "<p><img [^>]+><\/p>") {
                return
            }

            # Remove 'a href' sections
            $line = $line -replace '<a href="[^"]+">(.*?)<\/a>', "`$1"

            # Remove all HTML tags
            $line = $line -replace "<.*?>", ""

            Add-Content -Path $outputFile -Value $line
        }
    }
}

Write-Host "Filtered lines from .html files have been written into $outputFile"
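Since the combined output exceeds the text limit, one option is to measure it and split it into pieces that each fit. The following is a rough sketch only: it counts whitespace-separated words rather than model tokens, the 75,000 figure is simply the one quoted earlier in this thread, and `split_by_word_budget` is a hypothetical helper name.

```python
# Word budget taken from the figure quoted earlier in this thread; the
# model's real limit is measured in tokens, so treat this as approximate.
WORD_LIMIT = 75_000


def count_words(text):
    """Count whitespace-separated words in the text."""
    return len(text.split())


def split_by_word_budget(text, max_words=WORD_LIMIT):
    """Split text into chunks of at most max_words words, breaking on lines.

    A single line longer than max_words is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for line in text.splitlines():
        n = len(line.split())
        # Start a new chunk when adding this line would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Running `count_words` on the contents of FilteredHtmlContents.txt would show how far over the limit the file is, and therefore how many chunks to expect. Splitting does mean the model never sees all of the documentation at once, so it does not fully rescue the original idea.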

According to this post, for Unreal the entire documentation is already downloaded when you install Unreal, at C:\Program Files\Epic Games\UE_5.1\Engine\Documentation\Builds. I'm guessing that will also need to be combined into a single file.
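If those files turn out to be HTML (which this thread does not confirm), combining them could look something like the sketch below. It uses Python's standard `html.parser` instead of regex-based tag stripping, and walks subfolders recursively. The names `TextExtractor`, `html_to_text`, and `combine_html_dir` are hypothetical, and the output path is up to the caller.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def combine_html_dir(root, output_file):
    """Write the text of every .html file under root into output_file.

    Returns the number of files processed. Assumes the docs are HTML.
    """
    files = sorted(Path(root).rglob("*.html"))
    with open(output_file, "w", encoding="utf-8") as out:
        for path in files:
            html = path.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_text(html))
            out.write("\n")
    return len(files)
```

Note that each text node lands on its own line, so inline links split a sentence across lines. That is harmless for feeding a model but makes the output less pleasant to read.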