Open danieljharris opened 12 months ago
Might be worth considering scraping Unity's docs and mapping classes and functions to their equivalents in other engines. Using a mapping would also let us stay with ChatGPT/OpenAI, although Claude.AI's pricing model is slightly cheaper on larger data sets.
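A minimal Python sketch of what such a mapping could look like. It assumes a hand-written table and a simple regex lookup. The entries in `UNITY_TO_UNREAL` are illustrative approximations rather than verified one-to-one equivalents, and the helper names `find_mapped_symbols` and `build_hint` are hypothetical.

```python
import re

# Illustrative entries only (assumption): a real table would be generated from
# the scraped documentation and reviewed by hand.
UNITY_TO_UNREAL = {
    "Vector3": "FVector",
    "Quaternion": "FQuat",
    "Debug.Log": "UE_LOG",
    "MonoBehaviour": "UActorComponent",
}


def find_mapped_symbols(source, mapping):
    """Return the subset of the mapping whose Unity names appear in the source."""
    found = {}
    for unity_name, target_name in mapping.items():
        # Match whole identifiers only, so "Vector3" does not match "Vector3Int".
        pattern = r"(?<![\w.])" + re.escape(unity_name) + r"(?!\w)"
        if re.search(pattern, source):
            found[unity_name] = target_name
    return found


def build_hint(found):
    """Format the matched pairs as a short hint block to prepend to a prompt."""
    return "\n".join(f"{unity} -> {target}" for unity, target in sorted(found.items()))
```

Only the pairs that actually occur in a given script would be sent with the prompt, which keeps the request small regardless of how large the full table grows.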
I found a downloadable offline version of Unity's documentation that should do the trick for the Unity side. There are 2,046 files, but I think that can be filtered down to only the "class-" ones, of which there are only 205. These would need to be combined into a single file for Claude to use. I have this PowerShell script that does this; unfortunately the output is still over the text limit, even with the HTML and link elements removed, so this might not be a good approach to code migration:
# Define the directory where the .html files are stored
$sourceDirectory = ".\"

# Define the output .txt file
$outputFile = ".\FilteredHtmlContents.txt"

# Remove the existing output file if it exists
if (Test-Path $outputFile) {
    Remove-Item $outputFile
}

# Regex pattern to match lines that start with <p> and contain text or a heading
$regexPattern = "^<p>.*(<h[1-6]>.*<\/h[1-6]>|[^<]+).*$"

# Loop through each .html file in the directory
Get-ChildItem -Path $sourceDirectory -Filter *.html | ForEach-Object {
    # Read the content of the .html file
    $content = Get-Content $_.FullName

    # Filter the lines based on the regex pattern
    $filteredContent = $content | Select-String -Pattern $regexPattern

    # Write the filtered content to the output .txt file, stripping HTML tags and ignoring specific patterns
    if ($filteredContent) {
        $filteredContent | ForEach-Object {
            $line = $_.Line

            # Ignore lines that only contain a link or an image
            if ($line -match "<p><a [^>]+><\/a><\/p>" -or $line -match "<p><img [^>]+><\/p>") {
                return
            }

            # Remove 'a href' sections
            $line = $line -replace '<a href="[^"]+">(.*?)<\/a>', "`$1"

            # Remove all HTML tags
            $line = $line -replace "<.*?>", ""

            Add-Content -Path $outputFile -Value $line
        }
    }
}

Write-Host "Filtered lines from .html files have been written into $outputFile"
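For comparison, here is a rough Python sketch of the same idea. It is not a port of the script above: it keeps only files whose names start with "class-", strips tags with the standard-library HTML parser instead of regexes, and returns a word count so the result can be checked against a size limit. The function names and the `prefix` parameter are assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def combine_class_pages(source_dir, output_file, prefix="class-"):
    """Write the text of every '<prefix>*.html' page to one file; return the word count."""
    total_words = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for page in sorted(Path(source_dir).glob(f"{prefix}*.html")):
            text = html_to_text(page.read_text(encoding="utf-8", errors="replace"))
            out.write(f"# {page.name}\n{text}\n\n")
            total_words += len(text.split())
    return total_words
```

Calling `combine_class_pages(".", "FilteredHtmlContents.txt")` from the documentation folder would report how many words the combined file holds before anything is uploaded.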
According to this post, for Unreal the entire documentation is already downloaded when you install Unreal, at C:\Program Files\Epic Games\UE_5.1\Engine\Documentation\Builds. I'm guessing that will also need to be combined into a single file.
It might be worth investigating https://claude.ai/ for code migration. Because it can take in large prompts (around 75,000 words), it could potentially take in all of Unity's documentation plus the documentation of the target engine, and then make a better-informed code transition.
I started looking into this myself, but so far I've been unable to find a downloadable text/PDF version of Unity's full documentation. If anyone has any ideas on how to get this, let me know.
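If the documentation turns out to be larger than one prompt can hold, one option is to split it into pieces that each stay under the limit. The Python sketch below assumes the roughly 75,000-word figure mentioned above and uses a plain word count as a rough proxy for the model's real limit; the function name and the `max_words` parameter are hypothetical.

```python
def split_by_word_budget(text, max_words=75_000):
    """Greedily pack paragraphs into chunks of at most max_words words each.

    A single paragraph longer than max_words is kept whole, so that one
    chunk can exceed the budget.
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = len(paragraph.split())
        if words == 0:
            continue  # skip empty paragraphs
        if current and count + words > max_words:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on blank lines keeps each documentation entry intact, at the cost of chunks that are somewhat smaller than the budget allows.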