godotengine / godot-docs

Godot Engine official documentation
https://docs.godotengine.org

LLMs-full.txt - Allow AI editors like Cursor to better consume the docs #10549

Open mimamuh opened 1 week ago

mimamuh commented 1 week ago

Your Godot version: All Versions ...

Issue description: AI-driven code editors like Cursor, Windsurf, etc. allow you to add docs that give the LLM extra context for better output. But because LLM context windows are limited, providing the docs the classical way is not ideal: HTML, CSS, JavaScript, and so on bloat the context window. A new standard is therefore emerging for providing technical documentation in an LLM-friendly way: the llms-full.txt file, similar in spirit to a robots.txt file.

Here is a brief introduction into the topic: https://towardsdatascience.com/llms-txt-414d5121bcb3

It would be good to have such a file for the Godot documentation so it can be consumed by AI editors.

URL to the documentation page (if already existing): All pages

tetrapod00 commented 1 week ago

I think I speak for the general opinion of Godot maintainers on AI, which is at best skeptical of its current error-prone nature and at worst sees it as a legal and moral liability to use at all. With that in mind, I very much doubt that we will implement this ourselves.

It looks like the format of llms-full.txt is a Markdown file with the full contents of the docs? That is broadly similar to one of the intermediate forms that our RST-based docs compile to. If you need this for your own personal use, you could probably compile the docs locally and inject an RST-to-Markdown conversion step to create a full Markdown version of the docs. I also wouldn't be surprised if current LLMs could read the RST format just fine. Either way, I don't think this is something we currently plan to implement.
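As a rough illustration of the do-it-yourself route, here is a minimal Python sketch that walks a local checkout of the docs and concatenates every `.rst` source into one plain-text file. It assumes, as suggested above, that an LLM can read RST directly, so it does not attempt any RST-to-Markdown conversion. The function name `build_llms_full`, the output file name, and the set of skipped directories are hypothetical choices, not part of the godot-docs build.

```python
from pathlib import Path

# Hypothetical defaults: build output, assets, and VCS data are not doc text.
DEFAULT_SKIP_DIRS = frozenset({"_build", "_static", "_templates", ".git"})


def build_llms_full(docs_root: Path, output: Path,
                    skip_dirs: frozenset = DEFAULT_SKIP_DIRS) -> int:
    """Concatenate all .rst files under docs_root into a single text file.

    Each page is preceded by a header line holding its path relative to
    docs_root, so the model can tell pages apart. Returns the number of
    source files written.
    """
    sources = sorted(
        path for path in docs_root.rglob("*.rst")
        # Drop any file that lives inside one of the excluded directories.
        if not any(part in skip_dirs
                   for part in path.relative_to(docs_root).parts)
    )
    with output.open("w", encoding="utf-8") as out:
        for path in sources:
            rel = path.relative_to(docs_root).as_posix()
            out.write(f"# File: {rel}\n\n")
            out.write(path.read_text(encoding="utf-8").strip())
            out.write("\n\n")
    return len(sources)
```

Usage would be something like `build_llms_full(Path("godot-docs"), Path("llms-full.txt"))`. Plain concatenation keeps the sketch dependency-free; a real conversion to Markdown would need an extra tool in the Sphinx build, which is the customization the maintainers say they are not planning to add.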

mimamuh commented 1 week ago

@tetrapod00 Hey, thx for your opinion. I understand your hesitation. But in software development, at least outside of gaming (I'm doing this Godot stuff just for fun), we are already using it heavily. It gets better every day, and once you learn how to use it, the results are generally not too bad and it is a useful tool.

It is not perfect, of course: you still have to learn your stuff, and it doesn't replace professionals. But it saves time, and therefore I doubt it can be ignored in the long run. Just a simple example: with AI editors like Cursor, I can take my player file, which has bloated into a big file over time, and ask the AI to extract certain features into a new, separate file. I did this yesterday and the results were already quite good, and it is much faster than doing all the copying, pasting, and untangling myself. It automatically replaces certain dependencies, and so on...

To make the output of such tools better, it would be good if the docs were available in this new llms-full.txt format. AI editors like Cursor and Windsurf allow me to put library docs into the LLM's context window and base requests specifically on them. Having public docs optimized for this, and provided in this new standard, can be super helpful in day-to-day work and improve the developer experience.

I doubt this trend will disappear, as the economic advantages are there. If you can already make the same game with LLMs in, let's say, 90% of the time, it will be done that way. Don't get me wrong: it is not magic. You still have to know what you want to do, and often how to do it. But it can help speed things up, even more so if you know the stuff you're working with.

AThousandShips commented 1 week ago

Regardless of whether LLMs become an industry staple, there's no good reason IMO to assume this particular way of scraping data will become one. It's so inconvenient that I expect it to go away or become just a marginal feature.

So if we ever do it, it should be after this becomes an industry standard, like robots.txt is.

tetrapod00 commented 1 week ago

Yeah, speaking personally, I agree that LLMs will likely remain in common use. But I'm not confident that this particular standard will be adopted. Personally, I think it's more likely that AI tooling adapts to the existing structure of the web than the other way around (probably by increasing the available context window again).

If this standard were so widespread that we could implement it just by changing a setting (e.g. if ReadTheDocs supported it out of the box), we might consider adopting it. But I don't think we're interested in customizing our build process to add a format with unclear adoption, in the service of enabling easier scraping by LLMs.

danhealy commented 17 hours ago

FWIW, Cursor's @Docs feature was able to index all of the Godot docs for me (& proved it by teaching me new things...)

mimamuh commented 17 hours ago

@danhealy The main purpose is to reduce the number of tokens (the stuff that fits into the AI's context window) needed to consume the docs, so that more tokens are available for the actual task when running the LLM on your codebase. Remember: consuming the docs is one side of the equation; letting the AI understand and consume your codebase, to use things like agents or Composer, is the other. The more tokens we save on the docs, the better.
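To make the token argument a bit more concrete, here is a minimal Python sketch that compares a rendered HTML page with the visible text extracted from it, using only the standard library's `html.parser`. The "about 4 characters per token" figure is a common rough heuristic for English text, not the behavior of any particular model's tokenizer, and `TextExtractor`, `html_to_text`, `estimate_tokens`, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Rough heuristic only; real tokenizers differ by model and language.
    return max(1, round(len(text) / chars_per_token))


# Hypothetical sample page: markup, styling, and scripts around one sentence.
sample_page = """<html><head><style>body { font-family: sans-serif; }</style>
<script>console.log("analytics");</script></head>
<body><nav><a href="/">Home</a></nav>
<h1>Node</h1><p>Base class for all scene objects.</p></body></html>"""

plain = html_to_text(sample_page)
print(estimate_tokens(sample_page), "->", estimate_tokens(plain))
```

On a real docs page the ratio depends heavily on the theme and navigation markup, so treat any numbers from this as a ballpark comparison rather than a measurement of what a given editor actually sends to the model.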