getcursor / cursor

The AI-powered code editor
https://cursor.sh

Manual Bulk Link Submission for Comprehensive Documentation Scraping #1242

Open kevinseabourne opened 4 months ago

kevinseabourne commented 4 months ago

Feature Request: Manual Bulk Link Submission for Comprehensive Documentation Scraping

Background

Across a range of programming languages and libraries, I have repeatedly run into problems getting their documentation into Cursor. Automatic AI scraping often captures the docs incompletely; in some cases as few as two relevant pages are retrieved from the provided documentation URL. This limits the breadth and depth of knowledge the AI can draw on.

Proposal

To address this, I propose a manual bulk link submission feature within Cursor. Users would be able to submit a collection of direct links covering the full set of documentation pages for a given language or library. This ensures comprehensive coverage of the intended documentation, giving the AI a complete dataset from which to draw precise and relevant information.

Implementation Overview

The feature will include:

  1. Manual Link Submission Interface: An interface within Cursor for users to input a list of URLs corresponding to the desired documentation pages.
  2. Processing System: A system that validates and organizes the submitted links, ensuring data integrity and easy access for the AI (a sketch of these first two steps follows this list).
  3. Link Categorization and Tagging: The ability for users to categorize and tag submitted links to aid searchability and context-aware retrieval by the AI.
  4. Integration with AI Scraping Framework: A seamless addition to the existing AI scraping methods to enhance the overall process.
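To make items 1 and 2 concrete, here is a minimal sketch of what the submission and validation step might look like. Everything in it is hypothetical: `DocLink`, `submitDocLinks`, and the field names are illustrative only and are not part of Cursor's actual codebase or API.

```typescript
// Hypothetical sketch of the proposed bulk link submission flow.
// None of these names exist in Cursor today.

interface DocLink {
  url: string;
  category?: string;   // e.g. "guides", "api-reference"
  tags?: string[];     // free-form tags to aid context-aware retrieval
}

interface SubmissionResult {
  accepted: DocLink[];
  rejected: { url: string; reason: string }[];
}

/** Validate, deduplicate, and organize user-submitted documentation links. */
function submitDocLinks(links: DocLink[]): SubmissionResult {
  const seen = new Set<string>();
  const accepted: DocLink[] = [];
  const rejected: { url: string; reason: string }[] = [];

  for (const link of links) {
    let parsed: URL;
    try {
      parsed = new URL(link.url);
    } catch {
      rejected.push({ url: link.url, reason: "malformed URL" });
      continue;
    }
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
      rejected.push({ url: link.url, reason: "unsupported protocol" });
      continue;
    }
    if (seen.has(parsed.href)) {
      rejected.push({ url: link.url, reason: "duplicate" });
      continue;
    }
    seen.add(parsed.href);
    accepted.push({ ...link, url: parsed.href });
  }

  return { accepted, rejected };
}
```

The accepted links would then be handed straight to the existing scraping and indexing pipeline, bypassing crawl-based discovery entirely, which is exactly what covers the hard-to-crawl edge cases described below.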

Anticipated Benefits

This enhancement is expected to provide more complete coverage of a library's documentation and more precise, context-aware retrieval by the AI.

I am confident that this feature would be a valuable addition to the Cursor ecosystem, improving the way documentation is processed and used. I anticipate that a manual bulk link submission mechanism will lead to a richer and more reliable body of documentation knowledge for both the AI and users.

Sanger2000 commented 4 months ago

What docs do you think we're doing a bad job of supporting?

kevinseabourne commented 4 months ago

solid-start only scraped 2 pages, and surrealDB and motionone had similar problems. But it's not about a "bad job of supporting" any particular library; it's the edge case that some documentation may be hard to scrape, resulting in poor retrieval. Allowing users to paste multiple links covers that edge case.

kevinseabourne commented 4 months ago

Modular Forms is a great example of documentation that results in poor retrieval: https://modularforms.dev/solid/guides/

danielgwilson commented 2 months ago

I run into a lot of issues, e.g. most recently:

johncomposed commented 1 month ago

FWIW, this has been the dealbreaker stopping me from fully switching over to cursor from traditional vscode+copilot. It struggled with two relatively well-known libraries that I tried. I'd be happy to do the additional legwork of assembling the docs and feeding them to cursor (whether as links, downloads, whatever), but the doc scraping process is so opaque that the key feature of "understanding third-party libraries" becomes extremely unreliable. Or in my case, useless.

Just allow us to explicitly input files or links to documentation and this becomes solved for basically all libraries.
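To be concrete, the explicit input could be as small as a hand-written manifest of sources. Everything below is purely illustrative; this file, its fields, and the placeholder URLs do not correspond to any existing Cursor feature or format.

```typescript
// Purely hypothetical "docs manifest" illustrating the kind of explicit
// input being requested, not an existing Cursor feature or file format.
interface DocSource {
  url?: string;   // direct link to a documentation page
  file?: string;  // or a local file the user downloaded themselves
  tags?: string[];
}

const docsManifest: { library: string; sources: DocSource[] } = {
  library: "some-library",
  sources: [
    { url: "https://example.com/docs/getting-started", tags: ["guide"] },
    { url: "https://example.com/docs/api", tags: ["api-reference"] },
    { file: "./vendor-docs/advanced-usage.md" }, // placeholder local file
  ],
};
```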

kevinseabourne commented 1 month ago

Is anything going to be done about this? It seems like a pretty simple solution: allow users to edit the list of URLs that are scraped.

Screenshot 2024-05-14 at 10 03 55 PM