getcursor / cursor

The AI-powered code editor
https://cursor.sh

Advanced Page Filtering Options for Custom Docs Creation #1344

Open MaximilienFourmyBeyond opened 3 months ago

MaximilienFourmyBeyond commented 3 months ago

Is your feature request related to a problem? Please describe.
When we provide a web link to create Custom Docs, it crawls all pages, but some are not useful for users and could be skipped. In my case, the documentation includes pages related to Node.js and Java. Since I do not use Java, I would like to exclude all pages related to it. I am using the following web link: https://cap.cloud.sap/docs/. However, I wish to exclude all pages under: https://cap.cloud.sap/docs/java.

Describe the solution you'd like
Multiple potential solutions:
1) An option to use regex for excluding certain pages. For example: ^https://cap\.cloud\.sap/docs/(?!java).*
2) An option to display the list of pages before starting the crawl, allowing users to select the pages they want.
3) An option to delete pages in the Cursor Settings Popup under Docs.
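
To illustrate option 1, here is a minimal sketch of how such a regex could be applied to a list of discovered URLs. It is not Cursor's crawler: the function name `filter_pages`, the `STRICT_PATTERN` variant, and the sample URL list are assumptions made up for illustration. Only the first pattern comes from the request above.

```python
import re

# Pattern taken from the request: keep docs pages whose path
# does not start with "java".
INCLUDE_PATTERN = re.compile(r"^https://cap\.cloud\.sap/docs/(?!java).*")

# Stricter variant (an assumption, not from the request): reject only the
# "java" path segment itself, so a sibling such as "javascript" survives.
STRICT_PATTERN = re.compile(r"^https://cap\.cloud\.sap/docs/(?!java(?:/|$)).*")


def filter_pages(urls, pattern=INCLUDE_PATTERN):
    """Return only the URLs that the pattern accepts, preserving order."""
    return [url for url in urls if pattern.match(url)]


# Hypothetical crawl result, used only for illustration.
discovered = [
    "https://cap.cloud.sap/docs/",
    "https://cap.cloud.sap/docs/node.js/",
    "https://cap.cloud.sap/docs/java/",
    "https://cap.cloud.sap/docs/java/getting-started",
]

kept = filter_pages(discovered)
```

One thing to be aware of: the negative lookahead `(?!java)` rejects any path that merely begins with the letters "java", so a hypothetical `/docs/javascript/` page would be dropped as well. The stricter variant avoids that by requiring a `/` or the end of the URL right after `java`.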

danielgwilson commented 2 months ago

+1. I very often have to fiddle with the trailing slash and with which prefix or entry point I use to get Cursor to scrape documentation consistently. Would love to have this level of control.
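
The trailing-slash problem could be handled by normalizing the entry point before comparing URLs against it. This is a rough sketch under that assumption, not a description of how Cursor treats prefixes. The helper names `normalize_prefix` and `is_under` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_prefix(url):
    """Lowercase the host, drop query and fragment, and end the path with exactly one '/'."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def is_under(url, prefix):
    """True if url lies under prefix, regardless of trailing slashes on either."""
    return normalize_prefix(url).startswith(normalize_prefix(prefix))
```

With this, `https://cap.cloud.sap/docs` and `https://cap.cloud.sap/docs/` are treated as the same entry point, while a path that merely shares leading characters with the prefix is not counted as being under it.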