continuedev / continue

⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains.
https://docs.continue.dev/
Apache License 2.0

Failed to scrape folder for docs #1508

Open wizche opened 2 months ago

wizche commented 2 months ago


Relevant environment info

- OS: Debian 12
- Continue: v0.9.159
- IDE: VS Code

Description

When the website is served with python3 -m http.server, the scraper doesn't construct the URLs of pages in subfolders correctly.

[Screenshot: folder structure of the served test site]

To reproduce

  1. Create a structure similar to the one in the screenshot
  2. Serve the website
  3. Start the scraper via Add docs...
  4. page2.html is requested from the root (/) instead of from the /sub path
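To make the suspected URL problem concrete, here is a minimal TypeScript sketch using the standard URL constructor. resolveLink is a hypothetical helper, not Continue's crawler code. The assumption is that the crawler resolves relative hrefs such as page2.html (the kind of link that the directory listings of python3 -m http.server contain) against some base URL, and the sketch shows what each choice of base produces.

```typescript
// Hypothetical helper (not Continue's code): resolve a relative href
// against a base URL using the WHATWG URL parser.
function resolveLink(href: string, base: string): string {
  return new URL(href, base).toString();
}

// Resolving against the crawl root drops the /sub segment (the reported symptom).
console.log(resolveLink("page2.html", "http://localhost:8000/"));
// → http://localhost:8000/page2.html

// Resolving against the page the link was found on keeps it.
console.log(resolveLink("page2.html", "http://localhost:8000/sub/"));
// → http://localhost:8000/sub/page2.html

// Without the trailing slash, "sub" is treated as a file name and replaced,
// so the result is the same as resolving against the root.
console.log(resolveLink("page2.html", "http://localhost:8000/sub"));
// → http://localhost:8000/page2.html
```

The last case matters for http.server in particular: it answers a request for /sub with a redirect to /sub/, so a crawler that keeps using the pre-redirect URL as the base would reproduce this bug even if it resolves against the parent page.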

Log output

[Extension Host] Starting crawl from:  http://localhost:8000/  - Max Depth:  3
[Extension Host] Crawl completed
[Extension Host] Creating Embeddings for  1  articles
[Extension Host] Adding  0  embeddings to db
[Extension Host] Error handling webview message: {
  "msg": {
    "messageId": "f40d8f31-2ae1-4ead-85e6-27fc93ffc0e8",
    "messageType": "context/addDocs",
    "data": {
      "startUrl": "http://localhost:8000/",
      "rootUrl": "http://localhost:8000/",
      "title": "test",
      "maxDepth": 3
    }
  }
}

Error: Either data or schema needs to defined
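The log shows one article crawled but 0 embeddings added immediately before the error. A hedged sketch, assuming (this thread does not confirm it) that the error is raised when the docs index tries to create a storage table from an empty list of rows with no schema: check for the empty case first and report something actionable. EmbeddedChunk, CreateTable and addDocsSafely are hypothetical names, not Continue's API.

```typescript
// Hypothetical row shape for an embedded docs chunk (illustrative only).
interface EmbeddedChunk {
  url: string;
  content: string;
  vector: number[];
}

// Hypothetical storage callback; assumed to fail when given zero rows.
type CreateTable = (name: string, rows: EmbeddedChunk[]) => Promise<void>;

async function addDocsSafely(
  title: string,
  rows: EmbeddedChunk[],
  createTable: CreateTable,
): Promise<void> {
  if (rows.length === 0) {
    // Reject with a clear message instead of letting the storage layer
    // fail with an opaque "data or schema" error.
    throw new Error(
      `No embeddings were produced for "${title}"; the crawl may not have found any indexable content.`,
    );
  }
  await createTable(title, rows);
}
```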
sestinj commented 2 months ago

@wizche Any chance you could .zip the contents of this example site so I can test the exact same thing? I imagine this would make the problem quite straightforward to solve.

wizche commented 2 months ago

Hi Nate, it's literally just a folder with an empty file in it. My guess is that the scraper always resolves links against the base URL rather than against the actual parent page.
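A minimal sketch of the fix suggested here, with hypothetical names: resolve each href against the URL of the page it appeared on (ideally the final URL after any redirect), strip fragments, and keep only links under the root URL. extractLinks is illustrative, not Continue's implementation, and a real crawler would use an HTML parser rather than this regular expression.

```typescript
// Hypothetical link extractor: resolves hrefs against the page URL, not the root.
function extractLinks(html: string, pageUrl: string, rootUrl: string): string[] {
  const links = new Set<string>();
  // Simplified pattern: only double-quoted href attributes are matched.
  const hrefPattern = /href\s*=\s*"([^"]*)"/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    let resolved: URL;
    try {
      // Resolve relative to the page the link was found on.
      resolved = new URL(match[1], pageUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    resolved.hash = ""; // treat page#section as the same page
    const url = resolved.toString();
    // Stay inside the site being indexed.
    if (url.startsWith(rootUrl)) links.add(url);
  }
  return Array.from(links);
}
```

For example, extractLinks('<a href="page2.html">page2.html</a>', "http://localhost:8000/sub/", "http://localhost:8000/") returns ["http://localhost:8000/sub/page2.html"].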

cmann50 commented 2 months ago

Let me know if this requires a separate ticket. I am using the JetBrains plugin on OSX and trying to add documentation using @docs. It lets me specify the URL to the website with the documentation and a name. I hit “OK”, and after a few seconds, a floating popup appears along with this error in the logs: “Either data or schema needs to be defined.”

There are no other errors in the logs.

[info] Starting Continue core...
[2024-06-24T14:27:15] [info] Starting Continue core... 
[2024-06-24T14:28:04] Error running handler for "context/addDocs":  Error: Either data or schema needs to defined
[2024-06-24T14:28:04] Error: Either data or schema needs to defined 

I am using version 0.0.47 of the plugin from the EAP URL listed on the troubleshooting page. I believe the version I had before switching to the EAP URL was higher, and it didn't work either.

edneyreis999 commented 2 months ago

@cmann50 I had exactly the same problem. I'll post my solution here. If there is a better place, please move the answer to another topic.

The extension saves the documents in an SQLite database (.continue/index/docs.sqlite), and the schema may not have been created yet. Index any of the docs that Continue already provides before adding your own, or create the schema manually.

I hope it solves your problem.

cmann50 commented 2 months ago

Thanks. I tried reinstalling, and now it says the @docs context provider is not supported in JetBrains IDEs yet and that an update is coming soon.