Extract only relevant content via CSS Selectors / XPath

Stevenic / vectra

Vectra is a local vector database for Node.js with features similar to Pinecone, but built using local files.

MIT License


Open mihai-stancu opened 3 months ago

mihai-stancu commented 3 months ago

Hi,

Is there any way to isolate only the main content of the page being indexed instead of grabbing the entire page?

I'm currently using some bash to do this:

  pup '.main-body .kb-content' -f "$file" | html2markdown > "$tmpfile"
  npx vectra add var/help --keys /tmp/vectra.json --uri "$tmpfile"
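
For anyone who wants to apply the same workaround to a batch of pages, here is a minimal sketch of the two commands above wrapped in a function. The function name `index_main_content` and its argument order are hypothetical; the `pup`, `html2markdown`, and `npx vectra add` invocations are copied from the snippet above. It assumes all three tools are on the PATH, and that vectra reads the file while `add` runs, so the temporary file can be removed afterwards.

```shell
#!/usr/bin/env bash
# Sketch only: index the main content of saved HTML pages instead of the
# whole page. Assumes pup, html2markdown, and npx (with vectra) are installed.

# Usage: index_main_content SELECTOR INDEX_DIR KEYS_FILE HTML_FILE...
# (hypothetical helper, not part of vectra)
index_main_content() {
  local selector=$1 index_dir=$2 keys_file=$3
  shift 3
  local file tmpfile
  for file in "$@"; do
    tmpfile=$(mktemp) || return 1
    # Keep only the nodes matching the CSS selector, then convert to Markdown.
    if pup "$selector" -f "$file" | html2markdown > "$tmpfile" && [ -s "$tmpfile" ]; then
      npx vectra add "$index_dir" --keys "$keys_file" --uri "$tmpfile"
    else
      # Nothing matched (or a tool failed), so do not index an empty document.
      echo "skipping $file: no content matched '$selector'" >&2
    fi
    # Assumption: vectra has already read the file by the time add returns.
    rm -f "$tmpfile"
  done
}
```

It could be called as `index_main_content '.main-body .kb-content' var/help /tmp/vectra.json pages/*.html`. Note that the URI recorded for each document would be the temporary file path rather than the original page, which is the same limitation the two-line version has.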

Stevenic commented 2 months ago

Not currently, but that's a good suggestion.