continuedev / contribution-ideas

A Repo to which to Attach Contribution Ideas
Improved Webpage Parsing #9

Open sestinj opened 1 year ago

sestinj commented 1 year ago

The current DocsContextProvider mostly uses the defaults from `node-html-markdown` to convert HTML to markdown, but the output sometimes includes junk such as ads and navigation text. We just want the important content (headers, paragraphs, etc.).

barelysomethin commented 2 months ago

Hey, hi! How can I learn more about this? Do we want to shift to a new tool like puppeteer and cheerio, or do we just want to change the `node-html-markdown` config to fit our needs?
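
One possible middle ground between the two options above is to strip obvious page chrome from the HTML before it reaches the markdown converter. The following is only a rough sketch under stated assumptions: `stripBoilerplate` and `BOILERPLATE_TAGS` are hypothetical names, not part of Continue or `node-html-markdown`, and the regex approach does not handle nested elements of the same tag, so a real HTML parser would likely be more robust.

```typescript
// Hypothetical helper for illustration; not part of Continue or node-html-markdown.
// Tags whose contents are usually page chrome rather than documentation content.
const BOILERPLATE_TAGS: string[] = ["script", "style", "nav", "footer", "aside"];

function stripBoilerplate(html: string): string {
  let cleaned = html;
  for (const tag of BOILERPLATE_TAGS) {
    // Non-greedy match from an opening tag to the first matching closing tag.
    // Assumption: elements of the same tag are not nested inside each other.
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}\\s*>`, "gi");
    cleaned = cleaned.replace(pattern, "");
  }
  return cleaned;
}

// Example: navigation and footer are dropped, headings and paragraphs survive.
const page =
  '<nav><a href="/">Home</a></nav><h1>Title</h1><p>Body</p><footer>Ads</footer>';
console.log(stripBoilerplate(page));
```

The cleaned string could then be passed to whatever HTML-to-markdown step the provider already uses, so this would not require switching tools.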