eklem / nowcontent.xyz

Create bookmarklets for sending content / pages to service [xyz]
MIT License

Option to send either raw HTML or body.innerText (default) #56

Open eklem opened 5 years ago

eklem commented 5 years ago

Up for debate. Today, only the text from body (body.innerText) is grabbed. It would be better to have the extraction process in the back end (it could be done in the browser too, but that needs more code space than a bookmarklet has).

Then it can be up to the receiving end to do extraction of text. Daq-proc could have a cheerio processor included.

eklem commented 5 years ago

Make it a switch so the user can choose. Not sure how to do this without writing code for every endpoint service.

eklem commented 5 years ago

To not break everything, make body.innerText default, so re-added bookmarklets behave the same.

eklem commented 5 years ago

This is what I'll try:

When reading: check whether the rawHTML key exists. If it does, check its value: true means get the raw HTML, false means get body.innerText. If it doesn't exist or isn't set, get body.innerText. This way, old bookmarklets' IndexedDB data will work with the new code.

When writing / creating: rawHTML defaults to true. This will be the most common case for a search engine, where document processing happens when the search engine reads the data from where it's stored. At that point it's easier to create elaborate document processors than in the bookmarklet code.
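
A rough sketch of the read / write rules above, not the actual implementation. The function names (shouldSendRawHTML, extractContent, createSettings) and the shape of the settings object are made up for illustration, and taking "raw HTML" to mean document.documentElement.outerHTML is an assumption; only the rawHTML key and body.innerText come from this issue.

```javascript
// Hypothetical helpers; `settings` stands in for whatever object is
// stored per bookmarklet (e.g. a record read from IndexedDB).

// Reading: raw HTML is sent only when the rawHTML key exists and is true.
// A missing settings object, a missing key, or a false value all fall
// back to body.innerText, so old bookmarklets behave as before.
function shouldSendRawHTML(settings) {
  return Boolean(settings) &&
    Object.prototype.hasOwnProperty.call(settings, 'rawHTML') &&
    settings.rawHTML === true;
}

// Pick the content to send. `doc` is the page's document object.
// Assumption: "raw HTML" means the full serialized document element.
function extractContent(doc, settings) {
  return shouldSendRawHTML(settings)
    ? doc.documentElement.outerHTML
    : doc.body.innerText;
}

// Writing / creating: rawHTML defaults to true unless the caller
// explicitly sets it.
function createSettings(options = {}) {
  const rawHTML = options.rawHTML === undefined ? true : Boolean(options.rawHTML);
  return { ...options, rawHTML };
}
```

In a bookmarklet this would be called as extractContent(document, storedSettings). Keeping the fallback in the read path means nothing stored by older bookmarklets has to be migrated.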

eklem commented 4 years ago

This goes really well with daq-proc, since cheerio is now a part of it. It will allow a lot more logic in how content is extracted from a page.