eklem opened 5 years ago
Make it a switch so the user can choose between grabbing the raw HTML and `body.innerText`. I'm not sure how to do this without writing code for every endpoint service.
To avoid breaking existing setups, make `body.innerText` the default, so re-added bookmarklets behave the same as before.
This is what I'll try:
When reading: check whether the `rawHTML` key exists. If it does, `true` means get the raw HTML and `false` means get `body.innerText`. If it doesn't exist or isn't set, get `body.innerText`. This way, the IndexedDBs of old bookmarklets will keep working with the new code.
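A minimal sketch of the read-side rule, assuming the stored settings are a plain object where `rawHTML` is the only relevant key, and assuming "raw HTML" means `document.documentElement.outerHTML`. The function name `getPageContent` is hypothetical, not existing bookmarklet code.

```javascript
// Hypothetical helper: decides what to grab based on stored settings.
// `settings` is whatever a bookmarklet read back from its IndexedDB (may be
// missing entirely for old bookmarklets); `doc` is the page's document.
function getPageContent(settings, doc) {
  // Only an explicit `rawHTML: true` switches to raw HTML. A missing key,
  // an unset value or `false` all fall back to the old behaviour.
  const wantRaw = settings != null && settings.rawHTML === true;
  return wantRaw ? doc.documentElement.outerHTML : doc.body.innerText;
}
```

Inside a bookmarklet this would be called as `getPageContent(storedSettings, document)`. Treating anything other than `true` as "use `innerText`" is what lets old IndexedDBs, which never stored the key, behave exactly as before.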
When writing/creating: `rawHTML` defaults to `true`. This will be the most common case for a search engine, with document processing done when the search engine reads the data from where it's stored. At that point it's easier to build elaborate document processors than it is inside the bookmarklet code.
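On the write side, a sketch of how a newly created bookmarklet's settings could always store the key explicitly. `createSettings` and the idea that other user choices live in the same object are assumptions for illustration only.

```javascript
// Hypothetical helper: builds the settings object a new bookmarklet stores.
// Any other user choices are copied through untouched; `rawHTML` is always
// written explicitly and defaults to true unless the user chose a boolean.
function createSettings(userChoices = {}) {
  const rawHTML =
    typeof userChoices.rawHTML === "boolean" ? userChoices.rawHTML : true;
  return { ...userChoices, rawHTML };
}
```

Because new bookmarklets always persist `rawHTML`, only old ones ever hit the "key missing, use `innerText`" fallback on the read side. The two defaults (`true` on write, `innerText` on read when the key is absent) therefore don't conflict.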
Up for debate: today only the text from `body` is grabbed, but a better extraction process could live in the back end (which may also be a browser, but one with more code space than a bookmarklet).
Then it can be up to the receiving end to extract the text. Daq-proc could include a cheerio processor.
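To illustrate where extraction would sit on the receiving end, here is a dependency-free sketch. It deliberately swaps in a naive regex tag-stripper instead of cheerio, so it is not how a Daq-proc processor works and will mishandle real-world HTML edge cases. The record shape `{ rawHTML, content }` and both function names are hypothetical.

```javascript
// Naive stand-in for a cheerio-based extractor: strips script/style blocks
// and tags with regexes. Illustration only, not a robust HTML parser.
function extractText(rawHtml) {
  return rawHtml
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/&nbsp;/g, " ") // decode a couple of common entities
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Hypothetical record shape: { rawHTML: boolean | undefined, content: string }.
// Raw HTML gets extracted here; innerText records are already plain text.
function processRecord(record) {
  const text =
    record.rawHTML === true ? extractText(record.content) : record.content;
  return { ...record, text };
}
```

The point of the split is that the bookmarklet stays small and just ships whatever it grabbed; the back end can later replace `extractText` with a proper parser without touching any bookmarklet.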