deathau / markdownload

A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file.
Apache License 2.0

Differences when converting a selection vs entire page #138

Open · namtrah opened this issue 2 years ago

namtrah commented 2 years ago

I am having quite a few issues with the differences between converting an entire page and converting just a selection of that page (where the selection covers the same content area that the full-page conversion captures anyway). My example:

From DnDBeyond.com (free Frozen Sick Adventure):

  1. Open the page, use Markdownload, and save the file.
  2. Select the text starting with the title "Frozen Sick" through the end of the page ("...new adventures elsewhere in Wildemount.").
     a. Use Markdownload and save the selection as a file.
  3. Diff the two files to see what is missing or different.

I am not certain whether this is caused by DnDBeyond or by Markdownload, so I am not posting the actual diff; I hope you can reproduce it. What I see is that many (but not all) `####` H4 headings go missing, and all `>` blockquotes (such as the green descriptions or yellow notes) are simply dropped.
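A minimal sketch of step 3 in Python, assuming both clips were saved as UTF-8 Markdown files. It uses the standard library's `difflib.ndiff` to find lines that appear in one export but not the other, then keeps only `####` headings and `>` blockquote lines, the two kinds of structure reported missing. The function names and the command-line usage are hypothetical, and the comparison is purely line-by-line text, so a heading whose wording changed between the two clips is also reported as missing.

```python
import difflib
import sys
from pathlib import Path


def lines_only_in_second(first: str, second: str) -> list[str]:
    """Return lines present in `second` but not in `first`, in order."""
    # ndiff marks lines that exist only in the second input with a "+ " prefix
    diff = difflib.ndiff(first.splitlines(), second.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def structural_losses(first: str, second: str) -> dict[str, list[str]]:
    """Of the lines missing from `first`, keep only H4 headings and blockquotes."""
    extra = lines_only_in_second(first, second)
    return {
        "h4_headings": [line for line in extra if line.startswith("#### ")],
        "blockquotes": [line for line in extra if line.startswith(">")],
    }


# Hypothetical usage: python compare_clips.py full-page.md selection.md
if __name__ == "__main__" and len(sys.argv) == 3:
    full_page = Path(sys.argv[1]).read_text(encoding="utf-8")
    selection = Path(sys.argv[2]).read_text(encoding="utf-8")
    for kind, lines in structural_losses(full_page, selection).items():
        print(f"{kind} missing from {sys.argv[1]}: {len(lines)}")
        for line in lines:
            print("    " + line)
```

Swapping the two file arguments checks the opposite direction, i.e. what the selection clip lost relative to the full-page clip.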

deathau commented 1 year ago

I need to make this clearer and build some more options around it, but here is what's happening: when you use Markdownload to clip the whole page, it first passes the page through Readability before converting it to Markdown. This helps strip out unwanted cruft like navigation and sidebars, but it's not perfect and will sometimes strip out more than it should. When you select text manually, it doesn't do that as much, because in theory you've selected exactly the text you want, so it just converts that. I do plan to add options to perhaps disable Readability in certain circumstances, to clean up issues like this one. For now, though, I'm going to close this, because some differences between a selection and the entire page are to be expected.
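To make the two paths concrete, here is a toy Python model, assuming the page has already been split into simple blocks. It is not Markdownload's code and not Readability's algorithm: real Readability uses much richer heuristics (such as class and id name patterns and link density), while `looks_like_clutter` below only checks tag names and a few hypothetical class-name hints. What it illustrates is the behavior described in the reply above: a full-page clip filters blocks before conversion, so a heuristic false positive (here a boxed note whose class happens to contain `callout`) silently disappears, whereas a selection is converted as-is. The `use_readability` flag is a hypothetical stand-in for the option the maintainer plans to add.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: str             # e.g. "h1", "h4", "p", "blockquote", "nav"
    text: str
    css_class: str = ""  # hypothetical class name taken from the page's HTML


# Toy clutter heuristic standing in for Readability (hypothetical values).
CLUTTER_TAGS = {"nav", "aside", "footer"}
CLUTTER_CLASS_HINTS = ("sidebar", "menu", "callout")


def looks_like_clutter(block: Block) -> bool:
    """Flag a block as clutter by tag name or by a class-name substring."""
    return block.tag in CLUTTER_TAGS or any(
        hint in block.css_class for hint in CLUTTER_CLASS_HINTS
    )


def to_markdown(block: Block) -> str:
    """Convert one block: h1-h6 to '#' headings, blockquote to '> ', else text."""
    if block.tag.startswith("h") and block.tag[1:].isdigit():
        return "#" * int(block.tag[1:]) + " " + block.text
    if block.tag == "blockquote":
        return "> " + block.text
    return block.text


def clip(blocks: list[Block], use_readability: bool) -> str:
    """Full-page clips filter blocks first; selections are converted unfiltered."""
    kept = [b for b in blocks if not (use_readability and looks_like_clutter(b))]
    return "\n\n".join(to_markdown(b) for b in kept)
```

For a page made of a `nav`, an `h1`, a `blockquote` with class `callout-green`, and a `p`, `clip(page, use_readability=True)` keeps only the heading and the paragraph, while `clip(page[1:], use_readability=False)` on the selected blocks also keeps the `> ` blockquote line.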

deathau commented 1 year ago

see also #33