go-shiori / shiori

Simple bookmark manager built with Go
MIT License

Parsing error and missing content on theregister.com #862

Open lgrn opened 6 months ago

lgrn commented 6 months ago

Data

Describe the bug / actual behavior

Shiori fails to parse quotes: they are missing from the saved content.

Expected behavior

The quotes are part of the article and should be included, preferably with some kind of UI indication that they are quotes, but at the very least they should be present in the saved content.

To Reproduce

Steps to reproduce the behavior:

  1. Save the article https://www.theregister.com/2024/03/18/truenas_abandons_freebsd/
  2. Inspect the saved content
  3. Note that the paragraph beginning with "The creator of PC-BSD(...)" has been saved
  4. Note that the following quote beginning with "Right now the plan(...)" is missing

Notes

This is an HTML excerpt of the problematic section -- the <p> within the <div> is not included:

<p>The creator of PC-BSD(...)</p>
<div class="blockextract">
<p>Right now the plan(...)</p>
</div>
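
To make the structure in the excerpt easier to check, here is a minimal Go sketch that lists each <p> in the fragment and reports whether it is nested inside a <div class="blockextract">. It is not Shiori's extraction code. The names `listParagraphs`, `paragraph`, `hasClass` and `excerpt` are hypothetical, and the standard library's `encoding/xml` tokenizer is used only because this particular excerpt is well-formed. A real page would need a proper HTML parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// excerpt is the HTML fragment quoted in the issue, elisions kept as-is.
const excerpt = `<p>The creator of PC-BSD(...)</p>
<div class="blockextract">
<p>Right now the plan(...)</p>
</div>`

// paragraph records the text of one <p> and whether it sits inside
// a <div class="blockextract"> wrapper. (Hypothetical helper type.)
type paragraph struct {
	Text    string
	InQuote bool
}

// hasClass reports whether the element's class attribute contains name.
func hasClass(el xml.StartElement, name string) bool {
	for _, a := range el.Attr {
		if a.Name.Local != "class" {
			continue
		}
		for _, c := range strings.Fields(a.Value) {
			if c == name {
				return true
			}
		}
	}
	return false
}

// listParagraphs walks the fragment token by token and collects every <p>.
// Assumption: the input is well-formed enough for encoding/xml.
func listParagraphs(fragment string) ([]paragraph, error) {
	dec := xml.NewDecoder(strings.NewReader(fragment))

	var (
		out        []paragraph
		divStack   []bool // one entry per open <div>: is it a blockextract?
		quoteDepth int    // > 0 while inside any div.blockextract
		inP        bool
		text       strings.Builder
	)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch t.Name.Local {
			case "div":
				isQuote := hasClass(t, "blockextract")
				divStack = append(divStack, isQuote)
				if isQuote {
					quoteDepth++
				}
			case "p":
				inP = true
				text.Reset()
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "div":
				if n := len(divStack); n > 0 {
					if divStack[n-1] {
						quoteDepth--
					}
					divStack = divStack[:n-1]
				}
			case "p":
				out = append(out, paragraph{
					Text:    strings.TrimSpace(text.String()),
					InQuote: quoteDepth > 0,
				})
				inP = false
			}
		case xml.CharData:
			if inP {
				text.Write(t)
			}
		}
	}
}

func main() {
	paras, err := listParagraphs(excerpt)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range paras {
		fmt.Printf("inQuote=%v text=%q\n", p.InQuote, p.Text)
	}
}
```

Running the same check against the content Shiori saved, instead of the original fragment, would show whether the paragraph flagged as a quote survived extraction.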