Closed: zcdunn closed this issue 10 months ago.
Hi there,
Thank you for the report, and sorry it took so long to answer. I'm a bit busy at the moment.
May I add this example to the repository as a test case?
Sure, you can add it to the repo. And there's no rush on this; thanks for taking the time to look at it.
I think I found the problem… it looks like a bug in Floki, the HTML parser I use. The element name for this:
<a
href="https://gilest.org/indie-easy.html"
class="u-url entry__link"
itemprop="url"
>
is `a\n` instead of just `a`. I have to investigate further…
Floki was innocent 🫣 it was my bug.
I fixed it and would like to release a new version. Would you mind checking whether it works for you now?
🎉 Thank you! That solved the url parsing.
It's still parsing the outer `h-entry`'s `category` property as part of the inner `h-cite`. Should I open a separate issue for that?
Sigh.
Both bugs are caused by a workaround for a bug in MochiWeb. It doesn't handle whitespace very well, so I have to replace whitespace characters with their HTML entities. But I was doing that inside tags too, which changes the parse tree of the document.
I have to rewrite whitespace handling…
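To illustrate why encoding whitespace inside tags changes the parse tree, here is a minimal Python sketch. It is not the library's actual code (which is Elixir and uses MochiWeb/Floki); the helper names are hypothetical, it assumes only newlines and tabs get encoded, and its regex for finding tags is naive (it ignores `>` inside quoted attribute values, comments and `<script>` blocks). Encoding the whole document turns `<a\nhref=…>` into `<a&#10;href=…>`, so the tag name a parser sees is no longer just `a` (consistent with the `a\n` element name reported above); encoding only the text between tags leaves the markup intact.

```python
import re

# Naive tag matcher: from "<" to the next ">". The capturing group makes
# re.split keep the tags in its result.
TAG_RE = re.compile(r"(<[^>]*>)")

# Assumption for this sketch: which whitespace characters get encoded.
WS_ENTITIES = {"\n": "&#10;", "\t": "&#9;"}


def encode_ws(text):
    """Replace newlines and tabs with numeric character references."""
    return "".join(WS_ENTITIES.get(ch, ch) for ch in text)


def naive_encode(html):
    """Buggy variant: also rewrites whitespace inside tags, e.g. between
    the tag name and its attributes."""
    return encode_ws(html)


def encode_text_only(html):
    """Only rewrite whitespace in text between tags."""
    parts = TAG_RE.split(html)
    # With one capturing group, tags land at odd indexes, text at even ones.
    return "".join(part if i % 2 else encode_ws(part)
                   for i, part in enumerate(parts))


snippet = '<a\nhref="x">line1\nline2</a>'
print(naive_encode(snippet))      # tag markup is corrupted
print(encode_text_only(snippet))  # tag markup is preserved
```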
Can you check again? I completely overhauled whitespace handling, and your test case (and the old test cases) now work for me.
Yep, it's working for me now. Thank you!
Cool :-) I just released v1.0.1
On my site, I use `h-cite` for contexts to remote URLs (replies, reposts, etc.). I have them marked up as described here, where the `u-*` property of the `h-entry` is on the same element as the root `h-cite` and not directly on the link itself. It seems like this library is not getting the `url` property of the nested `h-cite` right. It's also incorrectly parsing the `category` of the outer `h-entry` as the `category` of the nested `h-cite`.
For this repost, you can see parse results below that show a `repost-of` property (at `$.items[0].properties.repost-of[0]`) with a nested `h-cite` that contains the correct `url`:
Parsed `repost-of` from pin13/unmung
Parsed `repost-of` from microformats2-elixir