aaronpk / XRay

X-Ray returns structured data from any URL
https://xray.p3k.app
MIT License

No author on Grapefruit #69

Open Zegnat opened 6 years ago

Zegnat commented 6 years ago

Grapefruit has some posts. XRay seems capable of parsing these posts (although I am still unsure of where in the code it decides to look at the entries within the feed).

What it does not seem to do correctly is find my author info. At step 4 of the authorship algorithm, it should check the author property of the h-feed that contains the h-entry. This should return https://vanderven.se/martijn/.
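To make the expected behaviour concrete, here is a minimal Python sketch of that fallback, working on parsed microformats2 JSON. It is not XRay's code: the function names `first_author` and `entry_author` and the sample feed are made up for illustration, and only the entry-then-feed lookup is shown, not the rest of the algorithm.

```python
from typing import Optional


def first_author(item: dict) -> Optional[object]:
    """Return the first value of an item's author property, if any."""
    authors = item.get("properties", {}).get("author", [])
    return authors[0] if authors else None


def entry_author(entry: dict, parent_feed: Optional[dict]) -> Optional[object]:
    """Hypothetical helper: the entry's own author, else the parent h-feed's."""
    # Use the h-entry's own author property when it has one.
    author = first_author(entry)
    if author is not None:
        return author
    # Otherwise fall back to the author of the enclosing h-feed.
    if parent_feed is not None and "h-feed" in parent_feed.get("type", []):
        return first_author(parent_feed)
    return None


# Made-up sample data: an h-feed with an author and one author-less h-entry.
feed = {
    "type": ["h-feed"],
    "properties": {"author": ["https://vanderven.se/martijn/"]},
    "children": [{"type": ["h-entry"], "properties": {"content": ["A post"]}}],
}
entry = feed["children"][0]
print(entry_author(entry, feed))
```

Without the parent feed (`entry_author(entry, None)`), the sketch has nothing to fall back on and returns `None`, which matches what I am seeing.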

I might find time to debug this. Or might not. So filing it here either way.

Zegnat commented 6 years ago

This may require a lot more refactoring than I initially thought. It looks like, whenever a fragment URL is provided, XRay is only going to parse that little piece of HTML:

https://github.com/aaronpk/XRay/blob/417cc1b3cc77ed86edccf72db174853ade1d9d2b/lib/XRay/Formats/HTML.php#L82-L92

From that point in the code forward, it doesn’t even know the h-entry was part of an h-feed.
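One possible direction, sketched in Python rather than as a patch to XRay: parse the whole document, then locate the item for the fragment while remembering its parent. This assumes the parser exposes each element's `id` attribute as an `id` key on the parsed item, which not every parser version does. The names `walk` and `find_by_fragment` and the sample data are hypothetical.

```python
from typing import Iterator, Optional, Tuple


def walk(items: list, parent: Optional[dict] = None) -> Iterator[Tuple[dict, Optional[dict]]]:
    """Yield (item, parent) pairs, descending through "children" only."""
    for item in items:
        yield item, parent
        yield from walk(item.get("children", []), item)


def find_by_fragment(parsed: dict, fragment: str) -> Tuple[Optional[dict], Optional[dict]]:
    """Return the item whose id matches the fragment, plus its parent item."""
    for item, parent in walk(parsed.get("items", [])):
        # Assumption: the parser copied the element's id attribute into "id".
        if item.get("id") == fragment:
            return item, parent
    return None, None


# Made-up parse result for a feed page containing two entries.
parsed = {
    "items": [
        {
            "type": ["h-feed"],
            "properties": {"author": ["https://vanderven.se/martijn/"]},
            "children": [
                {"type": ["h-entry"], "id": "post-1", "properties": {}},
                {"type": ["h-entry"], "id": "post-2", "properties": {}},
            ],
        }
    ]
}

entry, parent = find_by_fragment(parsed, "post-2")
print(entry["id"], parent["type"])
```

Because the parent travels with the entry, a later authorship step could still see the h-feed. The sketch does not descend into items nested inside property values.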

I also noticed there that PHP’s default DOMDocument is used to parse and then save the HTML. This could potentially mess up some HTML. As php-mf2 supports taking a DOMDocument as input, it definitely shouldn’t get saved to HTML first. (And it should possibly use the userland HTML parser.)

Not sure if a simple solution is available here.