adobe / helix-html2md

Service to convert Helix Generic HTML Content to Markdown.
Apache License 2.0
11 stars 12 forks source link

add support for JSON-LD in script tags #533

Closed maxakuru closed 6 days ago

maxakuru commented 2 weeks ago

Make the html2md conversion symmetrical with the pipelines md2html by adding support for JSON-LD in script tags within head. Similar to how meta tags are currently handled, insert the value into the metadata table.

tripodsan commented 2 weeks ago

is this really a requirement? afaiu from @davidnuescheler , json lds tend to be large. and having a huge metadata-table is not ok.

maxakuru commented 2 weeks ago

we already support json-ld in the metadata table, it just doesn't work through script tags (but ends up there from the metadata table in pipeline conversion). This is just making the process symmetrical/roundtripable.

I agree on the size limits, but as @dylandepass said 2kb is too low, maybe 128kb? It's possible the content of the JSON-LD is enough to drive the initial render of some pages, with no other html.. I see a use in it being a higher limit

side note: whatever limit we decide on here should probably be applied to pipeline as well. Here's a page that got 250kb of json-ld from a regular gdoc: https://main--public-test--maxakuru.hlx.page/foo/1/2

tripodsan commented 2 weeks ago

whatever limit we decide on here should probably be applied to pipeline as well.

I'm not worried about the amount of JSON-LD in the header, but rather in the markdown. it fear that parsing a huge markdown is slow.

maxakuru commented 1 week ago

I added a 128kb limit for all metadata values, though I still think that limit would apply to pipeline as well (since it would need to parse markdown, too). Unless you're not concerned about that part since, in practice, it would more likely be BYOM clients to make extraordinarily large JSON-LD and less likely to occur in a word/google doc.

also just to elaborate on why this would be useful: we're working on a project that will use JSON-LD for commerce using BYOM overlays. Ideally the dev server could just use the same format (script tag) as we would see on the published doc. The current way of providing a meta table with json-ld then it being converted to a script tag on published docs is unexpected, imo. Open to other comments if you have any @tripodsan

tripodsan commented 1 week ago

though I still think that limit would apply to pipeline as well (since it would need to parse markdown, too).

not if the json-ld comes from the metadata.xlsx, not document metadata. which is probably the normal use-case

tripodsan commented 1 week ago

added a 128kb limit for all metadata values

that's good for now