aantron / lambdasoup

Functional HTML scraping and rewriting with CSS in OCaml
https://aantron.github.io/lambdasoup
MIT License

Allow custom elements in <head>? #53

Open dmbaturin opened 10 months ago

dmbaturin commented 10 months ago

Right now, if lambdasoup encounters a non-standard element in <head>, it moves that element to <body>.

utop # Soup.parse {|<html> <head> <faketag content="no such element"> </head> <body> </body> </html> |} |> Soup.to_string ;;
- : string =
"<html><head> </head><body><faketag content=\"no such element\">     </faketag></body></html>"

For soupault, that means that it's impossible to write plugins that translate fake elements in <head> to real, valid HTML. For <body>, such an approach proved very fruitful — it serves the same role as "shortcodes" in other SSGs but it's a lot more flexible (see https://soupault.app/plugins/#augmented-html).

There are use cases for extending <head> in the same manner, but lambdasoup makes that impossible at the moment.

Do you think there could be an option to allow that, or that moving unusual elements to the body may be an overcorrection? Or are there problems that can only be solved by the current behavior that I fail to see?
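
A partial stopgap can be sketched on the Lambda Soup side, assuming the custom tag name is known in advance and is only ever meant for <head>. The helper name `relocate_to_head` and the choice to translate the fake element into a <meta> element are assumptions made for illustration; neither is part of Lambda Soup or soupault. It does not address the request itself, because after parsing there is no way to tell whether an element was originally written in <head> or in <body>.

```ocaml
(* Requires the lambdasoup package. *)

(* Hypothetical helper, not part of Lambda Soup: replace every <tag> element
   found anywhere in the document with a <meta> element appended to <head>. *)
let relocate_to_head ~tag doc =
  let open Soup.Infix in
  let head = doc $ "head" in
  doc $$ tag
  |> Soup.to_list (* snapshot the matches before mutating the tree *)
  |> List.iter (fun el ->
       let content =
         match Soup.attribute "content" el with
         | Some value -> value
         | None -> ""
       in
       let meta =
         Soup.create_element
           ~attributes:[ ("name", tag); ("content", content) ]
           "meta"
       in
       Soup.append_child head meta;
       Soup.delete el)

(* Apply the helper to the document from the utop session above. *)
let rewritten =
  let doc =
    Soup.parse
      {|<html> <head> <faketag content="no such element"> </head> <body> </body> </html>|}
  in
  relocate_to_head ~tag:"faketag" doc;
  Soup.to_string doc
```

The helper matches by tag name only, so a <faketag> that was deliberately placed in <body> would be moved as well.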

aantron commented 10 months ago

As I recall, this behavior is part of the error correction specified in the HTML5 spec. An option to disable it would have to be added to Markup.ml. I would probably merge a PR that does so.
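
To show where Markup.ml sits in the pipeline, here is a minimal sketch that spells out the parsing step explicitly and hands the resulting signal stream to Lambda Soup through `Soup.from_signals`. The function name `parse_via_markup` is made up for this example, and the claim that this mirrors what `Soup.parse` does internally is an assumption, not something confirmed in this thread. Any new option would presumably be passed to `Markup.parse_html`; no such option exists yet, so none is shown.

```ocaml
(* Requires the lambdasoup and markup packages. *)

(* Hypothetical name: parse HTML with Markup.ml directly, then build a
   Lambda Soup document from the resulting signal stream. *)
let parse_via_markup html =
  Markup.string html
  |> Markup.parse_html (* HTML5 error correction happens in this step *)
  |> Markup.signals
  |> Soup.from_signals

let via_markup =
  parse_via_markup
    {|<html> <head> <faketag content="no such element"> </head> <body> </body> </html>|}
  |> Soup.to_string
```

Since the relocation of <faketag> happens inside `Markup.parse_html`, Lambda Soup only ever sees the already corrected tree.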