aantron / lambdasoup

Functional HTML scraping and rewriting with CSS in OCaml
https://aantron.github.io/lambdasoup
MIT License

Lambdasoup removes colons from attribute names #39

Closed tmattio closed 3 years ago

tmattio commented 3 years ago

The following code:

let body =
  {|
<!DOCTYPE html>
<html>
<body class="class" :class="class" />
</html>
|}

let () = Soup.parse body |> Soup.to_string |> print_endline

outputs:

<!DOCTYPE html><html><head></head><body class="class" class="class">

</body></html>

As you can see, the colon in front of the second attribute, :class, is removed somewhere in the pipeline.

This is problematic because libraries such as AlpineJS rely on colon-prefixed attribute names, so removing the colons breaks HTML documents that use AlpineJS or similar libraries.

aantron commented 3 years ago

Thanks. It's most likely happening at the Markup.ml level (it might even be the standards-compliant thing to do...). I'll take a look shortly.

aantron commented 3 years ago

This should be fixed in Markup.ml master by the above commit. Could you give it a try? Would you like a speedy release?

tmattio commented 3 years ago

That works, thanks a lot @aantron! It's a rather specific bug, so nothing critical, but I'll add a lower bound once a release is out 🙂
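
For anyone who wants to confirm the behaviour on their own setup, here is a minimal sketch of a check. It assumes the lambdasoup package is installed; the helper name colon_class is made up for illustration, and which branch you hit depends on the Markup.ml version in use.

```ocaml
(* Sketch only: needs the lambdasoup package. The result depends on
   whether the installed Markup.ml includes the fix discussed above. *)
let body =
  {|
<!DOCTYPE html>
<html>
<body class="class" :class="class" />
</html>
|}

(* Hypothetical helper: look up the colon-prefixed attribute on <body>.
   None means there is no <body> or the attribute name did not survive. *)
let colon_class html =
  let open Soup in
  match parse html $? "body" with
  | None -> None
  | Some element -> attribute ":class" element

let () =
  match colon_class body with
  | Some value -> Printf.printf ":class preserved with value %S\n" value
  | None -> print_endline ":class missing after parsing"
```

Looking the attribute up by its exact name, instead of comparing the whole serialized string, keeps the check independent of whitespace and of the head element that the parser inserts.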