fsprojects / FSharp.Data

F# Data: Library for Data Access
https://fsprojects.github.io/FSharp.Data

Mishandling of spaces in HtmlNode.ToString? #1509

Open njlr opened 1 week ago

njlr commented 1 week ago

I have some HTML that Firefox renders like this:

[Screenshot 2024-06-28 at 19-33-26]

If I "round-trip" this in FSharp.Data, then the output renders like this:

[Screenshot 2024-06-28 at 19-33-14]

Here is the HTML:

<pre class="shiki vitesse-light" style="background-color:#ffffff;color:#393a34" tabindex="0"><code><span class="line"><span style="color:#1E754F">let</span><span style="color:#B07D48"> input</span><span style="color:#1E754F"> =</span><span style="color:#B5695999"> "</span><span style="color:#B56959">123</span><span style="color:#B5695999">"</span></span>
<span class="line"></span>
<span class="line"><span style="color:#1E754F">let</span><span style="color:#B07D48"> intOfDigit </span><span style="color:#1E754F">(</span><span style="color:#B07D48">x </span><span style="color:#1E754F">:</span><span style="color:#2E8F82"> char</span><span style="color:#1E754F">)</span><span style="color:#1E754F"> =</span></span>
<span class="line"><span style="color:#393A34">  int x </span><span style="color:#1E754F">-</span><span style="color:#393A34"> int </span><span style="color:#B56959">'0'</span></span>
<span class="line"></span>
<span class="line"><span style="color:#1E754F">let</span><span style="color:#B07D48"> number</span><span style="color:#1E754F"> =</span></span>
<span class="line"><span style="color:#393A34">  input</span></span>
<span class="line"><span style="color:#1E754F">  |></span><span style="color:#393A34"> Seq.fold</span></span>
<span class="line"><span style="color:#1E754F">    (fun</span><span style="color:#B07D48"> state next </span><span style="color:#1E754F">-></span><span style="color:#393A34"> state </span><span style="color:#1E754F">*</span><span style="color:#2F798A"> 10</span><span style="color:#1E754F"> +</span><span style="color:#393A34"> intOfDigit next</span><span style="color:#1E754F">)</span></span>
<span class="line"><span style="color:#2F798A">    0</span></span>
<span class="line"></span>
<span class="line"><span style="color:#393A34">printfn $</span><span style="color:#B5695999">"</span><span style="color:#1E754F">%i</span><span style="color:#B56959">{number}</span><span style="color:#B5695999">"</span><span style="color:#A0ADA0"> // 123</span></span>
<span class="line"></span></code></pre>

Here is a repro script:

#r "nuget: FSharp.Data, 6.4.0"

open FSharp.Data

let inputHtml = """<pre class="shiki vitesse-light" style="background-color:#ffffff;color:#393a34" tabindex="0"><code><span class="line"><span style="color:#1E754F">let</span><span style="color:#B07D48"> input</span><span style="color:#1E754F"> =</span><span style="color:#B5695999"> "</span><span style="color:#B56959">123</span><span style="color:#B5695999">"</span></span>
<span class="line"></span>
<span class="line"><span style="color:#1E754F">let</span><span style="color:#B07D48"> intOfDigit </span><span style="color:#1E754F">(</span><span style="color:#B07D48">x </span><span style="color:#1E754F">:</span><span style="color:#2E8F82"> char</span><span style="color:#1E754F">)</span><span style="color:#1E754F"> =</span></span>
<span class="line"><span style="color:#393A34">  int x </span><span style="color:#1E754F">-</span><span style="color:#393A34"> int </span><span style="color:#B56959">'0'</span></span>
<span class="line"></span>
<span class="line"><span style="color:#1E754F">let</span><span style="color:#B07D48"> number</span><span style="color:#1E754F"> =</span></span>
<span class="line"><span style="color:#393A34">  input</span></span>
<span class="line"><span style="color:#1E754F">  |></span><span style="color:#393A34"> Seq.fold</span></span>
<span class="line"><span style="color:#1E754F">    (fun</span><span style="color:#B07D48"> state next </span><span style="color:#1E754F">-></span><span style="color:#393A34"> state </span><span style="color:#1E754F">*</span><span style="color:#2F798A"> 10</span><span style="color:#1E754F"> +</span><span style="color:#393A34"> intOfDigit next</span><span style="color:#1E754F">)</span></span>
<span class="line"><span style="color:#2F798A">    0</span></span>
<span class="line"></span>
<span class="line"><span style="color:#393A34">printfn $</span><span style="color:#B5695999">"</span><span style="color:#1E754F">%i</span><span style="color:#B56959">{number}</span><span style="color:#B5695999">"</span><span style="color:#A0ADA0"> // 123</span></span>
<span class="line"></span></code></pre>
"""

let node = HtmlNode.Parse(inputHtml) |> List.exactlyOne

let outputHtml = node.ToString()

printfn "%s" outputHtml

Maybe I have missed something?

cartermp commented 5 days ago

Nope, seems like a bug to me. Might be worth adding a test case and seeing how this function fares: https://github.com/fsprojects/FSharp.Data/blob/main/src/FSharp.Data.Html.Core/HtmlNode.fs#L115-L174

njlr commented 5 days ago

I have created a test-case and made a potential fix here: https://github.com/fsprojects/FSharp.Data/pull/1510

However, I'm not sure the logic is correct for all cases. Are pre tags special in HTML?

cartermp commented 4 days ago

Yeah, pre elements are meant to preserve whatever formatting is inside them, whitespace and line breaks included (the content is preformatted text rather than ordinary HTML flow). In this case we're not respecting that.
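
To illustrate the point about pre, here is a minimal sketch of a pretty-printer that stops inserting newlines and indentation once it is inside a whitespace-sensitive element. Everything in it is an assumption made for illustration: the Node type, preservesWhitespace, and render are hypothetical names, not FSharp.Data's HtmlNode API, and this is not the change made in the linked pull request. The set of whitespace-sensitive elements (pre and textarea) is also a simplification.

```fsharp
// Hypothetical miniature HTML tree (NOT FSharp.Data's HtmlNode type).
type Node =
    | Text of string
    | Element of name: string * children: Node list

// Assumed set of elements whose content must be emitted verbatim.
let preservesWhitespace (name: string) =
    name = "pre" || name = "textarea"

// Serializes a node. Outside whitespace-sensitive elements each child goes
// on its own indented line; inside them children are concatenated verbatim.
let rec render (indent: int) (verbatim: bool) (node: Node) : string =
    match node with
    | Text t -> t
    | Element (name, children) ->
        if verbatim || preservesWhitespace name then
            // No added whitespace: it would change what the browser displays.
            let inner = children |> List.map (render indent true) |> String.concat ""
            sprintf "<%s>%s</%s>" name inner name
        else
            let pad = String.replicate (indent + 2) " "
            let inner =
                children
                |> List.map (fun child -> pad + render (indent + 2) false child)
                |> String.concat "\n"
            sprintf "<%s>\n%s\n%s</%s>" name inner (String.replicate indent " ") name

let preChildren =
    [ Element ("span", [ Text "let" ])
      Element ("span", [ Text " x" ])
      Text "\n"
      Element ("span", [ Text "  x + 1" ]) ]

let tree = Element ("div", [ Element ("pre", preChildren) ])

printfn "%s" (render 0 false tree)
```

In this sketch the div is pretty-printed with its child on an indented line, while the spans inside pre stay on their original lines with their leading spaces intact, which is the behaviour the round-trip in this issue expects.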