Hi aantron, I was parsing some HTML and got a result I found interesting: the contents of <script> tags are included in the output of the texts functions. I can see why, since a script is text after all, but I was wondering whether this is the intended behavior.
Just to make sure I didn't make any mistakes (and to show you what I mean), I wrote these two small tests, which pass when added to the lambdasoup test suite.
    ( "texts-just-script-tags" >:: fun _ ->
      let soup = "<script>1 + 1</script>" |> parse in
      assert_equal (texts soup) [ "1 + 1" ] );
    ( "texts-script-tags" >:: fun _ ->
      let soup =
        "<article><div><p>hi</p></div><script>1 + 1</script></article>"
        |> parse
      in
      assert_equal (texts soup) [ "hi"; "1 + 1" ] );
Anyway, just wondering if this is the intended behavior, and if so, I suppose the easiest way would be to just filter out <script> tags before using the texts functions? Thanks!!!
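For reference, here is a minimal sketch of the filtering workaround I have in mind, assuming the lambdasoup package is installed. The helper name texts_without_scripts is made up for illustration and is not part of the library; it only relies on the existing Soup functions parse, $$, to_list, delete, and texts. I collect the matches into a list first so that deleting nodes does not interfere with the traversal.

```ocaml
(* Sketch only: assumes the lambdasoup library is available.
   texts_without_scripts is a hypothetical helper, not a Soup function. *)
open Soup

let texts_without_scripts (html : string) : string list =
  let soup = parse html in
  (* Gather every <script> element into a list, then remove each one
     from the tree before extracting the remaining text. *)
  soup $$ "script" |> to_list |> List.iter delete;
  texts soup

let () =
  texts_without_scripts
    "<article><div><p>hi</p></div><script>1 + 1</script></article>"
  |> List.iter print_endline
```

The same approach should extend to <style> or any other element by changing the selector passed to $$.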