AngleSharp / AngleSharp.Css

:angel: Library to enable support for cascading stylesheets in AngleSharp.
https://anglesharp.github.io
MIT License

GetInnerText() not available in PowerShell 7? #166

Closed: GruberMarkus closed this issue 6 months ago

GruberMarkus commented 6 months ago

Prerequisites

Description

Dear AngleSharp.Css team,

I am looking for a way to convert HTML to plain text, as it would be rendered in a browser.

On Windows, the HTMLFile COM object's .body.innerText delivers the expected result, but HTMLFile is not available on Linux and macOS.

With AngleSharp.Css, .body.GetInnerText() delivers the expected result in C#, as you can see in this .NET Fiddle: https://dotnetfiddle.net/gx14nq

I now need to port the C# code above to PowerShell 7. Everything seems to work except the call to GetInnerText(), which fails with: Method invocation failed because [AngleSharp.Html.Dom.HtmlBodyElement] does not contain a method named 'GetInnerText'.

I would be very thankful if you could give me a hint what I am doing wrong. Thanks in advance!

Here is the PowerShell code, it fails on the last line:

#Requires -Version 7

Set-Location $PSScriptRoot

Import-Module '.\AngleSharp.dll' # NuGet, v1.2.0-beta.410, netstandard2.0
Import-Module '.\AngleSharp.Css.dll' # NuGet, v1.0.0-beta.139, netstandard2.0

$htmlContent = @'
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>This is a Heading</h1>

<p>This is a paragraph.</p>

</body>
</html>
'@

$config = [AngleSharp.CssConfigurationExtensions]::WithCss([AngleSharp.Configuration]::Default)
$ctx = [AngleSharp.BrowsingContext]::New($config)
$htmlParser = $ctx.GetService[AngleSharp.Html.Parser.IHtmlParser]()
$doc = $htmlParser.ParseDocument($htmlContent)
$doc.Body.GetInnerText()

Steps to Reproduce

Please see above.

Expected Behavior

Please see above.

Actual Behavior

Please see above.

Possible Solution / Known Workarounds

No response

FlorianRappl commented 6 months ago

I am not sure why you label this a bug.

GetInnerText is an extension method. I don't know PowerShell, so you'll either need to find someone who knows it better or learn how .NET is used from PowerShell scripts.

Most likely extension methods are not supported and you will need to call GetInnerText from the underlying static class.
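A minimal sketch of that suggestion, for readers hitting the same error. PowerShell has no extension-method syntax, so an extension method is called as an ordinary static method with the would-be receiver passed as the first argument. The first part uses a BCL extension method (Enumerable.Sum) to show the pattern. The second part applies it to the script above and assumes that GetInnerText is declared on the static class AngleSharp.Dom.ElementCssExtensions; that class name is an assumption and should be checked against the AngleSharp.Css version in use (for example with Get-Member or a decompiler).

```powershell
# Pattern: C# `numbers.Sum()` is really `Enumerable.Sum(numbers)`.
# In PowerShell, call the static method and pass the receiver explicitly.
$numbers = [int[]](1, 2, 3)
$sum = [System.Linq.Enumerable]::Sum($numbers)
$sum

# Applied to the issue's script.
# ASSUMPTION: the extension method lives on AngleSharp.Dom.ElementCssExtensions.
# The guard keeps this snippet harmless when the AngleSharp assemblies are not
# loaded or when $doc has not been created by the script above.
$extensionType = 'AngleSharp.Dom.ElementCssExtensions' -as [type]
if ($extensionType -and $doc) {
    # Equivalent of the C# call doc.Body.GetInnerText()
    $text = $extensionType::GetInnerText($doc.Body)
    $text
}
```

If the type lookup returns nothing, the class name is different in the installed version; listing the public static classes of AngleSharp.Css.dll should reveal where GetInnerText is defined.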

GruberMarkus commented 6 months ago

Thanks for answering so fast!

The bug label was added automatically, and I could not remove it.