OSDeploy / OSD

OSD Shared Functions
MIT License

Get-OSDCatalogIntel* html parsing not working #30

Closed: dennisvl92 closed this 7 months ago

dennisvl92 commented 2 years ago

Describe the bug

-UseBasicParsing has been added to the Get-OSDCatalogIntel* scripts, and because of this the HTML no longer gets parsed. As a result, the LastUpdate Write-Verbose messages no longer work and the DriverDescription field can't be filled.

I guess this was changed because of the issue in ticket #26: the Internet Explorer engine is not available in WinPE.

Second issue: -UseBasicParsing has been deprecated since PowerShell 6.0. From the documentation: "all Web requests use basic parsing only." This means that even without -UseBasicParsing, the script won't work correctly on PowerShell 6.0 or later.
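Both problems come down to the legacy MSHTML (Internet Explorer) engine not being available, so a script could probe for it once and then choose a parsing path. The following is a minimal sketch that assumes the HTMLFile COM object is the dependency in question. Test-HtmlFileComAvailable is a hypothetical helper and is not part of OSD:

```powershell
function Test-HtmlFileComAvailable {
    # Hypothetical helper (not part of OSD): returns $true when the legacy
    # 'HTMLFile' COM object (MSHTML / Internet Explorer engine) can be created.
    [CmdletBinding()]
    [OutputType([bool])]
    param()

    try {
        # -ErrorAction Stop makes creation failures catchable; on platforms
        # without COM support the call fails and we land in the catch block.
        $doc = New-Object -ComObject 'HTMLFile' -ErrorAction Stop
        return ($null -ne $doc)
    }
    catch {
        return $false
    }
}

if (Test-HtmlFileComAvailable) {
    Write-Verbose 'HTMLFile COM object available: DOM-based parsing can be used.'
}
else {
    Write-Verbose 'HTMLFile COM object unavailable: fall back to string parsing.'
}
```

Probing once up front keeps the rest of the script from having to branch on PowerShell version or WinPE detection separately.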

I did some testing and found some code that appears to work with PowerShell 5.1, PowerShell 7.2, and WinPE. Maybe this could be used as a solution.

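I replaced:

```powershell
$DriverInfoWebRequest = Invoke-WebRequest -Uri $OSDCatalogItem.DriverInfo -Method Get -UseBasicParsing
```

with:

```powershell
# Download the raw HTML without relying on Invoke-WebRequest's parser
$String = [System.Net.WebClient]::new().DownloadString($OSDCatalogItem.DriverInfo)
$Unicode = [System.Text.Encoding]::Unicode.GetBytes($String)

# Load the HTML into an MSHTML document; depending on the host, the
# write method is exposed as IHTMLDocument2_Write or as write
$DriverInfoWebRequest = New-Object -ComObject 'HTMLFile'
if ($DriverInfoWebRequest.IHTMLDocument2_Write) {
    $DriverInfoWebRequest.IHTMLDocument2_Write($Unicode)
}
else {
    $DriverInfoWebRequest.write($Unicode)
}
```

and:

```powershell
$DriverInfoHTML = $DriverInfoWebRequest.ParsedHtml.childNodes | Where-Object {$_.nodename -eq 'HTML'}
```

with:

```powershell
# The COM document is the parsed HTML itself, so there is no ParsedHtml property
$DriverInfoHTML = $DriverInfoWebRequest.childNodes | Where-Object { $_.nodename -eq 'HTML' }
```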
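Where even the HTMLFile COM object is missing, one engine-free fallback is to pull the needed value straight out of the downloaded string with a regular expression. This is a minimal sketch that assumes, hypothetically, that the value sits in a meta tag of the form <meta name="..." content="...">. Intel's actual markup is not shown in this thread, so the pattern would need adjusting, and a regex is much more fragile than a real HTML parser. Get-HtmlMetaContent is a hypothetical helper, not part of OSD:

```powershell
function Get-HtmlMetaContent {
    # Hypothetical helper (not part of OSD): returns the content attribute of
    # the first <meta name="$Name" content="..."> tag in an HTML string, or $null.
    # Assumes 'name' appears before 'content' and values use double quotes.
    [CmdletBinding()]
    [OutputType([string])]
    param(
        [Parameter(Mandatory)]
        [string]$Html,

        [Parameter(Mandatory)]
        [string]$Name
    )

    # Escape the name so characters such as '.' are matched literally
    $pattern = '<meta\s+[^>]*?name\s*=\s*"' + [regex]::Escape($Name) +
        '"[^>]*?content\s*=\s*"(?<value>[^"]*)"'
    $match = [regex]::Match($Html, $pattern, 'IgnoreCase')

    if ($match.Success) {
        # Decode entities such as &amp; so the value reads like the rendered page
        return [System.Net.WebUtility]::HtmlDecode($match.Groups['value'].Value)
    }
    return $null
}

$sample = '<html><head><meta name="description" content="Drivers &amp; Software"></head></html>'
Get-HtmlMetaContent -Html $sample -Name 'description'
```

In the script, $String from the WebClient download would be passed as -Html. This works the same way in PowerShell 5.1, PowerShell 7, and WinPE because it only uses .NET string and regex APIs.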

To Reproduce

Run Get-OSDCatalogIntelEthernetDriver.ps1 or one of the other Intel commands.

Expected behavior

The commands run without errors and all fields are filled.

gwblok commented 7 months ago

Intel's site now blocks "crawlers", including reading the website with PowerShell, so we can no longer get the URLs for the Intel catalogs. At this point, unless someone knows of a public catalog that Intel releases, I don't see any way to get this data anymore.

Note: the change happened in November 2023. In the Wayback Machine, you can see that the site could still be crawled successfully in October 2023, but not in November 2023.