HtmlUnit / htmlunit

HtmlUnit is a "GUI-Less browser for Java programs".
https://www.htmlunit.org
Apache License 2.0

Htmlpage.getPage() returns empty webpage #516

Open asmodejan opened 2 years ago

asmodejan commented 2 years ago

I'm getting a blank page when I try to access the following URL with HtmlUnit:

https://register.dpma.de

It's the Official Publications and Register Database of the German Patent and Trademark Office. I have no such problems accessing the central homepage https://www.dpma.de or the online patent publication search database https://depatisnet.dpma.de.

It makes no difference whether JavaScript is enabled or not. I guess the JavaScript code running on the site is either erroneous or not currently supported by HtmlUnit?

Any pointers would be much appreciated.

rbri commented 2 years ago

> It makes no difference whether JavaScript is enabled or not. I guess the JavaScript code running on the site is either erroneous or not currently supported by HtmlUnit?

At the very least, the page redirects to https://register.dpma.de/DPMAregister/Uebersicht, and that page seems to do some kind of page-protection magic. I will have a deeper look, but I have no idea whether I can get this working.
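To see the redirect without HtmlUnit in the way, something like the following might help. It is only a rough sketch that assumes the redirect is a plain HTTP 3xx response with a `Location` header (it may well be done in JavaScript instead). `RedirectTrace`, `Hop`, `httpFetch` and `trace` are made-up names for illustration, not part of HtmlUnit; only the JDK's `java.net.http` client is used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of HtmlUnit: follows redirects one hop at a time.
class RedirectTrace {

    /** One response: the status code and the Location header, if present. */
    record Hop(int status, Optional<String> location) {}

    /** Fetches a URL with the JDK HTTP client, without following redirects. */
    static Hop httpFetch(String url) {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return new Hop(response.statusCode(),
                    response.headers().firstValue("Location"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns every URL visited, starting with start, following at most maxHops redirects. */
    static List<String> trace(String start, Function<String, Hop> fetch, int maxHops) {
        List<String> visited = new ArrayList<>();
        String current = start;
        visited.add(current);
        for (int i = 0; i < maxHops; i++) {
            Hop hop = fetch.apply(current);
            boolean redirect = hop.status() >= 300 && hop.status() < 400;
            if (!redirect || hop.location().isEmpty()) {
                break; // final response, or a 3xx without a target
            }
            // Location may be relative, so resolve it against the current URL.
            current = URI.create(current).resolve(hop.location().get()).toString();
            visited.add(current);
        }
        return visited;
    }
}
```

Calling `RedirectTrace.trace("https://register.dpma.de/", RedirectTrace::httpFetch, 5)` should list each hop. The fetch step is passed in as a function so the loop can also be exercised with a fake fetcher and no network.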

rbri commented 2 years ago

Looks like you are facing F5 Networks' JavaScript detection; they like to protect pages from bots.
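I don't know what exactly the F5 protection serves. Assuming it delivers a page that contains script but no visible text (which would match the blank page reported above), a crude check like this could at least flag the situation in your own code. `BlankPageCheck` and `isScriptOnly` are hypothetical names, and the regex-based tag stripping is a rough heuristic, not a real HTML parser and not anything HtmlUnit or F5 provides.

```java
import java.util.regex.Pattern;

// Hypothetical heuristic, not an HtmlUnit or F5 API.
class BlankPageCheck {
    // (?is) = case-insensitive, and '.' also matches line breaks.
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("(?is)<script\\b.*?</script\\s*>");
    private static final Pattern STYLE_BLOCK =
            Pattern.compile("(?is)<style\\b.*?</style\\s*>");
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    /** True if the markup contains a script element but no text outside script/style. */
    static boolean isScriptOnly(String html) {
        boolean hasScript = SCRIPT_BLOCK.matcher(html).find();
        // Drop script and style bodies first, then strip the remaining tags.
        String withoutScripts = SCRIPT_BLOCK.matcher(html).replaceAll(" ");
        String withoutStyles = STYLE_BLOCK.matcher(withoutScripts).replaceAll(" ");
        String text = TAG.matcher(withoutStyles).replaceAll(" ").trim();
        return hasScript && text.isEmpty();
    }
}
```

With HtmlUnit, the input could be the raw response body, e.g. `page.getWebResponse().getContentAsString()`. Note that text inside `<title>` counts as visible here, so a script-only page that has a title would not be flagged.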