bundesAPI / handelsregister


mechanize._mechanize.FormNotFoundError: no form matching name 'form' #28

Open schenkd opened 1 month ago

schenkd commented 1 month ago

Hey,

I tried the example from the README.md and installed the app as described. Unfortunately, it fails because the returned HTML document contains no form named "form".

send: b'GET / HTTP/1.1\r\nAccept-Encoding: gzip\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15\r\nAccept-Language: en-GB,en;q=0.9\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nConnection: close\r\nHost: www.handelsregister.de\r\n\r\n'
reply: 'HTTP/1.1 302 Found\r\n'
header: Date: Sun, 14 Jul 2024 09:27:36 GMT
header: Server: Apache
header: Strict-Transport-Security: max-age=31536000; preload
header: Referrer-Policy: origin-when-cross-origin
header: Location: https://www.handelsregister.de/rp_web/welcome.xhtml
header: Cache-Control: max-age=15
header: Expires: Sun, 14 Jul 2024 09:27:51 GMT
header: Content-Length: 235
header: Connection: close
header: Content-Type: text/html; charset=iso-8859-1
b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>302 Found</title>\n</head><body>\n<h1>Found</h1>\n<p>The document has moved <a href="https://www.handelsregister.de/rp_web/welcome.xhtml">here</a>.</p>\n</body></html>\n'
*****************************************************
send: b'GET /rp_web/welcome.xhtml HTTP/1.1\r\nAccept-Encoding: gzip\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15\r\nAccept-Language: en-GB,en;q=0.9\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nConnection: close\r\nHost: www.handelsregister.de\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sun, 14 Jul 2024 09:27:37 GMT
header: Server: Apache-Coyote/1.1
header: Strict-Transport-Security: max-age=31536000; preload
header: Referrer-Policy: origin-when-cross-origin
header: Referrer-Policy: origin-when-cross-origin
header: Expires: Thu, 01 Jan 1970 01:00:00 GMT
header: Pragma: no-cache
header: Expires: Tue, 08 Aug 2006 10:00:00 GMT
header: Content-Type: text/html;charset=UTF-8
header: Content-Length: 41571
header: Vary: Accept-Encoding
header: X-Content-Type-Options: nosniff
header: Cache-Control: must-revalidate, proxy-revalidate, no-store, no-cache, s-max-age=0, max-age=0
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: X-Permitted-Cross-Domain-Policies: master-only
header: Set-Cookie: JSESSIONID=53FBAE13BD34A6F4DF932C52BC83C2F5.tc05n02; Path=/; HttpOnly
header: Connection: close
b'<!DOCTYPE html>...too much for github...</html>'
*****************************************************
send: b'GET /rp_web/welcome.xhtml HTTP/1.1\r\nAccept-Encoding: gzip\r\nReferer: https://www.handelsregister.de\r\nCookie: JSESSIONID=53FBAE13BD34A6F4DF932C52BC83C2F5.tc05n02\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15\r\nAccept-Language: en-GB,en;q=0.9\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nConnection: close\r\nHost: www.handelsregister.de\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sun, 14 Jul 2024 09:27:37 GMT
header: Server: Apache-Coyote/1.1
header: Strict-Transport-Security: max-age=31536000; preload
header: Referrer-Policy: origin-when-cross-origin
header: Referrer-Policy: origin-when-cross-origin
header: Expires: Thu, 01 Jan 1970 01:00:00 GMT
header: Pragma: no-cache
header: Expires: Tue, 08 Aug 2006 10:00:00 GMT
header: Content-Type: text/html;charset=UTF-8
header: Content-Length: 40427
header: Vary: Accept-Encoding
header: X-Content-Type-Options: nosniff
header: Cache-Control: must-revalidate, proxy-revalidate, no-store, no-cache, s-max-age=0, max-age=0
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: X-Permitted-Cross-Domain-Policies: master-only
header: Connection: close
b'<!DOCTYPE html>...too much for github...</html>'
*****************************************************
Registerportal | Homepage
Traceback (most recent call last):
  File "/Users/DASchenk/Projects/private/handelsregister/handelsregister.py", line 185, in <module>
    companies = h.search_company()
                ^^^^^^^^^^^^^^^^^^
  File "/Users/DASchenk/Projects/private/handelsregister/handelsregister.py", line 76, in search_company
    self.browser.select_form(name="form")
  File "/Users/DASchenk/Projects/private/handelsregister/.venv/lib/python3.11/site-packages/mechanize/_mechanize.py", line 681, in select_form
    raise FormNotFoundError("no form matching " + description)
mechanize._mechanize.FormNotFoundError: no form matching name 'form'
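One way to narrow down a FormNotFoundError like this is to list the forms the fetched page actually contains. With mechanize itself, iterating over `browser.forms()` and printing each form's `name` should do that. The sketch below does the same with only the standard library so it runs without a network connection. `SAMPLE_HTML` is a made-up stand-in for the real response, and the assumption that the page carries a form whose name is `headerForm` is inferred only from the `f:"headerForm"` argument in the menu link's onclick handler.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the (name, id) pair of every <form> element in a document."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attributes = dict(attrs)
            self.forms.append((attributes.get("name"), attributes.get("id")))


def list_forms(html):
    """Return a list of (name, id) tuples, one per <form> in the HTML."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


# Hypothetical stand-in for the welcome page; the real markup is far larger.
SAMPLE_HTML = """
<html><body>
  <form id="headerForm" name="headerForm" method="post"
        action="/rp_web/welcome.xhtml">
    <input type="hidden" name="headerForm" value="headerForm">
  </form>
</body></html>
"""

forms = list_forms(SAMPLE_HTML)
# select_form(name="form") can only succeed if some form carries that name.
has_search_form = any(name == "form" for name, _ in forms)
print(forms, has_search_form)
```

If the list contains no form named `form`, the browser is most likely still on the welcome page, which would be consistent with the "Registerportal | Homepage" title printed just before the traceback.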

What I've found so far is that follow_link (on line 72) with "Advanced search" returns the main page again instead of the rendered "Erweiterte Suche" (advanced search) page.

response_search = self.browser.follow_link(text="Advanced search")

This is the link element titled "Advanced search" on the main page, as included in the response:

<a tabindex="-1" title="Advanced search" class="ui-menuitem-link ui-corner-all  rpNavMainMenuItem" href="#" onclick="PF(\'sidebar1\').hide();PrimeFaces.ab({s:&quot;j_idt27&quot;,f:&quot;headerForm&quot;});return false;">
  <span class="ui-menuitem-icon ui-icon fa fa-search-plus" aria-hidden="true"></span>
  <span class="ui-menuitem-text">Advanced search</span>
</a>
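The markup above hints at why follow_link lands on the same page: the anchor's href is just "#", and the actual navigation is triggered by the JavaScript in onclick, which mechanize does not execute. As a minimal sketch of that check, the snippet below parses a trimmed copy of the anchor with the standard library and flags links that depend on a script. The helper name `is_script_driven` is hypothetical and not part of mechanize or the handelsregister code.

```python
from html.parser import HTMLParser

# Trimmed copy of the anchor from the response (icon span omitted).
ANCHOR_HTML = (
    '<a tabindex="-1" title="Advanced search" '
    'class="ui-menuitem-link ui-corner-all  rpNavMainMenuItem" href="#" '
    'onclick="PF(\'sidebar1\').hide();'
    'PrimeFaces.ab({s:&quot;j_idt27&quot;,f:&quot;headerForm&quot;});'
    'return false;">'
    '<span class="ui-menuitem-text">Advanced search</span></a>'
)


class LinkInspector(HTMLParser):
    """Collect the attributes of every <a> element as a dict."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append(dict(attrs))


def is_script_driven(link):
    """True if the link has no real target and relies on an onclick handler."""
    href = (link.get("href") or "").strip()
    return href in ("", "#") and "onclick" in link


inspector = LinkInspector()
inspector.feed(ANCHOR_HTML)
link = inspector.links[0]

print(link["href"])            # the only URL a non-JavaScript client can follow
print(link["onclick"])         # the handler that a real browser would run
print(is_script_driven(link))
```

A client that ignores onclick can only request the page's own URL with a "#" fragment here, which matches the second GET for /rp_web/welcome.xhtml in the debug log.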

Could someone with more knowledge of this library check whether it still works with the current Handelsregister website, or whether a major change on the site has broken it? I'm glad to help solve this issue, but before I try to work around the jQuery rendering with Selenium, I'd like to know whether mechanize was ever able to handle this behaviour.

Greetings David

schenkd commented 1 month ago

From my research, I don't see how mechanize could handle those asynchronous jQuery requests combined with the re-rendering of parts of the page. Let me know if I'm missing something here.

Besides that, I rewrote that part of the Handelsregister Python SDK to use Selenium to work around this issue. If desired, I can open a PR.

Best David

kanadagermane commented 2 days ago

Same issue over here! Can you share your Selenium implementation, @schenkd?