bundesAPI / handelsregister


Free queries from 01.08.2022 #4

Open wirthual opened 2 years ago

wirthual commented 2 years ago

As per their website:

With the coming into effect of the Law on the Implementation of the Digitalization Guidelines (DiRUG) on 01.08.2022, access to all register content in the trade, cooperative, association and partnership register as well as to any electronically available documents through the Common Register Portal of federal states is provided free of charge starting from 01.08.2022. After that date, no registration and no log-in is required any more.

Do they offer an API description? 😅

alper commented 1 year ago

I have querying working using MechanicalSoup now. Took a bit of prodding with their weird JavaScript form.

LilithWittmann commented 1 year ago

@alper sounds awesome. Maybe would be cool if you could document it?

alper commented 1 year ago

Got the stub in #7.

Next up: grab all the relevant documents belonging to a specific company and see if the people can be parsed out?

alper commented 1 year ago

(Used mechanize after all because it's pretty solid and familiar once it works.)

alper commented 1 year ago

Maybe going to use Selenium after all because this is the post payload for getting one of the documents:

ergebnissForm=ergebnissForm&javax.faces.ViewState=-8635335262319402326%3A6636106239244724446&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10&ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade=ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade

and it's triggered in JavaScript by this Jakarta Server Faces application.
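For readers who want to see what that payload carries, here is a minimal sketch that URL-decodes it with the Python standard library. The payload string is copied from the comment above; the variable names are only illustrative, and nothing here has been tested against the live portal.

```python
from urllib.parse import parse_qsl

# POST body copied verbatim from the comment above.
payload = (
    "ergebnissForm=ergebnissForm"
    "&javax.faces.ViewState=-8635335262319402326%3A6636106239244724446"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade"
    "=ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade"
)

# parse_qsl keeps repeated keys (the _rppDD field appears twice) and
# percent-decodes names and values, so %3A becomes ':'.
fields = parse_qsl(payload, keep_blank_values=True)

for name, value in fields:
    print(f"{name} = {value}")
```

Decoded, the body consists of the form ID, a `javax.faces.ViewState` token, the repeated `_rppDD` field, and a generated component ID (`...:0:j_idt164:2:fade`) sent as both name and value. The ViewState token and the `j_idt164` part look server-generated, which is presumably why replaying a captured payload is awkward without first loading the page in the same session.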

alper commented 1 year ago

I still have to try it out. It could be that this thing has a <noscript> fallback.

alper commented 1 year ago

I ported it to Selenium and can download the PDF files now. Will polish it and make sure you can get all the PDFs for a given company.

deeprobin commented 1 year ago

@alper Does it also work without active JavaScript? Can you provide your source code?

alper commented 1 year ago

I'll post it after one more iteration.

It seems nothing here works without JavaScript.

alper commented 1 year ago

Is this still necessary? I got it to work in headless and download all the straightforward documents for an entity.


This can be cleaned up, documents moved into a permanent location and run in batch, but Selenium/Geckodriver is kinda unreliable.

It's in my fork here: https://github.com/alper/handelsregister/blob/main/sel.py

tillewolle commented 1 year ago

> It's in my fork here: https://github.com/alper/handelsregister/blob/main/sel.py

How do I run it in headless mode? The README in your fork only describes how to use the regular handelsregister.py, not sel.py.

alper commented 1 year ago

It should already be headless like this: https://github.com/alper/handelsregister/blob/main/sel.py#L37
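As a rough illustration of what a headless Firefox setup with automatic PDF downloads can look like in Selenium, here is a minimal sketch. It is not taken from sel.py: the function names and the download directory are hypothetical, and the preferences are the commonly used Firefox ones for saving PDFs without a dialog, untested against the register portal.

```python
import os


def firefox_download_prefs(download_dir):
    """Return Firefox preferences that save PDFs to download_dir without prompting."""
    return {
        # 2 = use the custom directory given in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": os.path.abspath(download_dir),
        # Save this MIME type directly instead of showing a dialog
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        # Disable the built-in PDF viewer so PDFs are downloaded, not displayed
        "pdfjs.disabled": True,
    }


def build_headless_firefox_options(download_dir):
    """Build Selenium Firefox options (hypothetical helper, requires selenium)."""
    # Imported lazily so the preference helper above works without selenium.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run Firefox without a visible window
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return options
```

The options would then be passed along the lines of `webdriver.Firefox(options=build_headless_firefox_options("downloads"))`.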

tillewolle commented 1 year ago

I thought I would be able to download .pdf files with sel.py, but I can't find any information about how to download them.

alper commented 1 year ago

I think it does but I haven't used it for a while and it's grossly untested. It definitely won't work to just get a bunch of PDFs without a lot of handling.

timtensor commented 1 year ago

Hi @alper, I was trying to run sel.py in Colab. I get an error at the following line: https://github.com/alper/handelsregister/blob/e6cea7d92041e4a28c323ea390c9bdb5bbab7a1d/sel.py#L65. The error trace is below. Do you know what could be wrong here?

Registerportal | Advanced search
<selenium.webdriver.remote.webelement.WebElement (session="60091ef69448ee4dc5d60e9b753fa24e", element="3b7448c5-f072-4e11-af93-25c85be00d4d")>
<selenium.webdriver.remote.webelement.WebElement (session="60091ef69448ee4dc5d60e9b753fa24e", element="105dcd63-80e1-4bf5-9a25-1cedbd5785e7")>
---------------------------------------------------------------------------
ElementClickInterceptedException          Traceback (most recent call last)
<ipython-input-43-c212176fe302> in <cell line: 21>()
     19 search_button = driver.find_element(By.XPATH, "//button[@id='form:btnSuche']")
     20 print(search_button)
---> 21 print(search_button.click())
     22 #document_list = ['AD','CD','HD',# 'DK',# 'UT'# 'VÖ','SI']

3 frames
/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    243                 alert_text = value["alert"].get("text")
    244             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 245         raise exception_class(message, screen, stacktrace)

ElementClickInterceptedException: Message: element click intercepted: Element <button id="form:btnSuche" name="form:btnSuche" class="ui-button ui-widget ui-state-default ui-corner-all ui-button-text-only searchButton" onclick="PrimeFaces.bcn(this,event,[function(event){PF('btnSuche').disable()},function(event){PrimeFaces.ab({s:&quot;form:btnSuche&quot;,f:&quot;form&quot;,u:&quot;form&quot;});return false;}]);" type="submit" role="button" aria-disabled="false">...</button> is not clickable at point (767, 1065). Other element would receive the click: <a href="#page-wrapper">...</a>
  (Session info: headless chrome=90.0.4430.212)
Stacktrace:
#0 0x56b032b607f9 <unknown>
#1 0x56b032b003b3 <unknown>
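
The message says the click would land on `<a href="#page-wrapper">` instead of the button, which suggests another element covers the button at the current scroll position in the headless window. A common workaround for this class of error, sketched here as an assumption and not verified against the portal, is to scroll the button into view and trigger the click from JavaScript. The helper name `click_via_js` is hypothetical; `execute_script` is the standard Selenium WebDriver method.

```python
def click_via_js(driver, element):
    """Scroll element into view, then click it via JavaScript.

    driver is expected to be a Selenium WebDriver (anything with an
    execute_script(script, *args) method works); element is the WebElement
    that raised ElementClickInterceptedException.
    """
    # Center the element so fixed or overlapping elements are less likely to cover it.
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    # A JavaScript click is dispatched on the element itself, so an overlapping
    # element at the same screen coordinates does not receive it.
    driver.execute_script("arguments[0].click();", element)
```

In the snippet from the traceback this would replace `search_button.click()` with `click_via_js(driver, search_button)`. Setting a larger window size for the headless browser is another option that is sometimes enough on its own.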