bundesAPI / deutschland

The most important German APIs in one Python package.
Apache License 2.0
1.14k stars, 67 forks

Handelsregister demo code returns error #37

Open auchtetraborat opened 2 years ago

auchtetraborat commented 2 years ago

Running the demo code in the README.md for the Handelsregister module returns an error.

How to reproduce:

>>> from deutschland import Bundesanzeiger
>>> from deutschland import Handelsregister
>>> hr = Handelsregister()
>>> hr.search(keywords="Deutsche Bahn Aktiengesellschaft")

Expected result:

What I got instead:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 138, in search
    return self.search_with_raw_params(params)
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 215, in search_with_raw_params
    return self.__find_entries(soup)
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 242, in __find_entries
    data = self.__extract_history(tr)
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 276, in __extract_history
    [position, historical_name] = tds[1].text.strip().split(".) ", 1)
ValueError: not enough values to unpack (expected 2, got 1)

My env:

auchtetraborat commented 2 years ago

The error appears to be caused by the website's data being in a different format than expected.

[position, historical_name] = tds[1].text.strip().split(".) ", 1)

tds[1] usually looks something like "1.) xyz GmbH". Just before the error occurs, tds[1] is "-Deutsche Kreditbank Aktiengesellschaft Niederlassung Berlin ", which contains no ".) ", so split() returns a single element and the two-name unpacking fails.

A similar error will occur on the next line, where indexing [1] on a single-element list raises an IndexError:

historical_location = tds[2].text.strip().split(".) ", 1)[1]

Usually: " 1.) Bad Elster ". New format: "Berlin ".
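The behaviour can be reproduced without the website by applying the same split to the strings quoted above. This is only a minimal sketch: the variable names are illustrative and are not taken from the module.

```python
# Strings as quoted in this comment; variable names are illustrative only.
usual_name = "1.) xyz GmbH"
new_name = "-Deutsche Kreditbank Aktiengesellschaft Niederlassung Berlin "
new_location = "Berlin "

# Usual format: the ".) " separator is present, so split yields two parts.
usual_parts = usual_name.strip().split(".) ", 1)

# New format: no ".) " in the text, so split yields a single-element list.
new_parts = new_name.strip().split(".) ", 1)

# Unpacking one element into two names raises the ValueError from the traceback.
try:
    [position, historical_name] = new_parts
    name_error = None
except ValueError as exc:
    name_error = str(exc)

# Indexing [1] on a single-element list raises an IndexError instead.
try:
    historical_location = new_location.strip().split(".) ", 1)[1]
    location_error = None
except IndexError as exc:
    location_error = type(exc).__name__

print(usual_parts)
print(new_parts)
print(name_error)
print(location_error)
```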

The data can also be viewed here: https://www.handelsregister.de/rp_web/result.do?Page=4

Unfortunately, I don't know enough about the Handelsregister module's logic to properly add input handling for this format.

kiranmusze commented 2 years ago

I found the bug. It is in the method __extract_history(self, row), at line 203 of deutschland/handelsregister/registrations.py.

Replace these lines:

[position, historical_name] = tds[1].text.strip().split(".) ", 1)
historical_location = tds[2].text.strip().split(".) ", 1)[1]

with these ones:

position = tds[1].text.strip().split(".) ", 1)[0]
historical_name = tds[1].text.strip().split(".) ", 1)[1:]
historical_location = tds[2].text.strip().split(".) ", 1)[1:]

Now it should be working. Hope this helps.
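One caveat with the replacement above: the [1:] slices return lists rather than strings (an empty list for the new format), and position ends up holding the whole text when no ".) " is present. A possible alternative is sketched below, assuming that entries without a numeric prefix should keep their text and get no position. The helper name split_numbered is hypothetical and not part of the deutschland package, and whether the leading "-" in the new format should be removed is left open.

```python
from typing import Optional, Tuple


def split_numbered(text: str) -> Tuple[Optional[str], str]:
    """Split '1.) xyz GmbH' into ('1', 'xyz GmbH').

    Hypothetical helper: if the text has no '.) ' separator, the position
    is None and the stripped text is returned unchanged.
    """
    head, sep, tail = text.strip().partition(".) ")
    if not sep:
        # New format: no numeric prefix, so there is no position to report.
        return None, head
    return head, tail


# Usual format
print(split_numbered("1.) xyz GmbH"))
print(split_numbered(" 1.) Bad Elster "))

# New format
print(split_numbered("-Deutsche Kreditbank Aktiengesellschaft Niederlassung Berlin "))
print(split_numbered("Berlin "))
```

Inside __extract_history this could be used as position, historical_name = split_numbered(tds[1].text) and _, historical_location = split_numbered(tds[2].text), so both values stay plain strings in either format.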