bundesAPI / deutschland

The most important APIs of Germany in one Python package.
Apache License 2.0

bundesanzeiger: allow fetching multiple pages for one company #143

Closed jfhr closed 5 months ago

jfhr commented 6 months ago

The current Bundesanzeiger implementation only fetches one page of results with up to 20 reports, but sometimes it might be interesting to get older reports as well.

This adds a named parameter page_limit to Bundesanzeiger.get_reports. The default value is 1, which preserves the current behavior of fetching only one page. If a higher value is set, the client searches the returned HTML for a "next page" link and keeps generating reports until page_limit pages have been parsed or no "next page" link remains. Pass float('inf') to fetch all available pages.
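The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: iter_pages, fetch and find_next_url are hypothetical names, and the "site" is faked with dictionaries so the example runs without network access.

```python
from typing import Callable, Iterator, Optional


def iter_pages(
    first_url: str,
    fetch: Callable[[str], str],
    find_next_url: Callable[[str], Optional[str]],
    page_limit: float = 1,
) -> Iterator[str]:
    """Yield page bodies until page_limit pages were parsed or no next link exists.

    All names here are hypothetical; the real client may be structured differently.
    """
    url: Optional[str] = first_url
    pages_parsed = 0
    # float("inf") compares greater than any int, so it disables the limit.
    while url is not None and pages_parsed < page_limit:
        html = fetch(url)
        yield html
        pages_parsed += 1
        url = find_next_url(html)


# Fake three-page result set standing in for HTTP responses.
PAGES = {"/p1": "one", "/p2": "two", "/p3": "three"}
NEXT = {"one": "/p2", "two": "/p3"}  # "three" has no next page

default_run = list(iter_pages("/p1", PAGES.__getitem__, NEXT.get))
two_pages = list(iter_pages("/p1", PAGES.__getitem__, NEXT.get, page_limit=2))
all_pages = list(iter_pages("/p1", PAGES.__getitem__, NEXT.get, page_limit=float("inf")))
```

With the default limit only the first page is produced, which matches the backward-compatible behavior described above.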

This commit adds a unit test for the method that finds the "next page" link, and another that checks that more than 20 reports are actually generated.
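For illustration, a "next page" finder and the kind of check such a unit test might make could look like the sketch below. The markup is an assumption: it supposes the link is an anchor carrying a class named next, which may not match the real Bundesanzeiger HTML, and find_next_page_link is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Optional


class _NextLinkFinder(HTMLParser):
    """Remembers the href of the first <a> whose class list contains "next"."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attributes = dict(attrs)
        # Assumed convention: the pagination link has class "next".
        classes = (attributes.get("class") or "").split()
        if "next" in classes:
            self.next_href = attributes.get("href")


def find_next_page_link(html: str) -> Optional[str]:
    """Return the next-page href, or None when the page has no such link."""
    finder = _NextLinkFinder()
    finder.feed(html)
    return finder.next_href


with_next = '<div><a class="page" href="/s?page=1">1</a><a class="page next" href="/s?page=2">Next</a></div>'
without_next = '<div><a class="page" href="/s?page=1">1</a></div>'
```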

This also encodes the company name in the URL so that search terms like "Saxony Minerals & Exploration - SME AG" work correctly.
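A small sketch of why the encoding matters: an unescaped & would be read as a query-string separator and cut the search term short. The example uses urllib.parse.quote from the standard library; whether the PR uses this exact function is an assumption, and encode_search_term is a hypothetical helper name.

```python
from urllib.parse import quote


def encode_search_term(term: str) -> str:
    """Percent-encode a company name for use inside a URL query value."""
    # safe="" also escapes "/", which quote() would otherwise leave untouched.
    return quote(term, safe="")


encoded = encode_search_term("Saxony Minerals & Exploration - SME AG")
print(encoded)  # Saxony%20Minerals%20%26%20Exploration%20-%20SME%20AG
```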

wirthual commented 6 months ago

Hi,

thanks for the contribution. Can you black-format your code changes?

How do you execute your tests? It seems the execution in the CI does not find response.html. Probably because the working dir is different.

Maybe we can use importlib to find the file independent of the execution path?

jfhr commented 6 months ago

Hey, thanks for the feedback. I ran

black ./src
black ./tests

I also used os.path.dirname(__file__) in the test code to get the correct path to response.html. As far as I can tell, the tests work now, but some tests in other packages are failing.

I'm not sure how importlib would differ from __file__ (not really a Python developer).
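The __file__ approach can be sketched as below. fixture_path is a hypothetical helper, not a function from this repository; it resolves the fixture relative to the test module rather than the current working directory, which is what makes the test independent of where the CI starts it.

```python
import os


def fixture_path(test_file: str, name: str) -> str:
    """Return the path of a fixture stored next to the given test module."""
    test_dir = os.path.dirname(os.path.abspath(test_file))
    return os.path.join(test_dir, name)


# Inside a test module one would call: fixture_path(__file__, "response.html")
example = fixture_path(os.path.join("tests", "test_bundesanzeiger.py"), "response.html")
```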

wirthual commented 5 months ago

In some exotic cases, __file__ might not be populated. But in a test that's fine. Thank you for the changes.
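For comparison, the importlib route could look roughly like this sketch. It uses importlib.resources.files (Python 3.9+), which asks the import system for the package's resources instead of relying on __file__. The package name demo_fixtures and the helper read_fixture are made up for the demo, which builds a throwaway package in a temporary directory so it can run anywhere.

```python
import importlib.resources
import os
import sys
import tempfile


def read_fixture(package: str, name: str) -> str:
    """Read a text resource shipped inside an importable package."""
    resource = importlib.resources.files(package).joinpath(name)
    return resource.read_text(encoding="utf-8")


# Build a temporary package containing response.html (demo only).
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, "demo_fixtures")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8"):
        pass
    with open(os.path.join(pkg_dir, "response.html"), "w", encoding="utf-8") as f:
        f.write("<html></html>")

    sys.path.insert(0, tmp)
    try:
        html = read_fixture("demo_fixtures", "response.html")
    finally:
        sys.path.remove(tmp)
```

This only works when the fixture directory is an importable package, which is a stricter requirement than the __file__ variant has.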