bundesAPI / deutschland

The most important APIs of Germany in one Python package.
Apache License 2.0

Add all openapi spec apis to library #21

Open LilithWittmann opened 2 years ago

LilithWittmann commented 2 years ago

We need to find a way to create API bindings from all the openapi specs to integrate them automatically into the deutschland lib.

Any suggestions on how to tackle this?

lukaspanni commented 2 years ago

Maybe generators for client SDKs like openapi-generator or Swagger Codegen can help with that.

Regenhardt commented 2 years ago

Also every time there's a new spec, which comes with a new repo every time if I understand correctly, immediately create an issue here to link to that spec, so there's a list of specs not added yet. In case generating the APIs doesn't work too great for all the specs at once, that would also let several people take their pick and work on different specs independently.

wirthual commented 2 years ago

Hi @lukaspanni, I agree that this is the right way. I chose openapi-generator since it seems to have been maintained more recently (see its FAQ). I implemented a script that generates the APIs for bundesAPI: it automatically pulls the definitions from GitHub and runs the generation on the YAML file. The output is an installable Python package, for example the one for the smard API (linked here).

The code of a generated API looks like this:

.
├── docs
│   ├── DefaultApi.md
│   ├── Indices.md
│   ├── TimeSeries.md
│   └── TimeSeriesMetaData.md
├── git_push.sh
├── README.md
├── requirements.txt
├── setup.cfg
├── setup.py
├── smard
│   ├── api
│   ├── api_client.py
│   ├── apis
│   ├── configuration.py
│   ├── exceptions.py
│   ├── __init__.py
│   ├── model
│   ├── models
│   ├── model_utils.py
│   └── rest.py
├── test
│   ├── __init__.py
│   ├── test_default_api.py
│   ├── test_indices.py
│   ├── test_time_series_meta_data.py
│   └── test_time_series.py
├── test-requirements.txt
└── tox.ini
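
As a rough illustration of the generation step described above, the following sketch builds the openapi-generator command line for one spec. It is not the script from this thread: the spec URL, the output directory and the helper names are assumptions made for the example, and it presumes that openapi-generator-cli is available on the PATH.

```python
import shlex
import subprocess

# Assumed location of the spec; the real file name and branch may differ.
SPEC_URL = "https://raw.githubusercontent.com/bundesAPI/smard-api/main/openapi.yaml"


def build_generate_command(spec, package_name, output_dir, generator="python"):
    """Return the openapi-generator-cli call for one spec as an argument list."""
    return [
        "openapi-generator-cli", "generate",
        "-i", spec,                      # spec file or URL
        "-g", generator,                 # target language
        "-o", output_dir,                # where the client is written
        "--package-name", package_name,  # name of the generated package
    ]


def generate(spec, package_name, output_dir, generator="python"):
    """Run the generator; raises CalledProcessError if it fails."""
    subprocess.run(
        build_generate_command(spec, package_name, output_dir, generator),
        check=True,
    )


command = build_generate_command(SPEC_URL, "smard", "out/smard")
print(shlex.join(command))
```

Calling generate(SPEC_URL, "smard", "out/smard") would then produce a tree like the one shown above; passing a different generator name is how other target languages could be covered.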

This process is quite straightforward; the tricky question is how we want to handle this code.

Where should it live?

Should the code for each API live next to the OpenAPI spec? Should each API become its own package on PyPI? Should it live only in the deutschland repo? We should also consider generating code for languages other than Python, and where that code should go. (The script is prepared to generate multiple languages; the languages the generator supports are listed here.)

I think it would be nice if we could implement this as python extras

This way we could allow installing only the required APIs, for example: poetry add deutschland[smard,bundestag]. I think this would be pretty neat, since you would not need to download code you are not interested in. (Maybe by default it installs all the APIs, with the option to choose, for simplicity.)
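
To make the extras idea a bit more concrete, here is a minimal sketch of how extras could map to separately published API packages. The distribution names (de-smard and so on) and the resolve helper are hypothetical and only serve the illustration; a mapping like EXTRAS is the kind of value setuptools accepts as extras_require.

```python
from typing import List

# Hypothetical mapping: extra name -> distributions that provide that API.
EXTRAS = {
    "smard": ["de-smard"],
    "bundestag": ["de-bundestag"],
    "autobahn": ["de-autobahn"],
}
# Convenience extra that pulls in every API package.
EXTRAS["all"] = sorted({req for reqs in EXTRAS.values() for req in reqs})


def resolve(requirement: str) -> List[str]:
    """Return the extra distributions selected by e.g. 'deutschland[smard,bundestag]'."""
    _, bracket, rest = requirement.partition("[")
    if not bracket:
        return []  # plain 'deutschland': no extras requested
    extras = [e.strip() for e in rest.rstrip("]").split(",") if e.strip()]
    unknown = [e for e in extras if e not in EXTRAS]
    if unknown:
        raise ValueError("unknown extras: " + ", ".join(unknown))
    selected = []
    for extra in extras:
        for req in EXTRAS[extra]:
            if req not in selected:  # keep order, drop duplicates
                selected.append(req)
    return selected


print(resolve("deutschland[smard,bundestag]"))
```

If installing everything by default is preferred, the core package could instead depend on all API packages directly and the extras would become redundant.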

In the current branch I copied the generated module into the deutschland folder and copied the test from the README. Depending on how the repo ends up being structured, the documents and test cases could also be merged into a nice structure.

These are my thoughts on getting the apis as clients. I am happy to hear what your opinion is on this.

lukaspanni commented 2 years ago

The question where the code should be stored is quite complex and has the potential to either speed up or slow down future developments. I like the approach of storing it next to the spec because it makes it easier to find the code. If everything is included in one big repository it will likely be hard to manage. A central repository with all code is possible with the use of git submodules.

Other languages can mean a lot of code to maintain but would be great for users. I think that users could also use that script to generate code for the language they want to use themselves.

The mentioned use of extras for packages would be really great from a user's perspective.

wirthual commented 2 years ago

I agree, especially with the vision in mind that this repo will contain more and more apis in the future, a single repo might be harder to manage.

I do not think git submodules would be necessary since the extras could be pulled from pip or from github directly. I will look into this and do a proof of concept with one of the generated apis to store the code in the repo next to the api definition and reference it in the deutschland package.

For now I created a merge request which has the generated code and docs in the main deutschland repo, since that was easier to set up. I think switching to separate repos later should not affect the end user; it is more of a change for developers.

See pull request #36

However, I think we should let users use the generated code sooner rather than later, to get feedback on the APIs and see whether they are usable. How the code is maintained in the end should not concern the end user.

LilithWittmann commented 2 years ago

Instead of having lots of generated code in the git repos, would it be an option to generate the code only at PyPI package build time, so that our library stays in sync with the OpenAPI specs without manual intervention?

lukaspanni commented 2 years ago

@LilithWittmann this sounds like a great idea. Repos without much generated code are much easier to maintain and understand. But I think testing the generated code would be more difficult.

wirthual commented 2 years ago

I like the idea of creating the package on the fly, testing it, then packaging it, pushing it to PyPI, and referencing it from the deutschland package as an extra dependency.

This would be the github action for the generator: https://github.com/marketplace/actions/generate-client-library-w-the-openapitools-generator

Otherwise we could also adapt the generation script I wrote to output the generated code and do additional steps from there.

wirthual commented 2 years ago

Ok some updates on this topic, here is a POC with the autobahn api:

https://github.com/wirthual/autobahn-api

It generates the code on the fly in a GitHub action and then publishes it to PyPI (here TestPyPI) for experimenting.

For testing in a local project you can add it like this for example:

poetry add https://test-files.pythonhosted.org/packages/db/28/41bfd55dea26e772f2a90bda75fa43ab74e1a6016bb0cfb93cab2d0d5ff9/autobahn-1.0.0.tar.gz

poetry run python -c "from deutschland.autobahn.apis import DefaultApi;print(DefaultApi().list_autobahnen()['roads'][:4])"

or with pip:

pip install https://test-files.pythonhosted.org/packages/db/28/41bfd55dea26e772f2a90bda75fa43ab74e1a6016bb0cfb93cab2d0d5ff9/autobahn-1.0.0.tar.gz

python -c "from deutschland.autobahn.apis import DefaultApi;print(DefaultApi().list_autobahnen()['roads'][:4])"

This should return a list of autobahnen: [A1, A2, A3, A4]

As you can see, the package is in the namespace deutschland.

I started an experiment here with adding this generated package as an extra but did not get it working yet :( I would be happy for some support, or for an evaluation of whether this is something we want to pursue, or whether as a first step we should install all generated subpackages by default.

Currently the generated code and docs are not committed into the repo on generation, but this could be added simply with another GitHub action. (For debugging reasons I think it would be good for devs to have the code available on GitHub, and also the documentation?)

lukaspanni commented 2 years ago

That's great!

"Currently the generated code and docs are not committed into the repo on generation, but this could be added simply with another GitHub action. (For debugging reasons I think it would be good for devs to have the code available on GitHub, and also the documentation?)"

I think it would be great to have the code in a repo. But in my opinion the documentation is even more important. Then it would be possible to extract the documentation and include it in the sphinx documentation. I'm currently working on the sphinx documentation (but I have a lot of other stuff to do). I will look into using that autogenerated stuff in sphinx.

wirthual commented 2 years ago

Ok code now also gets added to the repo on the generation process (example here)

The docs generated by openapi-generator are basically markdown files with references to each other. You can find the generated example docs in the docs subfolder.

lukaspanni commented 2 years ago

I looked at the code and realized that we probably don't need the markdown files. Instead we can use sphinx autodoc with napoleon to build documentation from the docstrings of the generated code.

@wirthual I opened a PR to your repository where I show an example for extracting documentation from docstrings. IMO this solution is a viable option for documenting the whole deutschland project.
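
For readers unfamiliar with this setup, a Sphinx conf.py that enables autodoc together with napoleon could look roughly like the sketch below. This is not the configuration from the PR mentioned above; the project name, the relative path and the chosen options are assumptions.

```python
# conf.py (sketch): build API docs from the docstrings of the generated code
import os
import sys

# Make the generated package importable for autodoc.
# The relative path is an assumption about the repo layout.
sys.path.insert(0, os.path.abspath(".."))

project = "deutschland"

extensions = [
    "sphinx.ext.autodoc",   # pulls documentation out of docstrings
    "sphinx.ext.napoleon",  # understands Google and NumPy style docstrings
]

# Document all members of a module or class by default.
autodoc_default_options = {
    "members": True,
    "undoc-members": True,
    "show-inheritance": True,
}

# Accept both docstring styles; which one the generated code uses is not checked here.
napoleon_google_docstring = True
napoleon_numpy_docstring = True
```

With such a configuration, an automodule directive in an .rst file pointing at one of the generated modules would render its classes and methods without relying on the generated markdown files.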

wirthual commented 2 years ago

Updates for our fellow readers:

@lukaspanni and I are working on a GitHub action which takes care of generating the code, uploading it to PyPI and generating the docs. An experimental version can be found here. The plan in the end is that each API repo can simply add the GitHub action to its workflow and the rest should be magic :)

Happy to get some input in the form of feedback or improvements.

lukaspanni commented 2 years ago

@wirthual I created a template repository containing the action-code and the required python scripts

wirthual commented 2 years ago

Nice 👍 Thank you.

The plan is to create a custom GitHub action we can simply add to all the repos containing the OpenAPI files. I will look into this.

Currently a challenge is the namespace package. I need to understand how to integrate the extra packages in the deutschland namespace.

For my example project I could not figure out how to do this correctly yet.
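
As background on the namespace question, the sketch below demonstrates the mechanism that implicit namespace packages (PEP 420) rely on: two separately installed distributions can both contribute subpackages to one top-level name as long as neither ships an __init__.py at that top level. The namespace name deutschland_demo is made up so the demo cannot clash with an installed deutschland package, and the two temporary directories merely stand in for two installed distributions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

NAMESPACE = "deutschland_demo"  # hypothetical stand-in for the real namespace


def make_distribution(root, subpackage):
    """Create <root>/<subpackage>-dist/<NAMESPACE>/<subpackage>/__init__.py.

    The NAMESPACE directory itself deliberately gets no __init__.py;
    that is what turns it into an implicit namespace package.
    """
    dist_root = Path(root) / (subpackage + "-dist")
    package_dir = dist_root / NAMESPACE / subpackage
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("NAME = {!r}\n".format(subpackage))
    return str(dist_root)


def load_subpackages(names):
    """Put one fake distribution per name on sys.path and import them all."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = [make_distribution(tmp, name) for name in names]
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            return [
                importlib.import_module(NAMESPACE + "." + name).NAME
                for name in names
            ]
        finally:
            for root in roots:
                sys.path.remove(root)


loaded = load_subpackages(["autobahn", "smard"])
print(loaded)  # ['autobahn', 'smard']
```

For real packages the same rule would apply to the packaging configuration: each API distribution ships only its own subfolder, and with setuptools the find_namespace_packages helper is intended for this layout. Whether that alone resolves the issue described above has not been verified here.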

wirthual commented 2 years ago

I created a GitHub action which consists of the composite steps to perform linting, code generation, documentation generation and upload to PyPI here: https://github.com/wirthual/deutschland-generator-action

@lukaspanni created a template for upcoming repos which makes the workflow much easier.

An example of the generated code from the action is here: https://github.com/wirthual/autobahn-api

This way we can avoid code duplication and have the code living in one central place.

Open Points:

lukaspanni commented 2 years ago

I will also update the template repo to use the new action

LilithWittmann commented 2 years ago

Maybe we should leave linting and openapi checks as a separate step to generating the API clients?

wirthual commented 2 years ago

I agree. It's a separate workflow now.

lukaspanni commented 2 years ago

I had a lot of other stuff to do in the last days, but I will try to get the automatic documentation working for the whole project this week.

Edit: finished my PR

Having all code in one repository is actually quite confusing to work with because of the structure of the autogenerated code. But IMO it's still the preferred way forward; maybe the repo structure has to be altered a little.

wirthual commented 2 years ago

Going forward, I think the code for each API will live in its own repo anyway. So maybe we should also keep the documentation separate, with one main documentation that links to all the others as a starting point for developers?

Is the plan to host the docs on github pages?

lukaspanni commented 2 years ago

Maybe we could have a repo that includes the repo of each API as a submodule. This would be a compromise combining the two approaches. Someone working on only one repo could just clone that, while someone else who needs or wants the whole code could still get everything quite easily. But I guess it would require a lot of manual work to get this structure working.

Regarding the documentation: I don't like the idea of one huge documentation. Linking to sub-documentation is the way better approach to allow more flexibility in the doc-creation. So not every repo would have to use the autogeneration as it tends to not work for all circumstances.

wirthual commented 2 years ago

@lukaspanni I streamlined the generation process, now only one generation_config.yaml and the github workflow is needed in the initial repo. Have a look at the autobahn repo for example: https://github.com/bundesAPI/autobahn-api

The only things to change are in the config, the name, the urls and the version.

Do you think you can add those changes to your template? And then maybe merge them with this template and create a pull request: https://github.com/bundesAPI/api-doc-template

That would be awesome to have the next API make use of this stuff :)

Also, do we know what we want to do with the docs? I agree it should be divided in the submodules, same as with the code.

lukaspanni commented 2 years ago

@wirthual great work! The repository is so much cleaner now. I have only one minor complaint: local tests of code and/or doc generation are now more complicated. (I use act to make this easier)

I updated my own template and created a PR for the bundesAPI template. Once the PR is merged, I will archive my own template since both templates do the same thing.

If the code is divided, there is no reason to have a monolithic documentation, so division is the way to go.

@wirthual I also filed an issue for your generator action at https://github.com/wirthual/deutschland-generator-action/issues/2

wirthual commented 2 years ago

What exactly do you mean by more complicated?

I also tried act but I could not get it to work, even with the huge docker image I downloaded.

lukaspanni commented 2 years ago

It is more complicated because the scripts for code and doc generation are not in the repo but inside the action. So you cannot run them with a simple command. For me act worked fine.