bundesAPI / dip-bundestag-api

Bundestag: Dokumentations- und Informationssystem für Parlamentsmaterialien
https://dip.bundestag.api.bund.dev/

add workflow for auto sync. #17

Closed. wirthual closed this 1 year ago

wirthual commented 1 year ago

@maximiliancw

Initial PR of your sync action.

It will be triggered using the cron job and run every Sunday at 4:00 (Not sure which timezone, maybe UTC?)

Also, I added the option to trigger it manually pointing to another url if needed.

Instead of committing it directly, I think opening a PR is the right way, since manual checking is required (e.g. the current version does not pass our linter).

wirthual commented 1 year ago

One problem I see: since I modified the original source YAML file (see this commit), a simple file comparison would always detect a difference between the openapi.yaml files. The comparison probably needs to be more advanced, e.g. load both files as JSON and compare those, or something similar that gets around the string vs. int errors the linter reported on the official file.

Maybe something like this can help: https://github.com/homeport/dyff

wirthual commented 1 year ago

I think one solution is to convert the YAML files into JSON files and then diff the JSON files. It seems JSON does not have the issue with the types:

yq -Poj openapi.yaml > openapi.json
yq -Poj openapi.yaml~ > openapi.json~

and then

cmp -s openapi.json openapi.json~

or just

diff openapi.json openapi.json~

We would add this to the openapi-sync-action.

wirthual commented 1 year ago

Unfortunately, I think the problem still persists if it's converted to JSON 👎
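One likely reason the difference survives the conversion: JSON also distinguishes the number 123 from the string "123", so a value that was unquoted in one YAML file and quoted in the other still differs after conversion. A minimal Python sketch of a type-insensitive comparison follows. The function names and the sample documents are made up for illustration, and the normalization is deliberately lossy (for example, a boolean true becomes the string "True").

```python
import json


def normalize(node):
    """Recursively turn every scalar into its string form (lossy on purpose)."""
    if isinstance(node, dict):
        return {key: normalize(value) for key, value in node.items()}
    if isinstance(node, list):
        return [normalize(item) for item in node]
    # Scalars: 123 and "123" both become "123".
    return str(node)


def same_spec(left_json: str, right_json: str) -> bool:
    """Compare two JSON documents while ignoring int-vs-string differences."""
    return normalize(json.loads(left_json)) == normalize(json.loads(right_json))


# Hypothetical fragments: upstream has an unquoted example, ours is quoted.
upstream = '{"example": 123, "type": "string"}'
patched = '{"type": "string", "example": "123"}'

print(json.loads(upstream) == json.loads(patched))  # False: 123 != "123"
print(same_spec(upstream, patched))                 # True
```

Key order does not matter here because parsed objects are compared as dictionaries, not as text.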

maximiliancw commented 1 year ago

Why exactly did you have to manually modify the file? I didn't understand that part — what is the linter mad about?

But: I have a nifty solution! We could use the ETag and/or Last-Modified headers to check if the SwaggerUI endpoint has been updated and if so, we pull the openapi.yaml without having to check/compare the file itself. I already took a look and DIP is sending both headers. I tried to validate them by checking the current value of Last-Modified (10.03.23) against the date of the latest entry in DIP's change log (21.03.23). Assuming that they updated their docs "shortly" after updating the API itself, the data looks good.

What do you think? I'll start working on a first draft of this now.
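The header-based check described above could look roughly like the following Python sketch. It is only an illustration, not the code of the sync action: the function names are invented here, the stored values would have to be persisted somewhere between runs, and it assumes the server answers HEAD requests with ETag and/or Last-Modified headers.

```python
import urllib.request


def fetch_validators(url: str) -> dict:
    """Send a HEAD request and return the ETag and Last-Modified headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        return {
            "etag": response.headers.get("ETag"),
            "last_modified": response.headers.get("Last-Modified"),
        }


def has_changed(previous: dict, current: dict) -> bool:
    """Report a change when either validator differs from the stored one."""
    # Without any validator we cannot tell, so assume a change to be safe.
    if current.get("etag") is None and current.get("last_modified") is None:
        return True
    old = (previous.get("etag"), previous.get("last_modified"))
    new = (current.get("etag"), current.get("last_modified"))
    return old != new


# Usage (needs network access, so it is left commented out):
# current = fetch_validators("https://search.dip.bundestag.de/api/v1/openapi.yaml")
# if has_changed(stored, current):
#     ...download openapi.yaml and open a PR...
```

Only the headers are transferred, so the file itself is downloaded only when a change is detected.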

wirthual commented 1 year ago

Hi,

The linter complained that the example values were not wrapped in double quotes, so they were read as int instead of string.

You can try it yourself: install Spectral, then run:

echo "extends: spectral:oas" > .spectral.yaml
spectral lint https://search.dip.bundestag.de/api/v1/openapi.yaml

You will see errors like so:

  715:19    error  oas3-valid-schema-example  "example" property type must be string                         components.schemas.VorgangListResponse.allOf[1].properties.documents.items
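To make the error concrete, here is a rough Python sketch of the kind of check behind it. This is not Spectral's implementation; the function name and the sample schema (including its property names) are invented for illustration. It walks an already-parsed schema and reports every string-typed entry whose example is not a string.

```python
def find_bad_examples(schema: dict, path: str = "") -> list:
    """Return paths of schemas declared as string whose example is not a str."""
    problems = []
    if schema.get("type") == "string" and "example" in schema:
        if not isinstance(schema["example"], str):
            problems.append(path or "<root>")
    for key, value in schema.items():
        child = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            problems.extend(find_bad_examples(value, child))
        elif isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, dict):
                    problems.extend(find_bad_examples(item, f"{child}[{index}]"))
    return problems


# Hypothetical schema: an unquoted YAML example such as 12345 parses as an int.
schema = {
    "properties": {
        "id": {"type": "string", "example": 12345},
        "titel": {"type": "string", "example": "abc"},
    }
}
print(find_bad_examples(schema))  # ['properties.id']
```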

Regarding your idea: I really like your approach, it's indeed smart. However, I think it is more likely to need maintenance if the page layout changes or similar.

Maybe simply adding the raw file to the repo as well and comparing it against the source version would be easier.

I am happy to include any working solution, really up to you :)

maximiliancw commented 1 year ago

Hi @wirthual,

> The linter complained that the example values were not wrapped in double quotes, so they were read as int instead of string.

Okay, got it.

> Regarding your idea: I really like your approach, it's indeed smart. However, I think it is more likely to need maintenance if the page layout changes or similar.

Thanks! What do you mean exactly regarding maintenance? I don't think changes to the HTTP headers they're sending will happen frequently. They're probably using some (custom?) HTTP framework, which is configured to always produce these headers. They might change the URL, of course, but that would create problems for every functionality.


I created a separate repository: https://github.com/maximiliancw/dip-bundestag-api-sync-action

Unfortunately, I still cannot test my actions properly. Can you check if it works as intended? I'm cheating a bit by (mis-)using GitHub's cache action to save the latest ETag value.

wirthual commented 1 year ago

Well, I'm not sure how you extract the last date from the website, but if they change the website layout this might be problematic.

Also, other projects might not have a change log posted on a website.

Yes I will test it 🙂

maximiliancw commented 1 year ago

> Yes I will test it 🙂

How did it go? I'm waiting for your feedback :)

wirthual commented 1 year ago

My bad:

If I add

 - uses: maximiliancw/dip-bundestag-api-sync-action@main

and run it locally:

act -j my_job --platform ubuntu-latest=nektos/act-environments-ubuntu:18.04 --container-architecture linux/amd64

I get the following error:

[auto_sync.yaml/OpenAPI Sync]   🐳  docker exec cmd=[bash --noprofile --norc -e -o pipefail /var/run/act/workflow/2-composite-get_remote_etag.sh] user= workdir=
| /var/run/act/workflow/2-composite-get_remote_etag.sh: line 3: $'"765244ce"\r': command not found

Seems like the code is trying to interpret the extracted value as a command.
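The trailing \r in the message suggests the value was cut out of a raw HTTP header line, since those lines end in CRLF. Whatever the cause of the "command not found" part, the value should be cleaned before it is stored or compared. A small Python sketch of that cleanup follows; the function name is made up, and the sample line reuses the ETag value from the error above.

```python
def parse_etag(header_line: str) -> str:
    """Extract a bare ETag value from a raw 'ETag: "..."' header line."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "etag":
        raise ValueError(f"not an ETag header: {header_line!r}")
    value = value.strip()  # drops the trailing CR/LF and surrounding spaces
    if value.startswith("W/"):
        value = value[2:]  # weak validator prefix
    return value.strip('"')


raw = 'ETag: "765244ce"\r\n'
print(repr(parse_etag(raw)))  # '765244ce'
```

In a shell script the equivalent cleanup would be to delete the carriage return from the extracted value before using it.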

maximiliancw commented 1 year ago

Awesome, thanks for your PR. Shall we close this one then?