bundestag / gesetze-tools

Scripts to maintain German law git repository
GNU Lesser General Public License v3.0

🔧 [Refactor] Use api.offenegesetze.de for BGBl #40

Open darkdragon-001 opened 3 years ago

darkdragon-001 commented 3 years ago

:zap: Refactor ticket

Use api.offenegesetze.de instead of BGBl scraper.

Motive

Stable API instead of scraping a changing website.

stefanw commented 3 years ago

This is generally a good idea. Just so you understand: this new 'stable API' by OffeneGesetze.de is based on the same code as the one in this repo, so it also scrapes a changing website. But that's how we do Open Data in Germany. 😬

darkdragon-001 commented 3 years ago

@stefanw Where exactly is the scraper code located? offenegesetze.de or api.offenegesetze.de or something else?

At least I think that maintaining one scraper should be enough for this GitHub organization^^

Does anyone have any insight into which laws are published only in the Bundesanzeiger and not in the Bundesgesetzblatt? Is the Bundesanzeiger scraper still necessary?

stefanw commented 3 years ago

I believe the API is still powered by this little repo: https://github.com/stefanw/bgbl

However, starting soon (January 2022 at the latest) the whole BGBl process will change (see update at the top of the post).

darkdragon-001 commented 3 years ago

> I believe the API is still powered by this little repo: https://github.com/stefanw/bgbl

:open_mouth: So why did I fix the BGBl scraper in this repo a few weeks ago (in #20 and #31) when you had already fixed it... We should definitely collaborate better and make sure we don't do the same work twice...

What do you think of moving your repository into this organization?

We are also discussing splitting this repository up a bit in #36. What are your ideas on this?

> However, starting soon (January 2022 at the latest) the whole BGBl process will change (see update at the top of the post).

Interesting and kudos for this success!

Where do you store the results and feed the api from?

Are there any plans to include publications from Bundesanzeiger on the offenegesetze.de website and API?

ulfgebhardt commented 3 years ago

I would vote against a dependency on https://offenegesetze.de/daten, which just seems to be scraping with extra steps. With another layer in between, things break more easily.

I believe the best course of action would be to maintain a common & public scraper, which is used by all parties involved.

Furthermore, I find it quite weird that you do not maintain that stuff in this repo, @stefanw. That's what it was made for??!

stefanw commented 3 years ago

> We should definitely collaborate better and make sure we don't do the same work twice...

Sorry, I do not follow the work here too closely, so I missed the PRs.

> What do you think of moving your repository into this organization?

I should definitely move it from my personal account, but maybe I should integrate it into api.offenegesetze.de? Right now it works for the purposes of that API and should only receive fixes when it breaks.

> Where do you store the results and feed the api from?

@okfde runs a server with the offenegesetze infrastructure. (Or am I misunderstanding your question?)

> Are there any plans to include publications from Bundesanzeiger on the offenegesetze.de website and API?

There were some loose plans for other law gazettes, but Bundesanzeiger (only Amtlicher Teil?) is likely better covered elsewhere.

darkdragon-001 commented 3 years ago

> > What do you think of moving your repository into this organization?
>
> I should definitely move it from my personal account, but maybe I should integrate it into api.offenegesetze.de? Right now it works for the purposes of that API and should only receive fixes when it breaks.

I would prefer not to maintain two scrapers for the same data. What is the output of your scraper? Maybe we can agree on a common format. As far as this repository is concerned, we are pretty open about the format. I guess you already split the data by year or month and run the scraper regularly, instead of producing the single huge JSON file that this repository's scraper outputs, which we want to get rid of anyway...
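A minimal sketch of the per-year split mentioned above. It assumes (hypothetically) that the scraper output can be loaded as a JSON list of records, each carrying an ISO-formatted `date` field; the actual output format of either scraper is not specified in this thread, and the names `split_by_year` and `write_yearly_files` are illustrative only.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_by_year(entries, date_key="date"):
    """Group records by the four-digit year prefix of their ISO date string.

    Assumption: every record has a `date_key` field such as "2021-03-04".
    """
    by_year = defaultdict(list)
    for entry in entries:
        year = entry[date_key][:4]
        by_year[year].append(entry)
    return dict(by_year)


def write_yearly_files(entries, out_dir, date_key="date"):
    """Write one JSON file per year into out_dir and return the paths written."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for year, items in sorted(split_by_year(entries, date_key).items()):
        target = out_path / f"{year}.json"
        target.write_text(
            json.dumps(items, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written.append(target)
    return written
```

With files split like this, a regular run would only need to rewrite the file for the current year, and diffs in a git repository would stay small compared to one large file.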

> > Where do you store the results and feed the api from?
>
> @okfde runs a server with the offenegesetze infrastructure. (Or am I misunderstanding your question?)

How is it stored? A folder of JSON files? A database? Something else?

> > Are there any plans to include publications from Bundesanzeiger on the offenegesetze.de website and API?
>
> There were some loose plans for other law gazettes, but Bundesanzeiger (only Amtlicher Teil?) is likely better covered elsewhere.

Yes, from what I see, we only need the Amtlicher Teil. It is still referenced quite often in the sources from gesetze-im-internet.de...