ulfgebhardt opened this issue 3 years ago
Who maintains bundestag/gesetze? Who has pull/merge rights?
There are lots of open pull requests which haven't been merged yet. One should first get the manual workflow running before trying to automate things.
Most of the pull requests are either jokes, drafts, or too large to review. Generating an up-to-date version from source is probably a better course of action.
I have sort of taken on the responsibility since people come to me and ask about the repo, though I originally had nothing to do with it. My course of action is finding people who want to do it. I have all the rights needed and can also propagate those rights. I invite people to the org if they have a commit on a repo in the org or a featured fork. This should allow you to have more rights - not sure whether that includes merge rights though.
So if you want to do the automatic push, we can certainly make that happen rights-wise.
Does anyone have an idea how to efficiently determine which laws changed since the last run?
While this is easy for the scrapers (BGBl, BAnz, ...), since their results are ordered by date, it is not so easy for the laws themselves.
There is the Aktualitätendienst, which can be mapped to the corresponding entries in the scraped data based on page number, but I don't see how this can determine which laws (name or slug) actually changed. Does anyone have an idea?
I am wondering if it makes sense to use https://github.com/actions/cache for storing the JSON data instead of committing it to some repo, as it is fully generated. @ulfgebhardt do you have an opinion here?
I believe it is worthwhile to store all data in a repo - that way we make the changes to laws transparent and searchable.
Why would we hide the actual content in some volatile cache? I do not really understand the benefits. Furthermore the actual content we provide is the scraped data - we should ensure maximum visibility and transparency.
But that's all just an opinion ;)
I don't like the fact that tooling and data are mixed in this repository. Also, using and updating the cache seems easier. And I don't see any added benefit in storing this data, since it is fully reproducible and verifiable by anyone. No strong objection, just my personal opinion.
Tooling happens here: https://github.com/bundestag/gesetze-tools
Data happens here: https://github.com/bundestag/gesetze
The data is not reproducible since the official websites do not provide a history, do they?
I am talking about the intermediate JSON files stored in https://github.com/bundestag/gesetze-tools/tree/master/data. I agree that the final Markdown files should be committed via Git to the other repository.
OK, then I misunderstood.
Hi! Sorry for being late to the party.
I am wondering if it makes sense to use https://github.com/actions/cache for storing the json data
Don't cache, always publish. If the data helps our next automated run, it will usually also help humans with their next manually invoked run. For data where git can produce meaningful diffs, pushing it to a repo is a good idea. For everything else, let's instead make it part of a "release", i.e. a GitHub-hosted blob download.
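As a rough illustration of the "always publish" idea, here is a sketch that uploads generated blobs as release assets via the `gh` CLI; the tag name, repo, and file list are made-up placeholders:

```python
# Minimal sketch of "publish instead of cache": upload generated blobs
# as assets of a GitHub release. Assumes the `gh` CLI is installed and
# authenticated; tag name and paths are placeholders.
import subprocess
from datetime import date

def publish_blobs(paths, repo="bundestag/gesetze-tools"):
    tag = f"data-{date.today().isoformat()}"
    # Create the release if it does not exist yet, then upload the files.
    subprocess.run(
        ["gh", "release", "create", tag, "--repo", repo,
         "--notes", "Automated data publish"],
        check=False,  # tolerate "release already exists"
    )
    subprocess.run(
        ["gh", "release", "upload", tag, *paths, "--repo", repo, "--clobber"],
        check=True,
    )
```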
I don't like the fact that tooling and data are mixed in this repository.
Yes, we should strictly separate both.
I had a quick look at gesetze-tools and see several Python scripts. I assume they need to run in a temporary clone of the gesetze repo, right?
From the readme I see lawde.py and lawdown.py have to run chained. Can the others run in parallel, each in their own gesetze clone (probably with the working directory set to the repo root?), or do some of them depend on another's results? Will some of them conflict when run in parallel but using the same (shared) gesetze clone?
What files do I need to collect and publish from which of the tools?
Edit: Moved to #36
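For context, here is how I picture the chained part, as a hedged sketch; the subcommands and paths are my assumptions from the readme, not verified:

```python
# Hedged sketch of the chained run in a temporary clone of
# bundestag/gesetze. The subcommands ("loadall", "convert") and paths
# follow my reading of the gesetze-tools readme and may not be exact.
import subprocess
import tempfile

def update_laws():
    with tempfile.TemporaryDirectory() as workdir:
        clone = f"{workdir}/gesetze"
        subprocess.run(
            ["git", "clone", "https://github.com/bundestag/gesetze.git",
             clone],
            check=True,
        )
        # Step 1: lawde.py downloads the law XML from gesetze-im-internet.de.
        subprocess.run(
            ["python", "lawde.py", "loadall", "--path", f"{workdir}/xml"],
            check=True,
        )
        # Step 2: lawdown.py converts the XML to Markdown inside the clone.
        subprocess.run(
            ["python", "lawdown.py", "convert", f"{workdir}/xml", clone],
            check=True,
        )
```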
Also, it would be nice to have a small dummy version of the data repo, with all the important structures at the latest version but much faster to clone. Or can I just pick an ancient commit? My hope is to make quick test runs for debugging that will probably produce wrong results but can give a preview of whether a run against the real data repo would have worked.
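One possible shortcut, assuming the latest snapshot is enough for a smoke test: a shallow clone skips the history and should already be much faster than a full clone, without needing a separate dummy repo.

```python
# Shallow clone: fetch only the latest snapshot of the data repo for
# quick test runs. The target directory name is a placeholder.
import subprocess

subprocess.run(
    ["git", "clone", "--depth", "1",
     "https://github.com/bundestag/gesetze.git", "gesetze-test"],
    check=True,
)
```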
:rocket: Feature
Implement a GitHub workflow to publish data daily
Please help implement it - if you have the free time to do it, we would solve a 3-year-old problem which pops up every election year. Pinging capable and potentially interested people out of the blue: @Muehe @JBBgameich <3
User Problem
We would have plain-text data here on GitHub:
https://github.com/bundestag/gesetze/issues/55
Implementation
Use GitHub workflows. See these examples:
https://github.com/Ocelot-Social-Community/Ocelot-Social/blob/master/.github/workflows/publish.yml
https://github.com/gradido/gradido/blob/master/.github/workflows/publish.yml
https://github.com/mattia-lerario/Mentor-Application-Bachelor-Project/blob/master/.github/workflows/test.yml#L23
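The linked workflows cover the YAML side (cron trigger, checkout, push). Specific to this repo is the step in between; here is a hedged sketch of a guard that commits and pushes only when the generated files actually changed - committer identity and commit message are placeholders:

```python
# Sketch of the "publish" step a daily workflow could run inside a
# checkout of bundestag/gesetze, after the generators have written the
# Markdown. Commit message and committer identity are placeholders;
# push auth is left to the workflow (e.g. a checkout token with push rights).
import subprocess

def push_if_changed(repo_dir="."):
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # `git diff --cached --quiet` exits non-zero when staged changes exist.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"],
                            cwd=repo_dir)
    if staged.returncode == 0:
        print("No law changes today, nothing to publish.")
        return
    subprocess.run(["git", "-c", "user.name=gesetze-bot",
                    "-c", "user.email=bot@example.invalid",
                    "commit", "-m", "Automated daily law update"],
                   cwd=repo_dir, check=True)
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)
```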
Additional context
src