bundestag / gesetze-tools

Scripts to maintain German law git repository
GNU Lesser General Public License v3.0

[Feature] Generate Synopsis for Changed Laws #23

Closed by joacmue 3 years ago

joacmue commented 3 years ago

Based on the discussions here, I thought about parsing the law changes (Gesetzesänderungen) discussed in the Bundestag and generating a "human-readable" diff, in markdown, between the current form of the law and the proposed new one. The exact format needs to be discussed. It should probably be something like the MS Word "Track Changes" mode (although strikethrough and text colors are not really supported by markdown) or a table view of "old text" vs. "proposed change".
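To make the "old text" vs. "proposed change" table idea a bit more concrete, here is a minimal sketch using Python's standard `difflib`. The function names `synopsis_row` and `synopsis_table` are made up for illustration, and the `~~strikethrough~~` marker relies on the GitHub Flavored Markdown extension, so it is an assumption that the output is rendered on GitHub or a similar renderer.

```python
import difflib


def _cell(text: str) -> str:
    """Escape pipe characters so the text cannot break the table layout."""
    return text.replace("|", "\\|")


def synopsis_row(old: str, new: str) -> str:
    """Build one markdown table row with word-level change marks.

    Removed words are wrapped in ~~...~~ (GFM strikethrough, an assumption
    about the renderer), added words in **...**.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_part = _cell(" ".join(old_words[i1:i2]))
        new_part = _cell(" ".join(new_words[j1:j2]))
        if tag == "equal":
            old_out.append(old_part)
            new_out.append(new_part)
        else:
            # "replace", "delete" and "insert" all end up here.
            if old_part:
                old_out.append(f"~~{old_part}~~")
            if new_part:
                new_out.append(f"**{new_part}**")
    return f"| {' '.join(old_out)} | {' '.join(new_out)} |"


def synopsis_table(pairs) -> str:
    """Render (old, new) text pairs as a two-column markdown table."""
    lines = ["| Old text | Proposed change |", "| --- | --- |"]
    lines += [synopsis_row(old, new) for old, new in pairs]
    return "\n".join(lines)


if __name__ == "__main__":
    print(synopsis_table([("The period is two weeks.", "The period is four weeks.")]))
```

This works on whitespace-separated words only, so punctuation attached to a changed word is marked along with it. A real synopsis would probably need to diff per paragraph (Absatz) or sentence first.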

Links of interest:

Issues I've seen for that so far:

If someone is around to talk to about the legal issues, I'm willing to put in some elbow grease to get a prototype running that parses the Änderungsgesetz PDF and maps the changed paragraphs. The long-term goal would be to set up a web front-end that provides a drop-down of recently discussed law changes, where the user can select one and get the human-readable synopsis/change document for easy access. I'd need some guidance on the front-end stuff here, but I'm willing to learn.

darkdragon-001 commented 3 years ago

Isn't the point of using git that one can use the visually appealing diff tools that git/GitHub provide? One could use branches for the proposals 😉

joacmue commented 3 years ago

Sure, you can diff the changed laws after they have changed, but what I am interested in here is getting at the proposed changes that are discussed in the Bundestag. I might be backing the wrong horse here, given that I don't really know what gets published in the Bundesanzeiger and such... I like the idea of branches for proposals, though. That would actually fit the working style quite nicely. The question remains: how do you get the proposals from the change into git? The proposals I could find were not published as full texts, but rather as "diffs". I'm not sure whether the tools already include a scraper and parser for that publishing channel.

jbruechert commented 3 years ago

I could only find PDFs containing human-readable descriptions of the changes. Are there any better documents? Parsing those seems impossible to me.

joacmue commented 3 years ago

Yeah, the documents I found are pretty much legalese, so they are not really machine-readable and barely human-understandable. But they do seem to follow a pretty strict syntax that might be exploited. I might give it a try over the Easter holidays. There are probably more pressing issues around here; I just wanted to post the issue that got me here in some "official" way.
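As a rough illustration of what "exploiting the syntax" could look like, here is a sketch that matches one single instruction form with a regular expression. The sentence pattern ("In § ... wird das Wort „...“ durch das Wort „...“ ersetzt.") is an assumption about how such instructions are typically worded, not something taken from an actual document in this thread, and `Replacement`, `parse_replacement` and `apply_replacement` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# German quotation marks as they usually appear in legal texts.
OPEN, CLOSE = "\u201e", "\u201c"

# Assumed instruction form: replace one word or phrase within a section.
PATTERN = re.compile(
    rf"In § (?P<section>\d+[a-z]?)"
    rf"(?: Absatz (?P<paragraph>\d+))?"
    rf"(?: Satz (?P<sentence>\d+))?"
    rf" (?:wird das Wort|werden die Wörter) {OPEN}(?P<old>[^{CLOSE}]+){CLOSE}"
    rf" durch (?:das Wort|die Wörter) {OPEN}(?P<new>[^{CLOSE}]+){CLOSE} ersetzt\."
)


@dataclass
class Replacement:
    section: str
    paragraph: Optional[str]
    sentence: Optional[str]
    old: str
    new: str


def parse_replacement(instruction: str) -> Optional[Replacement]:
    """Return the parsed instruction, or None if it has a different form."""
    match = PATTERN.search(instruction)
    if match is None:
        return None
    return Replacement(**match.groupdict())


def apply_replacement(text: str, change: Replacement) -> str:
    """Apply the change to the text of the addressed sentence.

    Only the first occurrence is replaced; locating the right section,
    paragraph and sentence is left to the caller.
    """
    return text.replace(change.old, change.new, 1)
```

Anything that does not match returns `None`, so unparsed instructions can be collected and inspected by hand instead of being silently dropped. Insertions, deletions and rewordings of whole paragraphs would each need their own pattern.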

ulfgebhardt commented 3 years ago

I believe an approach like the one followed by these repos would be good:

Those use a crawler that runs automatically every day. The changes are checked into git and therefore generate a history of changes. As stated, this would only cover the laws after they have changed.
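The "crawl daily, check the result into git" pattern can be sketched roughly as follows. This is not how the linked repos actually do it; `write_if_changed` and `commit_snapshot` are hypothetical helpers, and the sketch assumes that `git` is installed and that `repo` is an already initialised repository.

```python
import subprocess
from pathlib import Path


def write_if_changed(path: Path, text: str) -> bool:
    """Write the crawled text to disk, returning False if nothing changed."""
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True


def commit_snapshot(repo: Path, message: str) -> bool:
    """Stage everything and commit, returning False if the tree is clean."""
    subprocess.run(["git", "-C", str(repo), "add", "-A"], check=True)
    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    )
    if not status.stdout.strip():
        return False  # nothing new since the last run
    subprocess.run(["git", "-C", str(repo), "commit", "-m", message], check=True)
    return True
```

A daily job (cron, CI schedule, or similar) would call the crawler, pass each result through `write_if_changed`, and finish with one `commit_snapshot` per run, so that days without changes produce no commit.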

Extracting the proposed changes would be another task that could be done in this tool. In general, though, the following principles should apply:

joacmue commented 3 years ago

Thanks @ulfgebhardt for pointing that out. Now that I somewhat understand the actual scope of the tools here, I feel more strongly that the idea sketched here should indeed move to another tool. It might still be worthwhile to add a dip21-style crawler here, though. I'll need to browse through those a bit to understand what's going on there. Getting all the information in the DIP database might be a bit much, though.

ulfgebhardt commented 3 years ago

The dip21 data is scraped with this: https://github.com/bundestag/scapacra-bt

And don't get me wrong: I do not mind at all doing more data analysis or whatnot. All I'm saying is that I consider it wise to create a solid data basis first, to get the shit the official websites give us into an actually useful format. From there we can go further. The approach I describe seems logical to me: get all the data, and keep this process separate from the processing part. But as I said, that's just an idea of mine and not set in stone.