[Feature] Generate Synopsis for Changed Laws

joacmue commented 3 years ago

Based on the discussions here, I thought about parsing the law changes (Gesetzesänderungen) discussed in the Bundestag and generating a "human-readable" diff, in markdown, between the current text of the law and the proposed new one. The exact format needs to be discussed. It should probably be something like the MS Word "Track Changes" mode (although strikethrough and text colors are not really supported by markdown) or a table view of "old text" vs. "proposed change".
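
To make the "old text" vs. "proposed change" table idea a bit more concrete, here is a minimal sketch in Python. It assumes both versions of a law are already available as plain text keyed by section label, which is not something the tools here provide yet. The function names `synopsis_table` and `mark_changes` are made up for illustration. Changed words are marked in bold, since strikethrough and colors are not reliably available in markdown.

```python
import difflib


def mark_changes(old, new):
    """Return (old, new) with the differing words wrapped in ** for bold."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            old_out.extend(a[i1:i2])
            new_out.extend(b[j1:j2])
            continue
        # replace / delete / insert: highlight whatever exists on each side
        if i2 > i1:
            old_out.append("**" + " ".join(a[i1:i2]) + "**")
        if j2 > j1:
            new_out.append("**" + " ".join(b[j1:j2]) + "**")
    return " ".join(old_out), " ".join(new_out)


def synopsis_table(old_sections, new_sections):
    """Render a markdown table listing only the sections that differ.

    Both arguments are dicts mapping a section label (e.g. "§ 1") to its
    text. This input shape is an assumption made for this sketch.
    """
    rows = ["| Section | Old text | Proposed change |", "| --- | --- | --- |"]
    labels = list(old_sections)
    labels += [label for label in new_sections if label not in old_sections]
    for label in labels:
        old = old_sections.get(label, "")
        new = new_sections.get(label, "")
        if old == new:
            continue  # unchanged sections are left out of the synopsis
        old_md, new_md = mark_changes(old, new)
        # A literal pipe would end the table cell, so escape it
        old_md = old_md.replace("|", "\\|")
        new_md = new_md.replace("|", "\\|")
        rows.append(f"| {label} | {old_md} | {new_md} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(synopsis_table(
        {"§ 1": "The fee is 10 euros.", "§ 2": "Unchanged."},
        {"§ 1": "The fee is 12 euros.", "§ 2": "Unchanged."},
    ))
```

The word-level comparison is a design choice for readability. A character-level or sentence-level diff may turn out to fit legal texts better.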

Links of interest:

Law changes in a pretty standard format can be found on the pages of the Bundestag (look for "Drs-Suche" in the DIP)
Original Discussion on the topic on talk.lagedernation.org - read this as kind of a "User Story"

Issues I've seen for that so far:

Law names in the Änderungsgesetz texts are not (always) compatible with the names in lawdown - will need to perform some search or user query here
I'm not versed enough in css/javascript to see how to scrape recent changes from the search results
In order to match the changes in the Änderungsgesetz, I'd like to have the corresponding original law downloaded already. Issue #16 seems to address that point, though ;)
IP issues: I'm not sure whether I am actually allowed to parse the documents publicised by the Bundestag - maybe someone around here can shed some light on the terms of use there and whether this would be covered as "fair use"
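
For the first issue in the list (law names that do not match exactly), a fuzzy search using only the standard library might be a starting point. This is a rough sketch: the `known_names` list stands in for whatever names lawdown actually uses, and the helper names and the cutoff value are assumptions that would need tuning against real data.

```python
import difflib


def normalize(name):
    """Lowercase, turn hyphens into spaces and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def match_law_name(cited, known_names, n=3, cutoff=0.6):
    """Return up to n known law names that resemble the cited name, best first.

    An empty list means nothing was similar enough, which would be the
    point to fall back to asking the user.
    """
    by_normalized = {normalize(name): name for name in known_names}
    hits = difflib.get_close_matches(
        normalize(cited), list(by_normalized), n=n, cutoff=cutoff
    )
    return [by_normalized[hit] for hit in hits]


if __name__ == "__main__":
    known = ["Bürgerliches Gesetzbuch", "Strafgesetzbuch"]
    # Amending laws usually cite the name in an inflected form
    print(match_law_name("Bürgerlichen Gesetzbuchs", known))
```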

If someone is around to talk to about the legal issues, I'm willing to put in some elbow grease to get a prototype running that parses the Änderungsgesetz PDF and maps the changed paragraphs. The long-term goal would be to set up a web front-end that provides a drop-down of recently discussed law changes, where the user can select one and then get the human-readable synopsis/change document for easy access. I'd need some guidance on the front-end stuff here, but I'm willing to learn.

darkdragon-001 commented 3 years ago

Isn't the reason to use git such that one can use the visually appealing diff tools which git/github provide? One could use branches for the proposals 😉

joacmue commented 3 years ago

Sure, you can diff the changed laws after they have changed, but what I am interested in here is getting at the proposed changes that are discussed in the Bundestag. I might be on the wrong track here, given that I don't really know what gets published in the Bundesanzeiger and such... I like the idea of branches for proposals, though. That would actually fit the working style quite nicely. The question remains: how do you get the proposals from the change into git? The proposals I could find were not published as full texts, but rather as "diffs". I'm not sure whether there's a publishing channel scraper and parser for those in the tools already.

jbruechert commented 3 years ago

I could only find PDFs containing human-readable descriptions of the changes. Are there any better documents? Parsing those seems impossible to me.

joacmue commented 3 years ago

Yeah, the documents I found are pretty much legalese and thus not really machine-readable, as well as barely human-understandable. But they do seem to follow a pretty strict syntax that might be exploited. I might give it a try over the Easter holidays. There are probably more pressing issues around here; I just wanted to post the issue that got me here in some "official" way.
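
As a rough illustration of what exploiting that syntax could look like, here is a sketch that handles a single, simplified instruction type ("replace word X by word Y"). The sentence pattern is an assumption modelled on typical amendment wording and has not been checked against real Drucksachen, which contain many more instruction types (insertions, deletions, renumbering). `REPLACE_RE`, `parse_replacement` and `apply_replacement` are hypothetical names.

```python
import re

# Simplified pattern for instructions such as:
#   In § 5 Absatz 2 wird die Angabe „10“ durch die Angabe „12“ ersetzt.
# Assumption: the quoted text uses German quotation marks („ and “).
REPLACE_RE = re.compile(
    r"In (?P<location>§ \d+[a-z]?(?: Absatz \d+)?(?: Satz \d+)?) "
    r"(?:wird|werden) (?:das Wort|die Wörter|die Angabe) „(?P<old>[^“]+)“ "
    r"durch (?:das Wort|die Wörter|die Angabe) „(?P<new>[^“]+)“ ersetzt\."
)


def parse_replacement(sentence):
    """Return a dict with location, old and new text, or None if no match."""
    match = REPLACE_RE.search(sentence)
    return match.groupdict() if match else None


def apply_replacement(text, change):
    """Apply a parsed replacement to the text of the addressed provision."""
    if change["old"] not in text:
        # Fail loudly instead of silently producing a wrong synopsis
        raise ValueError(f"{change['old']!r} not found in {change['location']}")
    return text.replace(change["old"], change["new"], 1)


if __name__ == "__main__":
    instruction = "In § 5 Absatz 2 wird die Angabe „10“ durch die Angabe „12“ ersetzt."
    change = parse_replacement(instruction)
    print(change)
    print(apply_replacement("Die Gebühr beträgt 10 Euro.", change))
```

Anything the pattern does not recognise comes back as `None`, so unparsed instructions can be collected and reviewed by hand rather than guessed at.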

ulfgebhardt commented 3 years ago

I believe an approach like the one followed by these repos would be good:

Those use a crawler that runs automatically every day. The changes are checked into git and therefore generate a history of changes. As stated, this would only cover the laws after they have changed.

Extracting the proposed changes would be another task which can be done in this tool. But in general the following principles should be followed:

Only extract data; don't combine stuff, calculate stuff or the like. That would be the subject of another tool. We want a tool that gets clean & complete data first.
Extract as much data as you can, so we do not miss stuff. If you crawl a website, get all the information available on it if you can, even though it might not be relevant for your cause. It might be for someone else's.
Don't worry about legal issues. I take responsibility if someone has a problem with us publishing bundestag/law/... related data. In the past, no legal issues arose from us publishing data like we do. I tend to publish stuff under the Unlicense, since the Bundestag IT department could not answer the question of which license their data is made available to the public under.

joacmue commented 3 years ago

Thanks @ulfgebhardt for pointing that out. Now that I somewhat understand the actual scope of the tools here, I feel a bit more like the idea sketched here should indeed move to another tool. It might still be worthwhile to add a crawler in the dip21 style here, though. I'll need to browse through those a bit to understand what's going on there. Getting all the information in the DIP database might be a bit steep, though.

ulfgebhardt commented 3 years ago

The dip21 data is scraped with this: https://github.com/bundestag/scapacra-bt

-> And don't get me wrong. I do not mind at all doing more data analysis or whatnot. All I am saying is that I consider it wise to create a solid data basis first, to get the shit the official websites give us into an actually useful format. From there we can go further. The approach I describe seems logical to me: get all the data and keep this process separate from the processing part. But as I said, that's just an idea of mine and not set in stone.

bundestag / gesetze-tools

[Feature] Generate Synopsis for Changed Laws #23