bundestag / gesetze-tools

Scripts to maintain German law git repository
GNU Lesser General Public License v3.0
114 stars, 21 forks

🚀 [Feature] Separate Data from Tool #36

Status: Open. ulfgebhardt opened this issue 3 years ago

ulfgebhardt commented 3 years ago

🚀 Feature

It is common practice to store a scraper and its data separately, but here this is not the case, or only partly.

We have a data folder containing JSON files: https://github.com/bundestag/gesetze-tools/tree/master/data

But there is a repo associated with this scraper as well: https://github.com/bundestag/gesetze

It is still unclear to me how the tool produces the output stored in the gesetze repo.

Nevertheless, I consider it useful to keep all data separate from the tools that create them. I think it would be wise to create a new repo for the scraped data (in English, please).

Design & Layout

In a data repo, the data should be stored in a `data` folder.
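A minimal sketch of moving the existing files into such a layout, assuming the scraped data are flat `*.json` files directly inside the tools repo's `data` folder. The function name `copy_data` and the repo paths are hypothetical, not part of gesetze-tools.

```python
import shutil
from pathlib import Path


def copy_data(tools_repo: Path, data_repo: Path) -> list[Path]:
    """Copy every JSON file from <tools_repo>/data into <data_repo>/data.

    Assumption: the scraped data are flat *.json files; other files and
    subfolders are ignored. Returns the destination paths in sorted order.
    """
    src_dir = tools_repo / "data"
    dst_dir = data_repo / "data"
    dst_dir.mkdir(parents=True, exist_ok=True)  # create data/ in the data repo
    copied = []
    for src in sorted(src_dir.glob("*.json")):
        dst = dst_dir / src.name
        shutil.copy2(src, dst)  # copy content and file metadata
        copied.append(dst)
    return copied
```

Copying rather than moving keeps the tools repo untouched until the new data repo has been checked.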


darkdragon-001 commented 3 years ago

IMHO there should always be a README.md.

I suggest separate repositories for separate data sets (bgbl, banz, ...).

ulfgebhardt commented 3 years ago

The repos should have proper names: "banz" means nothing to most readers. Even though I asked for English names, "Bundesanzeiger" as an entity name is acceptable, I guess, since the reader understands what the repo is about.
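A sketch of how per-dataset repo names and README stubs could be derived from the short dataset keys, spelling out the entity names (BGBl is the Bundesgesetzblatt, BAnz the Bundesanzeiger). The `gesetze-data-` prefix, the helper names, and the README wording are hypothetical illustrations, not an agreed convention.

```python
# Hypothetical mapping from short dataset keys to spelled-out entity names.
DATASET_NAMES = {
    "bgbl": "Bundesgesetzblatt",
    "banz": "Bundesanzeiger",
}


def repo_name(dataset: str, prefix: str = "gesetze-data-") -> str:
    """Return a descriptive repo name for a dataset key (case-insensitive)."""
    try:
        entity = DATASET_NAMES[dataset.lower()]
    except KeyError:
        raise ValueError(f"unknown dataset: {dataset!r}") from None
    return prefix + entity.lower()


def readme_stub(dataset: str) -> str:
    """Return a minimal English README.md body for the dataset's repo."""
    entity = DATASET_NAMES[dataset.lower()]
    return (
        f"# {repo_name(dataset)}\n\n"
        f"Scraped data from the {entity}, stored in the `data` folder.\n"
    )
```

For example, `repo_name("banz")` gives `gesetze-data-bundesanzeiger`.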

mk-pmb commented 12 months ago

We may not even need a separate repo. Using separate branches would probably cover most of it. We should then have a cleaned-up version of the tools branch that omits all the data commits, so that the tools themselves are quick to clone.
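One possible way to do the branch split, sketched as Python that builds the git invocations. It assumes the data live under `data/` and that the separate `git-filter-repo` tool is installed. `git subtree split --prefix=data -b data` creates a branch holding only the history of that folder. `git filter-repo --path data/ --invert-paths` rewrites history without it and, by default, prunes commits that become empty. filter-repo expects a fresh clone, so the second command should run in one. The helper names and directory names are hypothetical.

```python
import subprocess


def data_branch_command(prefix: str = "data", branch: str = "data") -> list[str]:
    # Run in the existing clone: creates `branch` whose commits contain only
    # the history of `prefix`, with that folder's contents at the root.
    return ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch]


def tools_history_command(prefix: str = "data") -> list[str]:
    # Run in a fresh clone: rewrites history without `prefix`; commits that
    # only touched data become empty and filter-repo prunes them.
    return ["git", "filter-repo", "--path", f"{prefix}/", "--invert-paths"]


def run(cmd: list[str], cwd: str) -> None:
    # Execute one command, raising CalledProcessError on failure.
    subprocess.run(cmd, cwd=cwd, check=True)


# Example (not executed here; directory names are hypothetical):
# run(data_branch_command(), cwd="gesetze-tools")
# run(tools_history_command(), cwd="gesetze-tools-fresh-clone")
```

Rewriting history changes commit IDs, so the cleaned tools branch would have to be force-pushed or published under a new name.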