digitaldutch / BAG_parser

Turns the Dutch addresses database (BAG or Basisregistratie Adressen en Gebouwen) into a user-friendly SQLite database.
MIT License

Digital Dutch BAG parser

TL;DR

Converts the big, complex and hard-to-read XML Dutch addresses database (BAG or Basisregistratie Adressen en Gebouwen) into a user-friendly, file-based, blazingly fast SQLite database in a few minutes by running a single Python script. No need to install any dependencies or a database server.

Additional scripts will convert this SQLite database to other formats, like CSV.

Download the parsed BAG

If you don't want to run the script yourself, download the latest BAG in SQLite or CSV format from our releases section.

About the BAG

The Dutch public addresses and buildings database (BAG or Basisregistratie Adressen en Gebouwen) is freely downloadable from the Dutch cadastre agency named Kadaster. Hooray 🙂.

The bad news is: the original BAG comes in a complex and hard-to-read XML format spread over thousands of zipped XML files, which will quickly reduce your initial enthusiasm. It also does not include municipalities or provinces, and it provides coordinates in a system that non-experts won't expect, named Rijksdriehoekscoördinaten 😲.

What this parser does

This Python utility parses the BAG database and converts it into a clean, easy to read and use SQLite database. Municipalities (gemeenten) and provinces (provincies) are added. Rijksdriehoekscoördinaten are converted to standard WGS84 latitude and longitude coordinates. Invalid (dummy) bouwjaar (year of construction) and oppervlakte (floor area) values are removed. Year of construction, floor area and intended use of buildings are also provided. Several tables (nummers, verblijfsobjecten, panden, ligplaatsen and standplaatsen) are merged into a general 'adressen' table. The SQLite database can be used directly, as a source to generate a *.csv file, or to update your own addresses databases. A couple of options are available in config.py.
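To give an idea of what the coordinate conversion involves, here is a minimal sketch of an RD to WGS84 conversion. It uses a commonly published polynomial approximation around the Amersfoort reference point. This is an assumption for illustration: the parser's own conversion code may work differently, the function name `rd_to_wgs84` is made up, and the coefficients should be checked against an authoritative source before you rely on them.

```python
# Approximate conversion from Rijksdriehoekscoördinaten (RD) to WGS84.
# Assumption: commonly published polynomial approximation, NOT necessarily
# the method used by this parser. Verify the coefficients before relying on them.

X0, Y0 = 155000.0, 463000.0            # RD reference point (Amersfoort), metres
PHI0, LAM0 = 52.15517440, 5.38720621   # the same point in WGS84 degrees

# Each term is (power of dx, power of dy, coefficient in arc seconds).
LAT_TERMS = [
    (0, 1, 3235.65389), (2, 0, -32.58297), (0, 2, -0.24750),
    (2, 1, -0.84978), (0, 3, -0.06550), (2, 2, -0.01709),
    (1, 0, -0.00738), (4, 0, 0.00530), (2, 3, -0.00039),
    (4, 1, 0.00033), (1, 1, -0.00012),
]
LON_TERMS = [
    (1, 0, 5260.52916), (1, 1, 105.94684), (1, 2, 2.45656),
    (3, 0, -0.81885), (1, 3, 0.05594), (3, 1, -0.05607),
    (0, 1, 0.01199), (3, 2, -0.00256), (1, 4, 0.00128),
    (0, 2, 0.00022), (2, 0, -0.00022), (5, 0, 0.00026),
]


def rd_to_wgs84(x, y):
    """Return (latitude, longitude) in degrees for RD coordinates in metres."""
    dx = (x - X0) * 1e-5
    dy = (y - Y0) * 1e-5
    lat = PHI0 + sum(c * dx**p * dy**q for p, q, c in LAT_TERMS) / 3600
    lon = LAM0 + sum(c * dx**p * dy**q for p, q, c in LON_TERMS) / 3600
    return lat, lon


if __name__ == "__main__":
    # A point in central Amsterdam, roughly 52.37 N, 4.89 E.
    print(rd_to_wgs84(121400, 487400))
```

The approximation is only meant for locations inside the Netherlands. For survey-grade accuracy a dedicated projection library is the safer choice.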

Requirements

Usage

Python commands

import_bag.py

Parses the original BAG file and transforms it into a SQLite database. Takes about 10 minutes to complete on an AMD 7700X PC, or a few minutes more if you switch on the parse_geometries option in the config.py.

export_to_csv.py

Exports the addresses in the SQLite database to a *.csv file. By default, only the address and postcode data are exported (~15 seconds). Use the command options below for more output formats.

-a, --all
Export all data including year of construction, latitude, longitude, floor area and intended use of buildings. ~40s

-h, --help
Show the help message.

-p4, --postcode4
Export statistics of 4-character postal code groups (e.g. 1000). ~10s

-p5, --postcode5
Export statistics of 5-character postal code groups (e.g. 1000A). ~10s

-p6, --postcode6
Export statistics of 6-character postal code groups (e.g. 1000AA). ~10s
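The three postcode options differ only in how many leading characters of the postcode define a group. The sketch below illustrates that grouping with a simple address count per group. The function name `count_by_postcode_prefix` and the sample postcodes are made up, and the statistics actually written by export_to_csv.py are not specified here and may be richer.

```python
from collections import Counter


def count_by_postcode_prefix(postcodes, length):
    """Count addresses per postcode group of the given length (4, 5 or 6)."""
    # Skip missing or too-short postcodes instead of grouping them wrongly.
    return Counter(pc[:length] for pc in postcodes if pc and len(pc) >= length)


# Hypothetical sample data, not taken from the BAG.
postcodes = ["1000AA", "1000AB", "1000BA", "1011AC"]

for length in (4, 5, 6):
    print(length, dict(count_by_postcode_prefix(postcodes, length)))
```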

test_sqlite_db.py

Checks the SQLite database for info and errors. import_bag.py also performs these tests after parsing.

utils_sqlite_shrink.py

Reduces the SQLite database size by removing BAG tables (nummers, verblijfsobjecten, panden, ligplaatsen and standplaatsen) that are no longer needed due to the new 'adressen' table. The parser also does this as a final step if delete_no_longer_needed_bag_tables is set to True in config.py.
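As a rough illustration, the two options named in this README might look like this in config.py. Only the option names come from the text above. That they are plain module-level booleans, and the values shown, are assumptions, so check the actual config.py for the real defaults and any other options.

```python
# config.py (fragment). Values are illustrative, not the shipped defaults.

# Assumed boolean: also parse geometries (adds a few minutes to import_bag.py).
parse_geometries = False

# Drop the original BAG tables once the 'adressen' table has been built.
delete_no_longer_needed_bag_tables = True
```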

Adressen table

An address (adres) is a secondary address (nevenadres) if its hoofd_nummer_id field is set. That field points to the nummer_id of the main address (hoofdadres).
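The relation can be queried with a self-join on the 'adressen' table. The sketch below builds a tiny in-memory stand-in table so it runs on its own. Only the nummer_id and hoofd_nummer_id columns come from this README. The `label` column, the sample rows, and the assumption that "not set" means NULL are hypothetical.

```python
import sqlite3

# In-memory stand-in for the real database. The 'label' column is hypothetical.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE adressen ("
    "nummer_id TEXT PRIMARY KEY, hoofd_nummer_id TEXT, label TEXT)"
)
con.executemany(
    "INSERT INTO adressen VALUES (?, ?, ?)",
    [
        ("N1", None, "hoofdadres A"),
        ("N2", "N1", "nevenadres of A"),
        ("N3", None, "hoofdadres B"),
    ],
)

# Every nevenadres together with the hoofdadres it points to.
# Assumption: an unset hoofd_nummer_id is stored as NULL.
rows = con.execute(
    """
    SELECT neven.nummer_id, neven.label, hoofd.nummer_id, hoofd.label
    FROM adressen AS neven
    JOIN adressen AS hoofd ON hoofd.nummer_id = neven.hoofd_nummer_id
    WHERE neven.hoofd_nummer_id IS NOT NULL
    """
).fetchall()

print(rows)
```

Against the real database, replace the in-memory connection with a connection to the generated SQLite file and select the actual columns of the 'adressen' table.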

Limitations and notes

Documents

Tools

Official BAG viewer

The Kadaster has an online BAG viewer where you can search any address or other info in the official database.

nlextract

This tool does not parse all data. If you need more data or professional support, buy it from nlextract, who have a more complex, but also complete parser.

bagconv

Bert Hubert has written a parser in C++, bagconv, which is quite similar to this one.

License

This software is made available under the MIT license.