apache / incubator-baremaps

Create custom vector tiles from OpenStreetMap and other data sources with Postgis and Java.
baremaps.apache.org
Apache License 2.0

Add allow and block lists for tags and entities #164

Open danduk82 opened 3 years ago

danduk82 commented 3 years ago

As a data integrator, I want to import only the data that my style requires, because I want to reduce storage costs and import time.

Use case: I want to create a style for a basemap that does not contain buildings, but on a large dataset (planet OSM). I would then like to add a blacklist (in the config file?) listing the tags related to buildings, so that they are never imported into the PG database.

bchapuis commented 3 years ago

Given the size of the database for a planet import this is a very good idea. Filtering entities is indeed needed and we may also want to import or exclude some tags. Any idea on how this feature may be configured with the CLI? For instance with a file:

--allow-entities tags.txt
--block-entities tags.txt

Or with a list of arguments:

--allow-entities tag1
--allow-entities tag2

We may have something similar for tags:

--allow-tags 
--block-tags

danduk82 commented 3 years ago

I think the whitelist and the blacklist should be mutually exclusive. Otherwise, I like the idea of having them in a dedicated file, or perhaps directly in the config.yaml file? If you run baremaps in something like aws-batch, it would be great if it also accepted an HTTP URL to download the file from (as is already done with the pbf for import), e.g. --whitelist https://my-s3-bucket.s3.amazonaws.com/my-style-whitelist.yaml
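
To illustrate the mutual-exclusion idea, here is a minimal Python sketch (not Baremaps code, which is written in Java). The flag names come from the suggestion above; `build_parser`, `make_entity_filter`, and the matching rule (an entity is matched when it carries at least one listed tag key) are assumptions made only for this example.

```python
import argparse
from typing import Callable, Iterable, Mapping, Optional

Tags = Mapping[str, str]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: the two flags cannot be combined."""
    parser = argparse.ArgumentParser(prog="import-sketch")
    group = parser.add_mutually_exclusive_group()
    # Each flag may be repeated; values are collected into a list.
    group.add_argument("--allow-entities", action="append", metavar="KEY")
    group.add_argument("--block-entities", action="append", metavar="KEY")
    return parser


def make_entity_filter(
    allow: Optional[Iterable[str]] = None,
    block: Optional[Iterable[str]] = None,
) -> Callable[[Tags], bool]:
    """Return a predicate telling whether an entity should be imported.

    Assumed semantics: an allow list keeps entities that have at least one
    listed tag key; a block list drops entities that have at least one
    listed tag key; with no list, everything is kept.
    """
    allowed = frozenset(allow or ())
    blocked = frozenset(block or ())
    if allowed and blocked:
        raise ValueError("allow and block lists are mutually exclusive")
    if allowed:
        return lambda tags: bool(allowed.intersection(tags))
    if blocked:
        return lambda tags: not blocked.intersection(tags)
    return lambda tags: True


if __name__ == "__main__":
    args = build_parser().parse_args()
    keep = make_entity_filter(args.allow_entities, args.block_entities)
    print(keep({"building": "yes"}), keep({"highway": "residential"}))
```

Under these assumptions, passing `--block-entities building` drops any entity that has a `building` tag and keeps a way tagged only with `highway`. Passing both flags at once is rejected by the parser before any data is read.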

bchapuis commented 3 years ago

That's a good point. More generally, this calls for thinking about the configuration files. For now, the serve and export commands use a single configuration format that addresses two problems: creating the vector tiles and styling the vector tiles. This approach is not ideal in terms of separation of concerns, as a set of vector tiles may be used with different stylesheets. Adding configuration files to the import command is a good idea, but I'm worried that it may become a bit confusing for users.

We may decide to introduce multiple configuration files to improve the separation of concerns. For instance, the export command may be modified as follows:

baremaps export \
  --database 'jdbc:postgresql://...' \
  --tileset 'tileset.yaml' \
  --stylesheet 'stylesheet.yaml'

The import command may have a transformation flag that allows specifying the allow/block lists and additional information regarding the schema (table names, column names, etc.).

baremaps import \
  --database 'jdbc:postgresql://...' \
  --transformation 'transformation.yaml'

What do you think? What would be the most suitable name for the flag in the import command (--transformation, --mapping, --schema, or something else)?
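
To make the discussion more concrete, the following Python sketch shows one possible shape for the content of such a transformation file once it has been parsed into a mapping. Everything here is hypothetical: the keys `allow_tags`, `block_tags`, and `tables`, the default table names, and the `Transformation` class are illustrations only, not an existing Baremaps format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Tuple

# Hypothetical default table names, used when the file does not override them.
DEFAULT_TABLES = {
    "node": "osm_nodes",
    "way": "osm_ways",
    "relation": "osm_relations",
}


@dataclass(frozen=True)
class Transformation:
    """In-memory form of a hypothetical transformation.yaml."""

    allow_tags: Tuple[str, ...] = ()
    block_tags: Tuple[str, ...] = ()
    tables: Dict[str, str] = field(default_factory=lambda: dict(DEFAULT_TABLES))

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "Transformation":
        """Build a Transformation from an already parsed mapping."""
        allow = tuple(raw.get("allow_tags", ()))
        block = tuple(raw.get("block_tags", ()))
        if allow and block:
            raise ValueError("allow_tags and block_tags are mutually exclusive")
        # Table names from the file override the defaults key by key.
        tables = {**DEFAULT_TABLES, **raw.get("tables", {})}
        return cls(allow_tags=allow, block_tags=block, tables=tables)

    def filter_tags(self, tags: Mapping[str, str]) -> Dict[str, str]:
        """Return only the tags that should be stored in the database."""
        if self.allow_tags:
            return {k: v for k, v in tags.items() if k in self.allow_tags}
        return {k: v for k, v in tags.items() if k not in self.block_tags}


if __name__ == "__main__":
    config = Transformation.from_mapping(
        {"block_tags": ["building"], "tables": {"way": "ways"}}
    )
    print(config.tables["way"], config.filter_tags({"building": "yes", "name": "x"}))
```

Keeping the lists and the schema names in one object is only one option. It matches the single --transformation flag proposed above, whereas separate files would be closer to the tileset/stylesheet split.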