cityjson / specs

Specifications for CityJSON, a JSON-based encoding for 3D city models
https://cityjson.org
Creative Commons Zero v1.0 Universal

Promote CityJSONFeatures for file storage #121

Closed balazsdukai closed 2 years ago

balazsdukai commented 2 years ago

This is a proposal to promote the use of CityJSONFeatures for file storage.

Get the data

Download a tile from https://3d.kadaster.nl/basisvoorziening-3d/. This experiment uses the tile 68dn2, which contains a good variety of land uses.

[Image: Screenshot from 2022-08-04 17-16-05]

The tile is split into four sub-tiles. Each file is upgraded to CityJSON v1.1 and exported as CityJSON Lines with cjio 0.7.4.

File sizes

The file sizes are compared in the table below. Converting a regular CityJSON file to CityJSON Lines leads to a 16-17% reduction in file size, probably due to the smaller indices in the boundary arrays.

| Tile | CityJSON [MB] | CityJSON Lines [MB] | Size reduction |
|---|---|---|---|
| 68dn2_01 | 268.4 | 226.4 | 16% |
| 68dn2_02 | 312.8 | 261.9 | 16% |
| 68dn2_03 | 370.0 | 307.9 | 17% |
| 68dn2_04 | 364.4 | 301.1 | 17% |
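
The suspected cause named above can be illustrated with a small sketch. In a regular CityJSON file every boundary index points into one global vertex list, so the indices of most objects are large numbers; in a CityJSONFeature the vertex list is local to the feature, so the indices start again at 0. The function `localize_indices` and the toy data below are hypothetical and only serve to show the effect on the serialized size; this is not how cjio performs the conversion.

```python
import json


def localize_indices(boundaries, global_vertices):
    """Rewrite nested boundary indices against a new, local vertex list."""
    index_map = {}       # global index -> local index
    local_vertices = []  # vertices used by this object only

    def remap(node):
        if isinstance(node, int):
            if node not in index_map:
                index_map[node] = len(local_vertices)
                local_vertices.append(global_vertices[node])
            return index_map[node]
        # Boundaries are arbitrarily nested lists of indices
        return [remap(child) for child in node]

    return remap(boundaries), local_vertices


# Toy data (hypothetical): one triangle far down a global vertex list
global_vertices = [[i, i, i] for i in range(100_000)]
boundaries = [[[99_997, 99_998, 99_999]]]

local_boundaries, local_vertices = localize_indices(boundaries, global_vertices)

# Compare the number of characters needed to serialize the boundaries
print(len(json.dumps(boundaries)), len(json.dumps(local_boundaries)))
```

The saving per index is only a few characters, but boundary arrays make up a large part of a city model, so it adds up over a whole tile.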

Loading the files

In this section I compare the speed and memory footprint of looping through each CityObject in a data set. This operation is common in applications that manipulate a whole city model, which covers almost all of cjio's operations.

However, the tests below are not relevant for CityJSON libraries that manipulate individual CityObjects, because they typically want to store the whole city model in memory anyway.

When a city model is stored in its entirety in a CityJSON object, we need to load the whole CityJSON object into memory in order to access the transform and vertices objects for instance.

# load_cityjson.py

import json
from sys import argv

fpath = argv[1]

with open(fpath, "r") as fo:
    cm = json.load(fo)
    for coid, co in cm["CityObjects"].items():
        # process the CityObject with information from cm['transform']
        pass
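
To make the dependency on `transform` concrete: CityJSON stores vertices as integers, and the real-world coordinates are obtained by multiplying each coordinate by the `scale` and adding the `translate` from the top-level `transform` object. The sketch below shows that step in isolation; the helper name `to_real_coordinates` and the example values are hypothetical.

```python
def to_real_coordinates(vertex, transform):
    """Convert one integer vertex to real-world coordinates."""
    scale = transform["scale"]
    translate = transform["translate"]
    # real = integer * scale + translate, applied per axis
    return [v * s + t for v, s, t in zip(vertex, scale, translate)]


# Example values, for illustration only
transform = {
    "scale": [0.001, 0.001, 0.001],
    "translate": [100000.0, 400000.0, 0.0],
}
vertex = [1500, 2500, 3000]

print(to_real_coordinates(vertex, transform))
```

Without the `transform` object, the integer vertices of a CityObject cannot be turned into usable coordinates, which is why it has to be available before any geometry is processed.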

With CityJSONFeatures we can read the file line by line, processing and discarding the CityObjects one by one. This is very efficient in terms of both CPU and memory usage, provided that the first object in the file is the CityJSON object containing the metadata and the transform property required for parsing the CityObjects. Operations that would benefit greatly from this include subsetting and merging city models. Things like EPSG reassignment or metadata updates wouldn't even require looping through the features, only altering the first object in the file.

# load_cityjson_lines.py

import json
from sys import argv

fpath = argv[1]

with open(fpath, "r") as fo:
    meta = json.loads(fo.readline())
    for feature_str in fo:
        feature = json.loads(feature_str)
        # process the feature with the information in 'meta', then discard
        pass
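
The header-only operations mentioned above (EPSG reassignment, metadata updates) could look roughly like the sketch below: parse the first line, change it, and copy the remaining lines verbatim without parsing them. The function names `rewrite_header` and `set_epsg` are hypothetical, and storing the CRS as an OGC URL in `metadata.referenceSystem` is my assumption about the v1.1 layout, so check it against the specification before relying on it. EPSG:7415 is just an example code.

```python
import json
import shutil


def rewrite_header(src_path, dst_path, update):
    """Copy a CityJSON Lines file, changing only its first line."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        meta = json.loads(src.readline())
        update(meta)  # modify the CityJSON object in place
        dst.write(json.dumps(meta, separators=(",", ":")) + "\n")
        # The features are copied as text and never parsed
        shutil.copyfileobj(src, dst)


def set_epsg(meta, code=7415):
    # Assumption: the CRS is an OGC URL stored in metadata.referenceSystem
    meta.setdefault("metadata", {})["referenceSystem"] = (
        f"https://www.opengis.net/def/crs/EPSG/0/{code}"
    )
```

A call such as `rewrite_header("68dn2_01.jsonl", "68dn2_01_new.jsonl", set_epsg)` would then touch only the first object, regardless of how many features follow. Note that this only reassigns the CRS label; it does not reproject any coordinates.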

Execute the relevant script above for each file with /usr/bin/time -v python3 load_cityjson.py 68dn2_01.json and /usr/bin/time -v python3 load_cityjson_lines.py 68dn2_01.jsonl. The results are summarized in the table below, where the decimals are discarded, because they don't make a difference in the comparison.

The results indicate a very significant benefit of CityJSON Lines files over regular CityJSON files, at least for the operations outlined above: user time drops from 16-23 s to 5-6 s, and the maximum resident set size drops from roughly 2.9-3.8 GB to under 200 MB.

| Tile | User time [s] | Max. RSS [MB] |
|---|---|---|
| 68dn2_01 | 16 | 2896 |
| 68dn2_01_lines | 5 | 189 |
| 68dn2_02 | 18 | 3203 |
| 68dn2_02_lines | 6 | 122 |
| 68dn2_03 | 23 | 3847 |
| 68dn2_03_lines | 6 | 191 |
| 68dn2_04 | 22 | 3798 |
| 68dn2_04_lines | 6 | 84 |
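
The numbers above come from `/usr/bin/time -v`. If that tool is not available, comparable figures can probably be collected from inside the script with Python's standard `resource` module (Unix only). This is an alternative sketch, not the method used for the table; the helper name `usage_summary` is hypothetical, and the values may differ slightly because they include the interpreter's own start-up cost in the same way but are read before the process exits.

```python
import resource
import sys


def usage_summary():
    """Return (user CPU seconds, peak RSS in MiB) for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_utime, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Call this at the very end of a loading script
    user_s, peak_mib = usage_summary()
    print(f"user time: {user_s:.0f} s, max RSS: {peak_mib:.0f} MiB")
```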

Proposal

  1. Promote the use of CityJSONFeatures even in file storage as an equivalent to the regular CityJSON object. I would even go as far as to promote CityJSONFeatures/Lines as the primary way for file storage.
  2. Require, or at least very strongly encourage, that when CityJSONLines are used for file storage, the very first object is the CityJSON object (with the metadata and transform), followed by the features. If we cannot get the transform object first, before processing the features, then the whole efficient streaming logic breaks, because we need to keep everything in memory.
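
A reader could enforce the ordering from point 2 with a check along these lines: the first line must parse to an object of type `CityJSON` that carries a `transform`, otherwise streaming is refused. The function name `read_header` and the error messages are hypothetical; the sketch only inspects the first line and leaves the iterator positioned at the first feature.

```python
import json


def read_header(lines):
    """Return the CityJSON object from the first line, or raise ValueError."""
    first_line = next(iter(lines), None)
    if first_line is None:
        raise ValueError("empty input")
    header = json.loads(first_line)
    if header.get("type") != "CityJSON":
        raise ValueError("first object must be of type 'CityJSON'")
    if "transform" not in header:
        raise ValueError("first object must contain 'transform'")
    return header
```

With an open file `fo`, calling `meta = read_header(fo)` consumes only the first line, so a following `for feature_str in fo:` loop continues with the features.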

Disadvantages

  1. Software implementing CityJSON needs to account for both regular CityJSON and CityJSONLines.
  2. In case of many, small, interconnecting CityObjects, a global vertex list would provide better compression than the separate vertex lists of the CityJSONFeatures.