citygml4j / citygml-tools

Collection of tools for processing CityGML files
Apache License 2.0

Java heap space OOM error and multifile processing #14

Closed mwussow closed 4 years ago

mwussow commented 4 years ago

Dear Claus,

thanks for this great tool! There are two errors that I frequently encounter when using citygml-tools:

1) Running out of Java heap space: I have 8 GB of RAM on my machine, yet I often hit OOM errors when using citygml-tools. I already raised the default heap size with export _JAVA_OPTIONS="-Xmx6g", which made the error less common, but I still encounter it when processing large files (i.e., >3 GB) or many files (i.e., 100+ files).

2) File sizes blow up when processing multiple files at once: I am trying to convert ~3k GML files (each ~100 MB) to CityJSON. While it seems possible to convert several files at once by providing the path to the folder where they are saved, the file sizes of subsequent files increase linearly, which leads to dramatically oversized JSON files when processing 100+ files at once.

I would highly appreciate any advice on how to process multiple files efficiently and any hints on how to fix the above issues.

Thanks, Moritz
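
As an illustration of the heap-size setting described in point 1, the following is a minimal sketch that applies `_JAVA_OPTIONS` only to the child process instead of exporting it for the whole shell. The helper names `java_env` and `run_to_cityjson` are hypothetical, and the `tool` argument stands for wherever the citygml-tools start script is installed. Raising the heap limit only postpones the OOM error for files that do not fit into memory.

```python
import os
import subprocess


def java_env(max_heap="6g", base_env=None):
    """Return a copy of the environment with the JVM maximum heap size set.

    Hypothetical helper: it only sets _JAVA_OPTIONS, the variable used in the
    original post, and leaves every other variable untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_JAVA_OPTIONS"] = f"-Xmx{max_heap}"
    return env


def run_to_cityjson(tool, gml_file, max_heap="6g"):
    """Run `citygml-tools to-cityjson` on one file with the given heap limit.

    `tool` is the path to the citygml-tools start script (an assumption about
    the local installation).
    """
    return subprocess.run(
        [tool, "to-cityjson", gml_file],
        env=java_env(max_heap),
        check=True,  # raise CalledProcessError if the conversion fails
    )
```

The heap limit should stay below the physical RAM of the machine, leaving room for the operating system.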

clausnagel commented 4 years ago

Thanks for your feedback.

  1. CityGML files currently must be loaded into main memory to be converted to CityJSON. Chunk-wise processing would help keep the memory footprint low, for example, by reading only one cityObjectMember at a time and writing it directly to the CityJSON target file. However, CityJSON currently does not support chunk-wise processing well (see https://github.com/cityjson/specs/issues/6). The editors are aware of this issue, and there are first proposals for solving it.

    In the meantime, to avoid OOM errors, you should only run citygml-tools on CityGML files that are small enough to be loaded into main memory.

  2. This sounds like a bug. It should, of course, be possible to convert a folder of CityGML files in one run if each file can be loaded into main memory. Seems like there is a memory leak in the code...

I will look into 2 and report back soon. If you are able to share your datasets, I'm happy to use them in my tests.
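
To make the chunk-wise reading idea from point 1 more concrete, here is a minimal Python sketch that streams one cityObjectMember at a time with the standard library's `iterparse`. It is not how citygml-tools or citygml4j work internally. The function name `iter_city_object_members` is hypothetical, and the namespace is assumed to be CityGML 2.0. It covers only the reading side; writing the members to CityJSON chunk by chunk is the part the linked specs issue discusses.

```python
import xml.etree.ElementTree as ET

# Assumption: CityGML 2.0 core namespace; CityGML 1.0 files use a different URI.
CORE_NS = "http://www.opengis.net/citygml/2.0"
MEMBER_TAG = f"{{{CORE_NS}}}cityObjectMember"


def iter_city_object_members(source):
    """Yield each cityObjectMember element, releasing it after use.

    `source` is a file name or a binary file-like object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == MEMBER_TAG:
            yield elem
            # Once the consumer asks for the next member, drop everything
            # parsed so far so the tree does not grow with the file.
            root.clear()
```

Each yielded element is only valid until the next one is requested, so a consumer would convert or write it immediately rather than collect the elements in a list.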

mwussow commented 4 years ago

Thanks for your prompt reply! A workaround that seems to work for me is to run a Python script that executes citygml-tools for each file individually:

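import os
import subprocess
from tqdm import tqdm

# Placeholders: replace with the folder of GML files and the install location.
path = '[path_to_folder]'
tool = '[path_to_citygml-tools]/citygml-tools-1.3.2/bin/citygml-tools'

# Start one citygml-tools process per file so memory is released between files.
for f in tqdm(sorted(os.listdir(path))):
    if f.endswith('.gml'):
        subprocess.run([tool, 'to-cityjson', os.path.join(path, f)], check=True)
print('done')

clausnagel commented 4 years ago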

Ok, I fixed the following issues:

Both fixes are available in the master branch (aae72a2e651d06a724aa6f9e8257f1031b01804b). Could you please build a new version of citygml-tools from master and test whether the fixes solve your issues? Let me know if you need help with building citygml-tools from source.