OSGeo / PROJ

PROJ - Cartographic Projections and Coordinate Transformations Library
https://proj.org

Large JSON TIN support #3732

Closed: GlenRice-NOAA closed this issue 1 year ago

GlenRice-NOAA commented 1 year ago

We are running into trouble extending the PROJ structure with large JSON TIN files, as we were doing in ticket #3616. The limit seems to be set here, but it is labeled as arbitrary.

Our current use case has an 80 MB file for the Alaska region. In the short term we would like to explore increasing this limit (perhaps as part of some local configuration?), but perhaps in the long run there could be support for some Cloud Optimized TIN format for the CDN?

rouault commented 1 year ago

I could raise the limit, but did you check by patching locally that increasing it or removing it results in acceptable enough performance? I guess it could take several seconds to load such a JSON file.

but perhaps in the long run there could be support for some Cloud Optimized TIN format for the CDN?

Potentially. R&D is required to find or design an appropriate format and plug it into PROJ.

GlenRice-NOAA commented 1 year ago

did you check by patching locally that increasing it or removing it results in acceptable enough performance ?

No, we have not patched locally due to a lack of local utilities. Because of the file size we are pretty sure this is the issue, but we also wanted to see if there are other concerns or thoughts around this limit.

rouault commented 1 year ago

@GlenRice-NOAA Can you share this 80 MB JSON so I can have a try ?

GlenRice-NOAA commented 1 year ago

Looks like we were able to reduce them to ~60 MB each. Here are the examples:
AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.zip
AK_EEZ_ERTDM23_EPSG6318_MHW-LMSL.zip
AK_EEZ_ERTDM23_EPSG6318_LMSL-xGeoid20B.zip

rouault commented 1 year ago

Limit raised to 100 MB in https://github.com/OSGeo/PROJ/pull/3736. You could still reduce the file size, as your coordinates currently have 13 decimal digits, i.e. a 1e-8 metre precision...
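For illustration, a minimal sketch of that precision reduction, assuming the tinshift JSON stores its vertex coordinates as rows of numbers under a top-level "vertices" key; the output file name and the choice of 8 decimals (roughly millimetre-level horizontal precision in degrees) are only examples:

```python
import json

# Hypothetical helper: round vertex coordinates to shrink a tinshift JSON file.
# Assumes vertices are rows of numbers under "vertices"; adjust the key and the
# number of decimals to the actual file.
IN_PATH = "AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json"
OUT_PATH = "AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL_rounded.json"
DECIMALS = 8  # ~1e-8 degree, i.e. roughly millimetre-level horizontal precision

with open(IN_PATH) as f:
    tin = json.load(f)

tin["vertices"] = [[round(v, DECIMALS) for v in row] for row in tin["vertices"]]

with open(OUT_PATH, "w") as f:
    # Compact separators also drop the whitespace between tokens.
    json.dump(tin, f, separators=(",", ":"))
```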

Side note: I'm not sure if there's something wrong in your triangulation or in the PROJ implementation, but it seems I only manage to transform coordinates at exact triangle vertices, not in their interior:

$ echo -144.0550994873047 70.07471466064453 0 |  bin/cct +proj=tinshift +file=./AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json
    -144.0551        70.0747       -0.0964           inf
$ echo -144.0550994873  70.07471466064  0 |  bin/cct +proj=tinshift +file=./AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json
# Record 0 TRANSFORMATION ERROR: -144.0550994873 70.07471466064 0
 ((null))
Rjialceky commented 1 year ago

Thanks for the side note. Your second test point is outside the triangulated domain...

[image: map view showing the second test point outside the triangulated domain]

...and we didn't have a fallback_strategy parameter specified; that is possible in format_version >= 1.1. I added the item "fallback_strategy": "nearest_side" to AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json and bumped the version item accordingly: "format_version": "1.1". I tested this on a <10 MB version of the JSON TIN with success.
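For illustration, a minimal sketch of that edit; the two keys and values ("fallback_strategy": "nearest_side" and "format_version": "1.1") are exactly the ones quoted above, while the script itself is just one way to apply them:

```python
import json

# Enable the nearest_side fallback available from format_version 1.1 so that
# points falling slightly outside the triangulation are still shifted instead
# of raising a transformation error.
PATH = "AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json"

with open(PATH) as f:
    tin = json.load(f)

tin["format_version"] = "1.1"
tin["fallback_strategy"] = "nearest_side"

with open(PATH, "w") as f:
    json.dump(tin, f, separators=(",", ":"))
```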

geoglrb commented 1 year ago

Hello! I am watching this issue/fix with interest, as I think support for larger files (and ideally, someday, support for cloud optimized TINs, or better yet something like a multi-scale tile pyramid for various levels of approximation) has applicability to work that I and some others want to do with TIN shifting as well. Thanks!

jjimenezshaw commented 1 year ago

Hi. I was looking at the code, and I saw a possible performance optimization: "remember" the last triangle used, and check it before doing any search. If the user is usually moving within a small area, the triangle probably does not change very often, which reduces both search time and cache pollution. But I do not know if it is worth it (do you have any gut feeling?). I should do some tests to measure how fast or slow the search really is. But if you consider that it is currently fast enough with a big mesh, @GlenRice-NOAA, I will not even try it.
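For illustration, a minimal sketch (not PROJ code) of that caching idea: keep the last triangle that matched and test the new query point against it with a simple orientation-sign check before falling back to the full search. The CachedLocator name and the full_search callback are hypothetical:

```python
def sign(p, a, b):
    """Signed-area test: which side of the edge a->b the point p falls on."""
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])

def point_in_triangle(p, tri):
    """tri is a tuple of three (x, y) vertices; the boundary counts as inside."""
    d1 = sign(p, tri[0], tri[1])
    d2 = sign(p, tri[1], tri[2])
    d3 = sign(p, tri[2], tri[0])
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)  # all non-zero signs agree

class CachedLocator:
    def __init__(self, full_search):
        self._full_search = full_search  # e.g. a quadtree lookup returning three (x, y) vertices
        self._last = None                # triangle returned by the previous query, if any

    def locate(self, p):
        if self._last is not None and point_in_triangle(p, self._last):
            return self._last            # cache hit: no search at all
        self._last = self._full_search(p)
        return self._last
```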

In addition to that, I was surprised by the memory consumption that @rouault mentioned with the 60 MB file: more than 1 GB. I understand that it is due to the quadtree. I know a quick and lean algorithm, but it needs the whole mesh to be convex (it can be padded with "invalid" triangles to become convex, but it must be complete).

rouault commented 1 year ago

"remember" the last triangle used, and check it before doing any search.

Hard to tell exactly how efficient it would be. Really depends on use cases, but that's quite cheap to implement.

An alternative to the quadtree is that, if you have a triangulation where for each edge you know the identifiers of the two triangles that share it, you can "walk" through the triangulation. It might perhaps require the triangulation to be a Delaunay one, but I'm not sure. (GDAL's linear interpolation method uses libqhull to create a Delaunay triangulation and uses that technique to find which triangle a point falls into, given an initial guess. That works well for well-behaved triangulations, but as soon as you have a degenerate or nearly degenerate triangle, numeric precision issues occur and you have to revert to something else, like a brute-force search.) Given the input we currently have, the quadtree approach has the benefit of being simple and reasonably performant.
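For illustration, a minimal sketch (not the PROJ or GDAL implementation) of such a walk, assuming counter-clockwise triangles, per-edge neighbour indices, and no degenerate triangles:

```python
def orient(a, b, p):
    """> 0 if p is left of the directed edge a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def walk(vertices, triangles, neighbours, start, p, max_steps=100000):
    """
    vertices:   list of (x, y) tuples
    triangles:  list of (i, j, k) vertex indices, counter-clockwise
    neighbours: neighbours[t] = (across edge i-j, across edge j-k, across edge k-i),
                with None where the edge lies on the hull
    start:      index of the initial guess triangle
    Returns the index of the triangle containing p, or None if the walk leaves the hull.
    """
    t = start
    for _ in range(max_steps):
        i, j, k = triangles[t]
        a, b, c = vertices[i], vertices[j], vertices[k]
        # Step across the first edge that has p strictly on its outer side.
        if orient(a, b, p) < 0:
            t = neighbours[t][0]
        elif orient(b, c, p) < 0:
            t = neighbours[t][1]
        elif orient(c, a, p) < 0:
            t = neighbours[t][2]
        else:
            return t       # p is inside (or on the boundary of) triangle t
        if t is None:
            return None    # walked off the triangulation
    return None            # safety cap against cycling on ill-conditioned input
```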

was surprised about the memory consumption that @rouault mentioned with the 60 MB file: more than 1 GB. I understand that it is due to the quadtree.

I haven't profiled, but my bet would rather be on the JSON in-memory representation than on the quadtree. At least with libjson-c, I know that the RAM consumption of a GeoJSON file can typically be 10 to 20 times larger than its file size. I don't know about the nlohmann::json library we use. If each JSON object, i.e. each coordinate value, needs a dynamic allocation, that creates tons of small objects, and pointers to them.

jjimenezshaw commented 1 year ago

That "walking" algorithm is the one I had in mind. The triangulation does not have to be Delaunay, just convex. I guess that it may not work very well with degenerate triangles.

I will test the performance of the current algorithm and the idea of "remembering" the last triangle used.