Closed MortenHofft closed 2 years ago
Looking at the ES documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html
I can see that geo centroids can fall outside their buckets, but only if the aggregation is using multiple points to aggregate the initial grid. So I would guess that does NOT apply to our use.
I do not have this issue when using the tile server from the gbif-web project btw (using the same approach of translating coordinates to geohash etc), so that makes me think that it is likely that it is an issue with the Java tile server.
@MortenHofft can you please tell me where and how the bounding box is calculated? I double-checked all your geohash calculations using the ES libraries and everything looks correct.
In gbif-web/es2vt, a request for 0/0/0 fires this Elasticsearch request:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "eventId.keyword": "MGYA00167306"
                }
              },
              {
                "term": {
                  "hasCoordinate": true
                }
              }
            ]
          }
        },
        {
          "geo_bounding_box": {
            "coordinates": {
              "top": 87.1875,
              "left": -180,
              "bottom": -87.1875,
              "right": 180
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "geo": {
      "geohash_grid": {
        "field": "coordinates",
        "precision": 3,
        "size": 40000
      },
      "aggs": {
        "geo": {
          "geo_centroid": {
            "field": "coordinates"
          }
        }
      }
    }
  }
}
and the response is
{
  "took": 94,
  "timed_out": false,
  "_shards": {
    "total": 371,
    "successful": 371,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1056,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "geo": {
      "buckets": [
        {
          "key": "sp3",
          "doc_count": 1056,
          "geo": {
            "location": {
              "lat": 41.66859996970743,
              "lon": 2.799599999561906
            },
            "count": 1056
          }
        }
      ]
    }
  }
}
This is translated to the tile coordinates { x: 2079.853226661682, y: 1525.5656770276148 } on a tile with extent 4096.
When inspecting that feature through MapBox I see:
lat: 41.705728515237524
lon: 2.724609375
The change in coordinates seems reasonable given that it was translated to tile coordinates and then back to latitude+longitude.
In some cases I guess that change could mean that the coordinates shifted geohash. Perhaps that is the reason?
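The round trip through integer tile coordinates can be sketched with the inverse of the slippy-map formulas. This is only an illustration under two assumptions: that the encoder stored the floored integer position (2079, 1525) on the 4096 extent, and that the client reprojects that integer position directly. `tile2lon` and `tile2lat` are names made up for this sketch, not functions from either tile server.

```javascript
// Inverse of the slippy-map tile formulas (Web Mercator only).
// x and y are fractional tile coordinates at the given zoom.
function tile2lon(x, zoom) {
  return x / Math.pow(2, zoom) * 360 - 180;
}

function tile2lat(y, zoom) {
  const n = Math.PI * (1 - 2 * y / Math.pow(2, zoom));
  return Math.atan(Math.sinh(n)) * 180 / Math.PI;
}

// Assumption: integer position (2079, 1525) on a tile with extent 4096 at zoom 0.
const tileExtent = 4096;
const roundTrip = {
  lon: tile2lon(2079 / tileExtent, 0), // 2.724609375
  lat: tile2lat(1525 / tileExtent, 0)  // approximately 41.7057
};
console.log(roundTrip);
```

Under those assumptions the result matches what MapBox reports for the feature, which suggests the shift comes purely from dropping the fractional part of the tile coordinates.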
The tile sent from v2/map/occurrence/adhoc/0/0/0 has an extent of 512. Perhaps the lower resolution is the reason? It leads to larger rounding, which in this case means that once it has been reprojected back into latLng it no longer falls into the same geohash?
That theory of 512 vs 4096 doesn't explain it (at least not fully). I just tried using an extent of 512 without any issues (even 256 works)
Doing so the tile coordinates are { x: 259.98165333271027, y: 190.69570962845185 }
and that is reprojected by mapbox as
lat 42.0329743324414
lon 2.109375
So clearly a larger imprecision, but in this case still the same geohash. It still seems like a likely cause (the Java client might project onto the tile differently), and either way it is a general source of error for this approach (of deducing the geohash from the client). The gbif-web project is probably just rounding in the correct direction when translating to tile coordinates (pure luck, I must admit).
This is how the gbif-web project calculates the tile coordinates. Unlike the Java project, this one only supports Mercator.
// based on https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames#ECMAScript_.28JavaScript.2FActionScript.2C_etc..29
// but the floor rounding removed. I did so with the assumption that the rounded part corresponded to the tile coordinates
// from wiki entry, but modified to remove rounding
function lon2tile(lon, zoom) {
  return (lon + 180) / 360 * Math.pow(2, zoom);
}
// from wiki entry, but modified to remove rounding
function lat2tile(lat, zoom) {
  return (1 - Math.log(Math.tan(lat * Math.PI / 180) + 1 / Math.cos(lat * Math.PI / 180)) / Math.PI) / 2 * Math.pow(2, zoom);
}
// own attempt to translate latLng to tile coordinates
function getTileCoordinates(lat, lon, zoom, extent) {
  // longitude maps to the tile x axis, latitude to the y axis
  let tx = lon2tile(lon, zoom), // 0.5077766666654497
    ty = lat2tile(lat, zoom); // 0.37245255786807
  let result = {
    x: extent * (tx % 1),
    y: extent * (ty % 1)
  };
  return result; // {x: 259.98165333271027, y: 190.69570962845185}
}
// coordinates from the ES query above
getTileCoordinates(41.66859996970743, 2.799599999561906, 0, 512);
when I use MapBox to inspect the point I get
y: 210 coordinates: [2.8125, 41.50857729743936] // how mapbox translate the tile coordinates presumably
The resulting longitude of 2.8125 looks correct to me.
MVTs can only store integer coordinates, so the choices for longitude are: 2.10938, 2.8125, 3.51562 at this 512px resolution.
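The list of possible longitudes can be reproduced with a few lines. A rough sketch, assuming Web Mercator and that the integer x position is measured from the western edge of the tile; `candidateLongitudes` is a hypothetical helper, not part of either tile server.

```javascript
// Longitudes representable by integer x positions around a point on one tile.
// Assumption: Web Mercator, x measured from the tile's western edge.
function candidateLongitudes(lon, zoom, extent) {
  const tiles = Math.pow(2, zoom);
  const t = (lon + 180) / 360 * tiles;              // fractional tile x
  const tileX = Math.floor(t);                      // tile column
  const nearest = Math.round((t - tileX) * extent); // nearest integer position
  return [nearest - 1, nearest, nearest + 1].map(
    (p) => (tileX + p / extent) / tiles * 360 - 180
  );
}

console.log(candidateLongitudes(2.799599999561906, 0, 512));
// [2.109375, 2.8125, 3.515625]
```

Rounding picks the middle value and flooring picks the first one, which is the difference being discussed here.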
es2vt tile:
"key": "sp3", "doc_count": 1056, "geo": { "location": { "lat": 41.66859996970743, "lon": 2.799599999561906 }, "count": 1056 }
This is translated to the tile coordinates
{ x: 2079.853226661682, y: 1525.5656770276148 }
on a tile with extent 4096
How do you get non-integer tile coordinates?
From es2vt I get a longitude of 2.72461°E, with the 4096 extent.
The choices are 2079/4096.0 * 360 - 180 = 2.72461° or 2080/4096.0 * 360 - 180 = 2.8125°, so there is a rounding error here. x: 2079.8532 should lead to 2080 (ignoring the geohash, this is just to have the correct position of the occurrence marker).
I do not think it is incorrect as such, simply that the rounding (instead of flooring, as the JavaScript project does) leads to the point falling into a different geohash than the one it belonged to before. That is just in this specific case of course, I'm not saying that flooring is the solution. What would be nice is a geohash stability guarantee on our rounding (for the given precision).
The Java version keeps double precision up until the point it assigns it to an XY address on the tile using Math.round().
I am not sure that is correct, is it? A point that is at 0.99, 0.99 on the tile should be rendered in the 0,0 cell on the tile, and not the 1,1 cell. I think this should be using Math.floor()
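The difference between the two functions for cell assignment, as a small illustration (the helper names are made up for this sketch):

```javascript
// Assigning a fractional tile position to an integer cell.
function toCellRound(v) { return Math.round(v); } // nearest grid line
function toCellFloor(v) { return Math.floor(v); } // cell that contains the point

console.log(toCellRound(0.99), toCellFloor(0.99)); // 1 0
console.log(toCellRound(2079.853226661682), toCellFloor(2079.853226661682)); // 2080 2079
```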
@MattBlissett agree?
Actually yes, you're both correct, it should floor.
I will fix and deploy to dev
It looks like this will help a lot, but it isn't clear to me that we won't still have the issue (probably less often).
For example, geohash sp3vp9 has the bounding box
// sw: lat: 41.6656494140625, lon: 2.79052734375
// ne: lat: 41.671142578125, lon: 2.801513671875
So if we take the point lat: 41.6656494140625 lon: 2.799599999561906 then it falls just within sp3vp9 (precision 6).
If we project that onto a tile with extent 512 for zoom level 6, then I believe we get
x: 254.82581329345703
y: 428.8849329930272
If we floor that and reproject to latitude/longitude, then I get (I might be doing this wrong, but the result seems plausible):
lon: 2.79052734375
lat: 41.67291181960209 // the latitude moves slightly upwards as expected
This corresponds to geohash sp3vpd (the northern neighbour), in other words a "wrong" geohash.
UPDATE: In this specific example a resolution of 4096 would have masked the issue.
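The sp3vp9 example can be checked with a short script. This is a sketch only: it assumes Web Mercator, flooring to the integer tile position, and reprojection of that integer position as-is. `floorRoundTrip` and `inside` are illustrative helpers, and the bounding box is the one quoted above rather than something computed from the geohash.

```javascript
// Web Mercator forward and inverse tile formulas (no rounding).
const lon2tile = (lon, zoom) => (lon + 180) / 360 * Math.pow(2, zoom);
const lat2tile = (lat, zoom) => {
  const rad = lat * Math.PI / 180;
  return (1 - Math.log(Math.tan(rad) + 1 / Math.cos(rad)) / Math.PI) / 2 * Math.pow(2, zoom);
};
const tile2lon = (x, zoom) => x / Math.pow(2, zoom) * 360 - 180;
const tile2lat = (y, zoom) =>
  Math.atan(Math.sinh(Math.PI * (1 - 2 * y / Math.pow(2, zoom)))) * 180 / Math.PI;

// Project, floor to an integer position on the tile, and project back.
function floorRoundTrip(lat, lon, zoom, extent) {
  const tx = lon2tile(lon, zoom);
  const ty = lat2tile(lat, zoom);
  const px = Math.floor((tx % 1) * extent);
  const py = Math.floor((ty % 1) * extent);
  return {
    px,
    py,
    lon: tile2lon(Math.floor(tx) + px / extent, zoom),
    lat: tile2lat(Math.floor(ty) + py / extent, zoom)
  };
}

// Bounding box of geohash sp3vp9 as quoted in the comment above.
const sp3vp9 = {
  south: 41.6656494140625, west: 2.79052734375,
  north: 41.671142578125, east: 2.801513671875
};
const inside = (p, b) =>
  p.lat >= b.south && p.lat < b.north && p.lon >= b.west && p.lon < b.east;

const back = floorRoundTrip(41.6656494140625, 2.799599999561906, 6, 512);
console.log(back, inside(back, sp3vp9)); // px 254, py 428; no longer inside sp3vp9
```

The longitude lands exactly on the western edge of the box, while the latitude ends up above its northern edge.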
Is it necessary to use a geohash?
A bounding box calculated from the extent of the pixel in the tile might work better.
I thought the geohash resolutions were calculated to fit the 512px extent at each zoom level, but I don't think it was done expecting the calculations to be reversed.
We could try to solve it differently. But as far as I can see, I do not think the extent of the pixel will work. The circle/point in the tile has a count that corresponds to the occurrences within the geohash. So the geohash is the bounding box needed. At least I'm not able to see other solutions.
There might not be any occurrences at the pixel point where the circle shows. It is a centroid of the occurrences within the geohash.
I've added a property geohash to the ad hoc tiles when requesting GEO_CENTROID mode. Deployed on dev.
Including the geohash will solve it of course. That is what we have done all along in the Javascript tile server.
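With the geohash available on the feature, a client can derive the bounding box without reprojecting anything. A minimal sketch of standard geohash decoding; `geohashToBounds` is a hypothetical helper, and the `top`/`left`/`bottom`/`right` keys are chosen here only to mirror the `geo_bounding_box` filter in the ES request above.

```javascript
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Decode a geohash into its bounding box by bisecting the lon/lat ranges.
function geohashToBounds(geohash) {
  let west = -180, east = 180, south = -90, north = 90;
  let isLon = true; // bits alternate, starting with longitude
  for (const ch of geohash.toLowerCase()) {
    const value = BASE32.indexOf(ch);
    if (value === -1) throw new Error('Invalid geohash character: ' + ch);
    for (let bit = 4; bit >= 0; bit--) {
      const upper = (value >> bit) & 1;
      if (isLon) {
        const mid = (west + east) / 2;
        if (upper) west = mid; else east = mid;
      } else {
        const mid = (south + north) / 2;
        if (upper) south = mid; else north = mid;
      }
      isLon = !isLon;
    }
  }
  return { top: north, left: west, bottom: south, right: east };
}

console.log(geohashToBounds('sp3vp9'));
// { top: 41.671142578125, left: 2.79052734375, bottom: 41.6656494140625, right: 2.801513671875 }
```

The result can be used as the bounding box when querying for the occurrences behind a clicked circle.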
Around the start of December 2021 I presented 2 approaches:
I'm no longer sure what I prefer. I used to think that sending along the geohash was preferable. It is simple and the tile sizes are still small. But I get the point, that when exploring, you end up looking at many tiles. And so the difference becomes more like 2mb vs 1mb.
Notice also that when test browsing with geohash included I downloaded a total of 500kb from our tiles and 2.6mb from the Mapbox tiles. So our tiles contribute fairly little to the total amount of data downloaded.
Those tile sizes will not be the same for the Java implementation, as the JavaScript version doesn't have buffers and does not send exact counts. Instead it rounds to the thresholds used for map styling (e.g. 1, 10, 100, 1000, 10,000, etc.). This was done to keep the tile size down, and because the counts weren't being used for anything but styling.
For future reference, I made a few experiments on tile sizes. Geohash precision 3. Zoom level 2. Centered on Europe where there is a lot of data. The tests are made using the JavaScript server, which doesn't include buffers, unlike the Java implementation. The order of the points is as provided by ES.
| Tile size | Properties | KB |
|---|---|---|
| 256 | rounded total | 11.2 + 11.0 + 3.4 + 3.2 |
| 512 | rounded total | 11.9 + 11.8 + 3.5 + 3.3 |
| 4096 | rounded total | 14.3 + 14.3 + 4.1 + 3.8 |
| 4096 | rounded total + geohash | 28.3 + 28.3 + 8.0 + 7.4 |
| 4096 | rounded total + precision (once per feature) | 14.4 + 14.4 + 4.2 + 3.9 |
| 4096 | rounded total + precision (once per feature) + feature array shuffled | 15.5 + 15.4 + 4.5 + 4.1 |
| 4096 | exact total + geohash | 41.8 + 41.8 + 10.8 + 9.4 |
"rounded total" means that the number is rounded to a power of 10, as those are the thresholds used in the current style. As this reduces the number of distinct counts, it is much more effective than sending the exact count.
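The rounding could look roughly like this. It is a sketch under the assumption that counts are rounded down to the threshold; the thread does not say which direction the JavaScript server uses, and `roundToThreshold` is a made-up name.

```javascript
// Round a count down to the nearest power of ten (1, 10, 100, ...).
// Assumption: rounding down; the actual server may round differently.
function roundToThreshold(count) {
  if (count < 1) return 0;
  let threshold = 1;
  while (threshold * 10 <= count) threshold *= 10;
  return threshold;
}

console.log([1, 9, 10, 1056, 99999].map(roundToThreshold)); // [1, 1, 10, 1000, 10000]
```

A loop is used instead of `Math.log10` to avoid any floating point surprises at exact powers of ten.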
Above shows that: Tile size (extent) matters fairly little for the kb size. Shuffling the array adds size as the tile encoder has to move longer. If we want to reduce tile size, the best we can do is to
Doing so would reduce the largest of the tested tiles from 41.8kb to 14.4kb.
An example of where calculating the geohash from the precision and location becomes an issue. Occurrence in dev Aspicilia canina Räsänen with occurrenceId http://id.snsb.info/snsb/collection/452648/552880/232131
const extend = 512; // tile size extend
const precision = 3; // geohash precision
const zoom = 2;
const lat = 63.252872;
const lon = 34.123535;
When the above coordinates are translated into localTileXY, back into latLng and then to a geohash, the geohash has moved north.
Proposed fix: To fix this we could project back into latLon and test that the geohash is the same. If not, then offset localTileX or localTileY by 1 (in the correct direction, which depends on the direction the geohash moved).
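A rough sketch of the proposed fix, using the example coordinates above. It makes several assumptions: Web Mercator only, flooring to the integer position, and the reprojected point being the integer position itself. Instead of working out which direction the geohash moved, it simply tries the neighbouring positions, which is a simplification. All function names are made up for this sketch.

```javascript
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Standard geohash encoding by interleaved bisection (longitude bit first).
function encodeGeohash(lat, lon, precision) {
  let west = -180, east = 180, south = -90, north = 90;
  let hash = '', value = 0, isLon = true;
  for (let i = 0; i < precision * 5; i++) {
    if (isLon) {
      const mid = (west + east) / 2;
      if (lon >= mid) { value = value * 2 + 1; west = mid; } else { value *= 2; east = mid; }
    } else {
      const mid = (south + north) / 2;
      if (lat >= mid) { value = value * 2 + 1; south = mid; } else { value *= 2; north = mid; }
    }
    isLon = !isLon;
    if (i % 5 === 4) { hash += BASE32[value]; value = 0; }
  }
  return hash;
}

const lon2tile = (lon, zoom) => (lon + 180) / 360 * Math.pow(2, zoom);
const lat2tile = (lat, zoom) => {
  const rad = lat * Math.PI / 180;
  return (1 - Math.log(Math.tan(rad) + 1 / Math.cos(rad)) / Math.PI) / 2 * Math.pow(2, zoom);
};
const tile2lon = (x, zoom) => x / Math.pow(2, zoom) * 360 - 180;
const tile2lat = (y, zoom) =>
  Math.atan(Math.sinh(Math.PI * (1 - 2 * y / Math.pow(2, zoom)))) * 180 / Math.PI;

// Floor to an integer tile position, then nudge by one unit if the
// reprojected position no longer encodes to the original geohash.
function stableTilePosition(lat, lon, zoom, extent, precision) {
  const tx = lon2tile(lon, zoom), ty = lat2tile(lat, zoom);
  const baseX = Math.floor(tx), baseY = Math.floor(ty);
  const px = Math.floor((tx - baseX) * extent);
  const py = Math.floor((ty - baseY) * extent);
  const wanted = encodeGeohash(lat, lon, precision);
  // Try the floored position first, then its direct and diagonal neighbours.
  const offsets = [[0, 0], [0, 1], [0, -1], [1, 0], [-1, 0], [1, 1], [1, -1], [-1, 1], [-1, -1]];
  for (const [dx, dy] of offsets) {
    const x = px + dx, y = py + dy;
    const backLon = tile2lon(baseX + x / extent, zoom);
    const backLat = tile2lat(baseY + y / extent, zoom);
    if (encodeGeohash(backLat, backLon, precision) === wanted) {
      return { x, y, geohash: wanted, shifted: dx !== 0 || dy !== 0 };
    }
  }
  return null; // no neighbouring position preserves the geohash
}

// Example values from the comment above: extent 512, precision 3, zoom 2.
console.log(stableTilePosition(63.252872, 34.123535, 2, 512, 3));
```

A shifted position may fall outside the tile extent (for example y = 512), so a real implementation would also have to decide how to handle tile edges and buffers.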
I will close this issue as I think the simple solution where we add the geohash is reasonable. If we discover that we have performance issues, then we could revisit this decision.
Thanks Morten.
If performance issues do arise, adding a mode=rounded seems like a sensible thing to explore. Or flooring=1,10,100,1000,10000 or similar, since the point of including the exact numbers was for API users to style the tiles as they wish.
I'm not sure where the issue is, it might very well be outside this project. But I could use some help in finding it.
What I'm trying to do:
Implementation
Issue: The issue is that when I take the coordinates (which MapBox claims are those of the feature) and convert them to a bounding box as described above, there is no data. The actual occurrences in the index are in another geohash. So somewhere along the line the point is shifted to the wrong geohash.
Example PNG https://api.gbif-dev.org/v2/map/occurrence/adhoc/0/0/0@2x.png?style=scaled.circles&mode=GEO_CENTROID&eventId=MGYA00167306&srs=EPSG%3A3857 Just to show that it isn't a tile edge and the complexities around that.
MVT https://api.gbif-dev.org/v2/map/occurrence/adhoc/0/0/0.mvt?style=scaled.circles&mode=GEO_CENTROID&eventId=MGYA00167306&srs=EPSG%3A3857
when I use MapBox to inspect the point I get
Using zoom level 0, which according to the lookup has a precision of 3, the geohash is sp6 (confirmed using https://www.movable-type.co.uk/scripts/geohash.html).
But if I look at one of the occurrences' coordinates, lat: 41.6686, lon: 2.7996, then I get geohash: sp3