GenericMappingTools / gmtserver-admin

Cache data and script for managing the GMT data server
GNU Lesser General Public License v3.0

What to do if highest resolution grid has dumb increments? #171

Open PaulWessel opened 1 year ago

PaulWessel commented 1 year ago

In testing the mars_relief recipe on the highest resolution MOLA grid (Mars_HRSC_MOLA_BlendDEM_Global_200mpv2.tif) one learns that it is pixel-registered and 106694 x 53347 with a 200 m pixel, which works out to 12.1468873601 arc seconds. Not exactly a good number to divide into one degree. Basically, there are 296.372222222 pixels per degree. Hence the trusted tiler, which tries to make 10x10 degree JP2 tiles, runs into massive

grdconvert [WARNING]: (e - x_min) must equal (NX + eps) * x_inc), where NX is an integer and |eps| <= 0.0001.

warnings, and it is just junk of course since one tile cannot match the edge of the next. I had tentatively named this grid mars_relief_12sp.nc, knowing it is not actually 12s. Of course, anyone studying Mars might want to make the highest quality map they can of a specific region and want that grid, but we are unable to make it tiled. So, the options I see are:

  1. Let the highest resolution of a data set be the one with a grid increment that divides into 1 degree and yields a whole integer.
  2. Let the highest resolution of a data set be the next standard increment (1,3,5,10,15, ...)
  3. Let the tiler script simply skip grids that cannot be tiled and we only serve it as a single grid (as we do for 6m and coarser).

In case 1, we find the first integer below 296.372222222 that divides nicely into 3600 is 288. Thus, one would select 12.5s as the increment and filter the 12.1468873601s grid marginally to yield a 12.5s grid. We may choose to name it to the nearest integer second for compliance with our patterns (so 12s or 13s). The original highest resolution grid would not be distributed. In case 2, we know the answer is 15s, so we simply produce solutions from 15s and up. The original highest resolution grid would not be distributed. In case 3 we upload the untiled mars_relief_12sp.nc grid (like we do for low resolutions) and start tiling at 15s. This means anyone attempting to cut a chunk off @mars_relief_12s will have to wait for the entire 3.1 Gb grid to be downloaded (once), but at least the highest resolution grid is distributed.
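
A quick sanity check with the gmt math calculator (the -Q expressions below are my own, not part of the recipe) confirms the awkward increment, the pixels-per-degree count, and the 12.5s fallback:

gmt math -Q 360 3600 MUL 106694 DIV =
12.1468873601

gmt math -Q 106694 360 DIV =
296.372222222

gmt math -Q 3600 288 DIV =
12.5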

So the casual user might be fine with cases 1 or 2, while Marsophiles will complain that we messed up the high resolution data by filtering.

I don't like to dumb down the original, so I think we should pursue option 3. It is a simple test to add to the tiler to check if we have an integer number of nodes per degree, and if not we skip tiling that guy. In support of this, our unadulterated highest-res netCDF grid is 3.1 Gb while the original TIF from NASA is 11 Gb, all due to our lossless compression and use of 16-bit integers with a granularity of 50 cm. Comments, please.

PaulWessel commented 1 year ago

PS. I should add that one could probably make tiles that fit, but they would not be 10x10 but somewhat variable to ensure we have exact common edges. However, this would mean some work in determining which tiles to get, I think, as well as in the naming of the tiles (030W40N etc. would not actually be true). But perhaps a bit more thought should go into how to tile something like this, as clearly tiling is a superior solution to distributing 3.1 Gb grids.

PaulWessel commented 1 year ago

Third post: So 106694 = 2 x 7 x 7621, where the latter is a prime number. Hence one option is to make tiles that are 7621 x 7621, which is roughly 25:42:51.4285713221 degrees on a side. This would then give us 14 x 7 = 98 tiles, so now we would have smaller and manageable tiles, assuming JP2 is OK with an odd-numbered width. Also, the internal machinery that determines which tiles overlap a region would need to use 25:42:51.4285713221 as the step and have some scheme for translating to tile names (I am guessing we would not call them S38.5714285715E025.7142857143..mars_relief_12s_g.jp but perhaps S38E025..mars_relief_12s_g.jp).
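
For the record, the same calculator verifies the factorization and the resulting tile geometry (again just my own check, not part of the tiler):

gmt math -Q 2 7 MUL 7621 MUL =
106694

gmt math -Q 106694 7621 DIV =
14

gmt math -Q 53347 7621 DIV =
7

gmt math -Q 360 14 DIV =
25.7142857143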

PaulWessel commented 1 year ago

I know I am talking to myself here, but it looks like this could work:

The longitudes of the lower left corner could be reduced to integers this way (we skip the last one at 180):

gmt math -T0/106694/7621 -o1 T 360 106694 DIV MUL FLOOR 180 SUB =
-180
-155
-129
-103
-78
-52
-26
0
25
51
77
102
128
154

while the latitude bands would be

gmt math -T0/53347/7621 -o1 T 360 106694 DIV MUL RINT 90 SUB =
-90
-64
-39
-13
13
39
64

So we would build S39E025..mars_relief_12s_g.jp, and while neither 39S, 25E nor 12s is correct, we have metadata that knows the truth and hence can do the right thing. This way of computing those w/e/s/n values also works in the benign cases of proper increments, of course.

joa-quim commented 1 year ago

I know I am talking to myself here

Sorry, it's the day 😄

PaulWessel commented 1 year ago

My 98 tiles seem to work OK - just needed some changes to the tiler script. I think with each tile in the 15-25 Mb range the use of this huge data set will be simple.

WalterHFSmith commented 1 year ago

I think the solution Paul seems to be converging to looks like a good one. The question of filenames not having coordinates that parse to the exact edge coordinates could be handled by an optional look-up table, so that tiles could be labeled tile_N for N = 0 (or 1) up to whatever, and the look-up table would give the exact coordinates.

I did not like the original suggestion that the originally supplied grid could be "filtered" just a little to generate a grid with nicer step sizes. The reason, not appreciated by most of us who know little more than the rudiments of the Nyquist-Shannon sampling theorem, is that even such "minor" filtering has surprisingly bad impacts on the spectrum of the data, out to surprisingly long wavelengths (long compared to the sample step size). I showed this in a paper with Karen Marks where we compared the 2-arc-minute pixel-registered grid of Sandwell & Smith with the ETOPO2 grid created by others who interpolated the original data onto a 2-arc-minute grid-registered grid. The losses are quite severe at quite long wavelengths. So I recommend doing something that does not require filtering or interpolating the data, if possible.

Going forward, one can urge people who make data products to consider the issues raised here, and choose a sampling step (if possible) so that (360 degrees) / (sample step in X) and (180 degrees) / (sample step in Y) are highly composite numbers.
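
As a concrete illustration (the arithmetic here is my own, not Walter's): had the Mars product been sampled at an even 12s, the number of columns would be

gmt math -Q 360 3600 MUL 12 DIV =
108000

and 108000 = 2^5 x 3^3 x 5^3 is highly composite, so 10x10 degree tiles (3000 columns each) fall out exactly, whereas 106694 = 2 x 7 x 7621 offers essentially no useful divisors.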

PaulWessel commented 1 year ago

Thanks, I talked myself into the right solution once I realised my script could handle this with a few minor changes.

Esteban82 commented 1 year ago

BTW, @PaulWessel, now that you are dealing with this.

Would it be possible to reuse the machinery (that selects the resolution of a remote data set) for data sets on a local PC (see the forum for details)?

PaulWessel commented 1 year ago

Now that @Esteban82 and I are actually playing with Pluto, I learn that there is no backwards-compatible way to handle this, since the gmt_data_server.txt entry for the highest resolution Pluto tiles says (and needs to say) 52.0732883317s. Being clever, I had only assigned 8 bytes for the increment string since I assumed it would be 04s, 10m, etc. The overflow deletes the unit and we are left with a crude increment assumed to be in degrees.
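
For scale (my own check, not from the GMT source): the offending increment string alone is 14 characters, which cannot fit in an 8-byte field that comfortably held strings like 04s or 10m:

printf '%s' 52.0732883317s | wc -c
14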

The fix in master is trivial: use 16 or 32 bytes for inc in the structure and all works well (we tested this). Then 6.5 will be required even if you don't care about planets, since gmt_data_server.txt gets updated and the parsing above crashes GMT.

To have a solution that works with GMT 6.4, the options without code changes are slim. I think the only simple thing is to add a gmt_data_server_v3.txt and keep just the parsable records in gmt_data_server.txt. Then 6.5 will read the _v3 file and 6.x, x < 5, will read the original name. The two files only differ by the high-res xx.fffffffs entries; only 6.5 can access them. Then, going forward, we only update the _v3 file.

seisman commented 11 months ago

The fix in master is trivial: use 16 or 32 bytes for inc in the structure and all works well (we tested this). Then 6.5 will be required even if you don't care about planets, since gmt_data_server.txt gets updated and the parsing above crashes GMT.

To have a solution that works with GMT 6.4, the options without code changes are slim. I think the only simple thing is to add a gmt_data_server_v3.txt and keep just the parsable records in gmt_data_server.txt. Then 6.5 will read the _v3 file and 6.x, x < 5, will read the original name. The two files only differ by the high-res xx.fffffffs entries; only 6.5 can access them. Then, going forward, we only update the _v3 file.

Just two other solutions that may be better or worse than the current one:

  1. Keep gmt_data_server.txt unchanged, and add another file like gmt_data_server_dumb.txt (or any other name) which contains the records of grids with dumb increments (e.g., 52.0732883317s). In this case, GMT 6.4 can still parse gmt_data_server.txt correctly, while GMT 6.5 needs to parse two files. Maybe that makes the code too complicated?
  2. Any lines that start with # are comments and will be skipped in GMT 6.4. So, if these records start with #% (or some other special characters), they will be skipped in GMT 6.4. In GMT 6.5, we can modify the code to check whether a line starts with #%. If it does, we know it's a record with a dumb increment and can do more with these special records.

seisman commented 11 months ago

BTW, I just saw that the Pluto 52.0732883317s grid is named pluto_relief_52s on the candidate server. I'm wondering if we can call it something like pluto_relief_full instead. Does that make the code too complicated?

PaulWessel commented 11 months ago

I'll have a look. Whether it is 52s or _full the user needs to know so it seems a bit arbitrary.

PaulWessel commented 11 months ago

I agree the backwards-compatibility fix for the long increments in the gmt_data_server.txt file is to comment those lines out with a magic #%. Then only GMT 6.5 and higher can do a trick and read from position 3 in the string. Since these file snippets are made by srv_tiler.sh, I have updated that script to write such records. E.g., doing Pluto again it gives

#
# New Horizons Pluto Relief
#
#% /server/pluto/pluto_relief/  pluto_relief_52s_p/ 52.0732883317s  p   0.25    1000    124M    30  2023-09-11  -   -   @pluto_relief.cpt   New Horizons Pluto Relief original at 52.0732883317x52.0732883317 arc seconds [Moore et al., 2016]
/server/pluto/pluto_relief/ pluto_relief_01m_g/ 01m g   0.25    1000    115M    30  2023-09-11  -   -   @pluto_relief.cpt   New Horizons Pluto Relief at 01x01 arc minutes reduced by Gaussian Cartesian filtering (0.4 km fullwidth) [Moore et al., 2016]

etc as before
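
For anyone poking at the candidate file by hand, a minimal sketch of how such hidden records can be listed or recovered (the real handling belongs in gmt_remote.c; these shell lines are only illustrative):

grep -c '^#%' gmt_data_server.txt
sed -n 's/^#% //p' gmt_data_server.txt

The first counts the special records; the second strips the #% prefix (and the following space) so those lines read like any other record.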

@Esteban82, if you have time, could you please redo the tiling of those data sets with crazy floating-point highest increments for the planets? Then just replace the old planet_dataset_server.txt files with the new ones and rebuild the full gmt_data_server.txt. This is all on the CANDIDATE server only. After that, gmt_data_server.txt should have a handful of #% records. I will work on revising gmt_remote.c to handle these.

Note: No data grids or tiles change because of this, just the info file.

I will address the issue of _full versus _52s tags. GMT older than 6.5 cannot understand full, of course, and cannot handle the 52s info string, so this is just for GMT 6.5. Since it is an abstraction (full does not mean the same increment for all data sets), do we also introduce earth_relief_full?

Esteban82 commented 11 months ago

Note: No data grids or tiles change because of this, just the info file.

Ok, so I could manually edit the files, right? Because if I want to use srv_tiler.sh, then I need to have the grids on my pc.

PaulWessel commented 11 months ago

Sure, that is fine. This should be a one time operation so a bit of manual fudging should be fine.

Esteban82 commented 11 months ago

So, I think that I only have to edit mars (12.1468873601s) and pluto (52.0732883317s).

@PaulWessel I don't have permission to edit them: -rw-r--r-- 1 pwessel gmt 5371 ago 27 04:17 pluto_relief_server.txt

I think that it is not necessary for Mercury (56.25s) and the Moon (14s).

PaulWessel commented 11 months ago

Darn, how many times can I forget? The entire thing under candidate should be rw for owner and group.

Esteban82 commented 11 months ago

I just added #% to the first record of mars and pluto (in gmt_data_server.txt and in its original gmt_planet_server.txt), always on candidate.

Esteban82 commented 11 months ago

do we also introduce earth_relief_full ?

As you wish. I can live without it (but I will probably use it if it exists).

joa-quim commented 11 months ago

I think that is a mistake; full is a moving target. What is full today may be low res tomorrow. We have the example of the coastlines resolution.

PaulWessel commented 11 months ago

That is a good argument. But you could have the case that today pluto_relief_full is the current 52.0732883317s version, but when Disney maps Pluto in 5 years at 16.7750884376s then IT becomes _full and the 52.07... version is just old data, and we would have _full, 30s, 01m, etc. _full would always mean the highest resolution we have, and the lower resolutions always derive from the latest highest resolution data.

joa-quim commented 11 months ago

Which also means that full really has no meaning. It would have to be accompanied by a date.

Esteban82 commented 11 months ago

For me it's good because you can select the highest resolution. It would be easy to make a loop for testing the highest resolution of many datasets.

PaulWessel commented 11 months ago

Remember we are not maintaining versions of data sets. If someone needs earth_relief v1.5 then go to Sandwell. Hence full is always the current best data set and that is what we provide.

PaulWessel commented 11 months ago

Hi @Esteban82, I just tried

gmt grdimage @pluto_relief_06m -B -png map --GMT_DATA_SERVER=candidate

but in debug I notice that gmt_data_server.txt does not have the #% stuff, so either you did not update that one on candidate (I think you said you did) or GMT isn't looking there (yet)?

joa-quim commented 11 months ago

If I go somewhere and see data_full, it tells me absolutely nothing. On the contrary, data_250m tells me everything.

joa-quim commented 11 months ago

If we ever get OSM coastlines, what are we going to call them? full?

Esteban82 commented 11 months ago

Hi @Esteban82, I just tried

gmt grdimage @pluto_relief_06m -B -png map --GMT_DATA_SERVER=candidate

but in debug I notice that gmt_data_server.txt does not have the #% stuff, so either you did not update that one on candidate (I think you said you did) or GMT isn't looking there (yet)?

I see here ( http://candidate.generic-mapping-tools.org/gmt_data_server.txt ) that it is updated. Although I can't see #% in my local file. So I think that GMT isn't looking there.

PaulWessel commented 11 months ago

I will debug to learn what happens. But got up early today so maybe tomorrow...

Esteban82 commented 11 months ago

Could the problem be with the CPT? We haven't made one yet.

grdimage [ERROR]: Probably means @mercury_relief.cpt does not exist on the remote server

PaulWessel commented 11 months ago

Yes, I had to put the CPT in my cache. So the plot worked fine, but when I debugged to see the reading of gmt_data_server.txt I noticed that the file did not have any #% for the 52.xxxxs file:

/server/pluto/pluto_relief/ pluto_relief_52s_p/ 52.0732883317s p 0.25 1000 124M 30 2023-07-25 - - @pluto_relief.cpt New Horizons Pluto Relief original at 52.0732883317x52.0732883317 arc seconds [Moore et al., 2016]

Esteban82 commented 11 months ago

I just added #% to the first record of mars and pluto (in gmt_data_server.txt and in its original gmt_planet_server.txt), always on candidate.

Is it possible that GMT can't interpret #% well? I got the messages below, which disappeared when I deleted the #% from my local version of gmt_data_server.txt:

gmt [WARNING]: File /home/federico/.gmt/server/gmt_data_server.txt said it has 418 records but only found 416 - download error???
gmt [WARNING]: File /home/federico/.gmt/server/gmt_data_server.txt should be deleted.  Please try again

seisman commented 11 months ago

I just added #% to the first record of mars and pluto (in gmt_data_server.txt and in its original gmt_planet_server.txt), always on candidate.

Is it possible that GMT can't interpret #% well? I got the messages below, which disappeared when I deleted the #% from my local version of gmt_data_server.txt:

gmt [WARNING]: File /home/federico/.gmt/server/gmt_data_server.txt said it has 418 records but only found 416 - download error???
gmt [WARNING]: File /home/federico/.gmt/server/gmt_data_server.txt should be deleted.  Please try again

I can reproduce the issue. The problem is that the record count at the top of the gmt_data_server.txt file doesn't match the number of records in the file, because records starting with # or #% are comments in GMT 6.4, although it is possible for GMT 6.5 to recognize them correctly. It also means the current way (using #%) cannot work for both GMT 6.4 and GMT 6.5.
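
One way to see the mismatch on a local copy (assuming, as the warnings above suggest, that the count stored in the file header is compared against the number of non-comment records):

grep -c -v '^#' ~/.gmt/server/gmt_data_server.txt

With the mars and pluto lines commented out via #%, this count comes up two short of the stored total, hence the 418 versus 416 complaint.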

I tried to use the candidate server and also removed the leading #% from the records. For GMT 6.4, it seems that all datasets work well except pluto_relief_52s and mars_relief_12s. So I think we don't need to make any changes.

We can still use the old format in the gmt_data_server.txt file (without the special #% records). The new gmt_data_server.txt file works well for both GMT 6.5 and GMT 6.4. The only issue is that GMT 6.4 users can't access pluto_relief_52s and mars_relief_12s, which I believe is a very minor issue that few users will care about.