GenericMappingTools / gmtserver-admin

Cache data and script for managing the GMT data server
GNU Lesser General Public License v3.0

How should maintainers handle the various oceania ghost servers? #227

Closed PaulWessel closed 1 month ago

PaulWessel commented 10 months ago

The official GMT data server is oceania.generic-mapping-tools.org, where GMT users acquire their remote grids, unless of course they set the GMT default GMT_DATA_SERVER to select a mirror in China, North or South America, Europe, etc., perhaps a bit closer to where they work. (FYI: As I am now in Oslo I am more eager to ensure the global mirrors are up to date, e.g., europe [@joa-quim].)

However, the Hawaii server data maintainers (most recently @Esteban82, me, @maxrjones and @willschlitzer; others? @GenericMappingTools/core) need to place data on the server when new or updated data come along. For this we are assisted by a few ghost servers (not fully configured yet):

candidate.generic-mapping-tools.org: This is used to fine-tune the next candidate release of the full set of data. If a new Smith/Sandwell earth_relief is released, it will be processed and placed on the candidate server. The same goes for new data sets (e.g., Mars), which also go there. Our processing tools and Makefile take care of doing the sync and building an updated gmt_data_server.txt under candidate.

static.generic-mapping-tools.org: This is intended (but not populated yet) to be a never-updating set of datasets used in GMT tests and documentation figures. The problem this is meant to address is the following: We store original PostScript files in DVC, then run tests and compare the produced PostScript files to the originals. If the RMS difference between the two (rasterized to PNG by gm, etc.) is tiny (we use 0.003 as an empirical limit), the test passes. However, if a new earth_relief is updated on oceania then the new plot will differ considerably and the test will "fail". Of course, a better plot is not a failure, but it trips up our local and CI checks for how many scripts fail. The static-server solution would be to run the tests with gmt set GMT_DATA_SERVER=static so that the data do not change. Of course, if new test script number 1200 requires Mars topography then that gets added to static, but when NASA releases an even better Mars relief we do not place that on static, only on candidate (which during a release will be rsync'ed to oceania).
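The pass/fail logic described above can be sketched in shell. The 0.003 limit is from the description; the rms value below is a stand-in for what GraphicsMagick would actually report, and the gm invocation in the comment is an assumption about how the comparison is driven:

```shell
# Hypothetical sketch of the RMS pass/fail check described above.
# In a real run, rms would come from something like:
#   gm compare -metric RMSE original.png produced.png
rms=0.0021        # stand-in value for illustration only
limit=0.003       # the empirical threshold quoted in the issue
if awk -v r="$rms" -v l="$limit" 'BEGIN { exit !(r < l) }'; then
  echo "PASS"
else
  echo "FAIL"
fi
```

With a static server the reference PNGs and the freshly produced ones are built from identical remote data, so only genuine plotting regressions push the RMS above the limit.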

test.generic-mapping-tools.org: We have had that one for a few years but haven't really used it. Much of the setup of the server/planet directories and the maintenance of gmt_data_server.txt was laborious, manual work, but since the summit we have worked hard to make these tasks something any maintainer with permissions can do via high-level Makefiles, and sometimes by running a script directly. Given that we now have candidate and static, we can use test for whatever experimental things relate to remote files. For instance, until we have polished the new Planet X gravity we might play with intermediate files on test.

So that is the background. My question pertains to this: GMT maintainers are also GMT users and end up with copies of remote files placed on their local machines, usually under ~/.gmt/server. This is all well and good for GMT users. However, if I want to test out the Mercury DEM and I do

gmt grdimage @mercury_relief_06m -B -png map --GMT_DATA_SERVER=candidate

then the Mercury files (and tiles, if higher resolution) get added to ~/.gmt/server. But I may not want that. Of course I can delete those subdirectories when done, but if I accidentally tried the new Earth relief data then that overwrites my old files. Again, that might be OK, or might not be, for maintainers.

Three solutions (one of which I've been doing):

  1. The manual one (make sure you don't forget):

mv ~/.gmt/server ~/.gmt/orig_server

Now run your grdimage command and the new files are placed in ~/.gmt/server. When you want to go back to how it was, you reverse the steps (rm -rf ~/.gmt/server; mv ~/.gmt/orig_server ~/.gmt/server).

  2. The Makefile way:

Basically, add two make targets: server-off and server-on. They do the above without your occasional typos.
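A minimal sketch of what such targets might look like (target names from the comment above; the guards against running the same step twice are my addition, and none of this is in the actual gmtserver-admin Makefile):

```make
# Hypothetical sketch only.
server-off:
	@test ! -d ~/.gmt/orig_server || { echo "server is already off"; exit 1; }
	mv ~/.gmt/server ~/.gmt/orig_server

server-on:
	@test -d ~/.gmt/orig_server || { echo "server is already on"; exit 1; }
	rm -rf ~/.gmt/server
	mv ~/.gmt/orig_server ~/.gmt/server
```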

  3. The auto way:

When using any of these three extra servers (via GMT_DATA_SERVER=candidate, for example) we instead write the remote files and info files to ~/.gmt/candidate. Since GMT knows we have set candidate, it uses that top-level subdirectory instead of just "server", which is only used for the public oceania data (or the other mirrors around the world).
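In shell terms, the lookup that option 3 implies might look like this (a sketch only; GMT's actual implementation is in C, and the function name here is made up):

```shell
# Hypothetical sketch of option 3's directory selection: the three ghost
# servers each get their own top-level subdirectory under ~/.gmt, while
# oceania and the public mirrors keep using ~/.gmt/server.
local_dir_for() {
  case "$1" in
    candidate|static|test) echo "$HOME/.gmt/$1" ;;
    *)                     echo "$HOME/.gmt/server" ;;
  esac
}
```

For example, local_dir_for candidate yields ~/.gmt/candidate, while local_dir_for oceania falls back to the usual ~/.gmt/server.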

I cannot live with option 1. Option 2 is not bad (it can even prevent you from running make server-on twice in a row, etc.), but given that make targets don't take arguments, something like make server candidate is not possible (other approaches would require yet more targets). Option 3 is the most maintainer-friendly: files you might wish to keep are not overwritten, which would otherwise force you to scp them again from a remote server.

However, I am interested in your opinion, especially those who have a login to gmtserver.

seisman commented 10 months ago

Option 3 sounds good to me.

joa-quim commented 10 months ago

I think option 2 won't work on a Windows machine without the right tooling installed. Option 3 is the best.

PaulWessel commented 10 months ago

Great, already half-way through option 3 coding.

Esteban82 commented 10 months ago

I agree with option 3.

seisman commented 10 months ago

When using any of these 3 extra servers (via GMT_DATA_SERVER=candidate, for example) we instead write the remote files and info files to ~/.gmt/candidate).

Maybe a name like ~/.gmt/server-candidate is better than ~/.gmt/candidate?

PaulWessel commented 10 months ago

Check out this WIP GMT branch. Works fine for me. However, I do have a question for you:

Unlike the remote data sets, the cache directory is actually in GitHub and oceania's cron runs a check every hour to see if the repo gmtserver-admin has an update. So adding a new cache file (e.g., _planetrelief.cpt) is simple; just add the file to your local gmtserver-admin repo's cache dir and submit a PR, and once accepted (self-accept is OK unless it is a beast of a file) it will be available as @planet_relief.cpt within the hour (from oceania - might take a day for the mirrors depending on how often they rsync).

Unlike the remote data sets, the cache directory is actually in GitHub, and oceania's cron runs a check every hour to see if the repo gmtserver-admin has been updated. So adding a new cache file (e.g., planet_relief.cpt) is simple: just add the file to your local gmtserver-admin repo's cache dir and submit a PR; once accepted (self-accept is OK unless it is a beast of a file), it will be available as @planet_relief.cpt within the hour (from oceania; it might take a day for the mirrors, depending on how often they rsync).

The question is: how do we deal with the cache dirs on the ghost servers (candidate, static, test)? Do I just update the crontab script to rsync to those three directories (in addition to oceania)? Then they are all identical and will all update when we add new files. I think this is the easiest thing to do (both the server and cache directories are placed under the candidate, static, and test dirs). Cache is about 100 MB, so not very big and with little activity. The alternative would be to maintain three different snapshots of cache; not worth it.
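For the record, the crontab addition could be as simple as something like this (a sketch only; all paths are placeholders, since the real locations on oceania are not shown in this thread):

```
# Hypothetical crontab sketch: once an hour, mirror the repo's cache
# directory into each ghost server tree so all four stay identical.
0 * * * * for d in candidate static test; do rsync -a --delete /path/to/gmtserver-admin/cache/ /path/to/$d/cache/; done
```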

seisman commented 10 months ago

Do I just update the crontab script to rsync to those three directories (in addition to oceania)? Then they are all identical and will all update when we add new files.

Sounds good to me.

seisman commented 1 month ago

I think this issue can also be closed.