Closed PaulWessel closed 4 years ago
One issue up front: If we make any structural changes to the gmtserver directories, we break access for everybody else. I think curl will follow symbolic links? If so, then we could add links such as earth_relief_01m.grd that point to, say, earth_relief/earth_relief_01mg.grd (notice the g for "gridline-registered"). I guess we can do an experiment on that.
Experiment worked. A symbolic link in the right place can point to another file and be followed. So that is how we could introduce earth_relief/earth_relief_01mp|g.grd files with links to the gridline version from the directory above.
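A local sketch of the same idea, for anyone who wants to try the layout without touching the server. It only shows the filesystem side: a link with the old name sits one level above the real file and reads resolve through it. The temporary directory and the placeholder bytes are assumptions for illustration; on the real server it is the web server that resolves the link before curl sees anything.

```python
import os
import tempfile


def read_via_link(root):
    """Create earth_relief/earth_relief_01mg.grd under root, add a link with
    the old name one level up, and read the data back through the link."""
    subdir = os.path.join(root, "earth_relief")
    os.makedirs(subdir)
    target = os.path.join(subdir, "earth_relief_01mg.grd")
    with open(target, "wb") as f:
        f.write(b"placeholder grid bytes")  # stand-in for a real grid

    link = os.path.join(root, "earth_relief_01m.grd")
    # A relative target keeps the link valid if the whole tree is moved.
    os.symlink(os.path.join("earth_relief", "earth_relief_01mg.grd"), link)

    with open(link, "rb") as f:
        return os.path.islink(link), f.read()


with tempfile.TemporaryDirectory() as root:
    is_link, payload = read_via_link(root)
    print(is_link, payload)
```

Creating symbolic links may need extra privileges on some platforms (Windows in particular).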
To get ready for 6.1.0 release, we should do this:
Please give feedback on this now, @joa-quim and @seisman. I am trying to avoid changes down the road, which will see (hopefully)
The @ algorithm will need to learn what is available via gmt_hash_server.txt. BTW, my hash_server file is 0 bytes on May 20, so something failed - what does yours say?
I should add one more thing: The SRTM15+v2.1 is a floating point grid with more precision than the integers. This time I wrote it out with the format =ns+sa, which auto-scales the data to fit the full -32767/+32767 range, and hence the precision in the values is about 0.25-0.3 meters instead of 1 meter. This makes the files a bit larger since there are more distinct bit patterns. As an example, the full 15s file grows from 2.6 Gb to 3.1 Gb, and the 1 minute goes from 215 Mb to 257 Mb, both about 20% increase. Do you want to dumb down to the nearest meter and retain the smaller file sizes? I.e., trading 3-4 times the precision for 1.2 times longer download time. I would certainly prefer the higher quality.
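The numbers above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the relief extremes are assumed, approximate values chosen for illustration, and the function does not claim to reproduce how GMT picks its scale and offset internally.

```python
def int16_step(zmin, zmax, n_steps=2 * 32767):
    """Smallest representable increment when the range [zmin, zmax] is
    spread evenly over the signed 16-bit span -32767..+32767."""
    return (zmax - zmin) / n_steps


# Assumed, approximate global relief extremes in meters (illustrative only).
step = int16_step(-10900.0, 8850.0)
print(f"value step: {step:.3f} m")

# File sizes quoted in the comment above.
growth_15s = 3.1 / 2.6   # full 15s grid, Gb
growth_01m = 257 / 215   # 1 minute grid, Mb
print(f"size growth: {growth_15s:.2f}x and {growth_01m:.2f}x")
```

With those assumed extremes the step lands near the upper end of the 0.25-0.3 meter range, and both size ratios come out close to 1.2.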
How do we handle this scenario:
Symbolic links. So the user will say @earth_relief_xxy.grd and the link will give him @earth_relief_xxyp.grd, right? But what will happen next time he issues the same command? Will it not download the same file again because earth_relief_xxy.grd doesn't exist in his system?
3b because those are known names whilst earth_day|night_ are unknowns
For the same reason as in 3, the files should keep their known names, *age.xxx.nc
Ok, I see that you are addressing my first point too
Are you saying drop earth_ from the ages files? FYI, there are actually two files
age.2020.1.GTS2012.1m.nc age.2020.1.GK2007.1m.nc
for two different time-scales. The people who care about these grids are the same people who care about the different time-scales... So we may have to do
age_GTS2012_xxy_p|g and age_GK2007_xxy_p|g but we could accept age_xxy_g|p to select GK2007 (I think they prefer that, and that is the only one they used in the past).
I would certainly prefer the higher quality.
Me too
Yes, drop the earth_ from name. And if they are both 1m only we don't need the _xxy
You are forgetting 30m, 15m, all the way down to 2m. So xxy is there to stay.
I wonder if this is a better solution:
If we don't, then we will need to carry much complexity in the gmt_remote.c file to handle these cases:
This way the old system will work fine - they won't get the new files until they upgrade (a good argument to do so), and we don't have to deal with legacy file names and complicated checks for old and new file names.
Also remember: We will need to maintain two separate hash tables, one for pre-6.1 and one for the new files. GMT 5-6.0 will download the old one with the old files, 6.1 will download the new one with the new files and directory structure.
Yes, drop the earth_ from name.
I will ask EarthByte what they prefer - it is their files.
6.1 will download the new one with the new files and directory structure.
But can they still call them @earth_relief_xxy.grd and get the old names, right?
I guess regardless of scheme on the server, we still need to allow for an alias that matches earth_relief_xxy to earth_relief_xxy_g. Seems like we need these features:
Will need to test all this. Having a separate server dir means we can test the stuff in a new branch without breaking anything yet.
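For the alias that maps earth_relief_xxy to earth_relief_xxy_g, a minimal sketch of the name matching could look like this. The function name and the regular expression are hypothetical, not taken from gmt_remote.c, and the sketch assumes xxy is always two digits followed by d, m or s.

```python
import re

# Legacy names carry no registration suffix, e.g. earth_relief_01d or
# earth_relief_01d.grd. Names already ending in _g or _p do not match.
LEGACY_NAME = re.compile(r"^(earth_relief_\d{2}[dms])(\.grd)?$")


def resolve_alias(name):
    """Map a legacy earth_relief name to its gridline-registered (_g)
    counterpart; return any other name unchanged."""
    m = LEGACY_NAME.match(name)
    if m is None:
        return name
    stem, ext = m.group(1), m.group(2) or ""
    return f"{stem}_g{ext}"


for name in ("earth_relief_01d.grd", "earth_relief_15s", "earth_relief_01d_p.grd"):
    print(name, "->", resolve_alias(name))
```

Keeping the alias as a pure name rewrite means it can be tested in a branch without any server access.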
- Leave the current data directory on the server as is. Ubuntu users will still try to access that in 2023.
- Create a new dir with another name than data, e.g., server (to match what we create in the user's directory)
- Place all the new subdirs and data we discussed in the new server directory.
I like the idea. In this case, the local file structure is the same as the remote one. Users can even use rsync to manually mirror the dataset.
We will need to maintain two separate hash tables, one for pre-6.1 and one for the new files. GMT 5-6.0 will download the old one with the old files, 6.1 will download the new one with the new files and directory structure.
Can we list all files (both the 6.0 and 6.1 data files) in the same gmt_hash_server.txt file? The file would have content like:
173
# list of old files
earth_relief_01d.grd 08871f1e1aa7feb0bb43a259130f74fcea1c54bfe4f6b9988b781b1e362198d4 108278
earth_relief_01m.grd aa11e643221faef792639c5800fd9ccaa59c7c4e8cac73a17170edb3f4c19086 225267444
AFR.nc ee581d480ab40b8c196dc1c5a951a05cc577c9b735865036b28ce223d827513f 129281
age.3.20.nc 8c6094015cedfc81bb4cf82e780ffcf709211c13f7b40fefe46a921611ca25af 442404
age_gridline.nc c9cc0f9424eb176cfde037aaf77f98c2713c22bf3afc0a225db04cd11a172b0a 1171167
# list of new files
server/earth_relief/earth_relief_01d_g.grd 08871f1e1aa7feb0bb43a259130f74fcea1c54bfe4f6b9988b781b1e362198d4 108278
server/earth_relief/earth_relief_01d_p.grd aa11e643221faef792639c5800fd9ccaa59c7c4e8cac73a17170edb3f4c19086 225267444
cache/AFR.nc ee581d480ab40b8c196dc1c5a951a05cc577c9b735865036b28ce223d827513f 129281
cache/age.3.20.nc 8c6094015cedfc81bb4cf82e780ffcf709211c13f7b40fefe46a921611ca25af 442404
cache/age_gridline.nc c9cc0f9424eb176cfde037aaf77f98c2713c22bf3afc0a225db04cd11a172b0a 1171167
Does it make gmt_remote.c and backward compatibility easier?
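To get a feel for how a combined table could be read, here is a rough sketch of a parser for the layout shown above: a leading count, # comment lines, then name, SHA256 and size per record. The function names are hypothetical and the rule "a path containing / is a new-style entry" is an assumption drawn from the example, not from how gmt_remote.c works.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    path: str
    sha256: str
    size: int


def parse_hash_table(text):
    """Return (declared_count, {path: Entry}) from a hash-table listing."""
    declared = None
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if declared is None and len(fields) == 1:
            declared = int(fields[0])  # leading record count
            continue
        path, digest, size = fields
        entries[path] = Entry(path, digest, int(size))
    return declared, entries


def split_old_new(entries):
    """Assumption: old-style names have no directory part, new ones do."""
    old = [p for p in entries if "/" not in p]
    new = [p for p in entries if "/" in p]
    return old, new


SAMPLE = """\
2
# list of old files
earth_relief_01d.grd 08871f1e1aa7feb0bb43a259130f74fcea1c54bfe4f6b9988b781b1e362198d4 108278
# list of new files
server/earth_relief/earth_relief_01d_g.grd 08871f1e1aa7feb0bb43a259130f74fcea1c54bfe4f6b9988b781b1e362198d4 108278
"""

declared, entries = parse_hash_table(SAMPLE)
old, new = split_old_new(entries)
print(declared, old, new)
```

An old client would simply never ask for the entries with a directory part, which is the behavior the question above is hoping for.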
Will have to try and see. There are special pattern checks in gmt_remote in 6.0.0 that will prevent you from downloading the _g|p.grd files for sure.
Currently, we include gmt_datasets.h to rule out invalid remote file names. However, this does not allow us to add more data without requiring a GMT source code update. The options are
Seems to me the gmt_hashtable is the most useful approach since we know it will be up-to-date (and thus change once we deliver age*). I don't think we even need to change its format (so still work for <=6.0) since the subdirs have the same prefix as the filenames (e.g., earth_relief_04m_p.grd will be in directory earth_relief).
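Since the subdirectory name is just the file name prefix, it can be derived rather than stored. A small sketch of that derivation follows; the function name and pattern are hypothetical, and it assumes lowercase dataset prefixes and a two-digit resolution followed by d, m or s, so names like age_GK2007_xxy_g would need a wider pattern.

```python
import re

# <prefix>_<xxy>[_g|_p].grd, e.g. earth_relief_04m_p.grd
REMOTE_NAME = re.compile(
    r"^(?P<prefix>[a-z]+(?:_[a-z]+)*)_(?P<res>\d{2}[dms])(?:_(?P<reg>[gp]))?\.grd$"
)


def server_dir(filename):
    """Return the server subdirectory implied by the file name prefix,
    or None if the name does not follow the <prefix>_<xxy> pattern."""
    m = REMOTE_NAME.match(filename)
    return m.group("prefix") if m else None


for name in ("earth_relief_04m_p.grd", "earth_day_01d.grd", "AFR.nc"):
    print(name, "->", server_dir(name))
```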
If we do as @seisman says and put both current and new stuff in the same hash table then old behavior should continue just fine, while new usage requiring 6.1 will work its way. The only exceptions we need to handle are:
So I think the only decision is the last one: Do we change the name to the new names when an old name is requested or do we allow users to continue with those names?
A final point: Since the distinction between g and p is lost on the uninformed users, quietly allowing earth_relief_xy is not so bad as it is simpler and already in practice. It begs the question if we should add links age_xxy.grd to point to age/age_GK2007_xxy_g.grd.
OK, one more for @joa-quim: One argument for calling the files earth_age is that when the work we proposed for NASA (whether funded or not) happens, users wishing to make a map can choose not to give _xxy at all. I think it is just too unspecific to say
gmt grdimage @age -B -pdf map
and would strongly prefer
gmt grdimage @earth_age -B -pdf map
I know there is no mars_age but we will do earth_gravity, moon_gravity, etc. This is why I suggested and still suggest we don't use BlueMarble etc. names that are not very specific unless you know what they mean. earth_day|night is much more generic, and when new data comes out that is not called BlueMarble then we don't have to agonize over a bad naming choice.
I have tested the symbolic links on the 01d grid: Removed the old grid, added a symbolic link by the same name pointing to server/earth_relief/earth_relief_01d_g.grd. Then on my local machine I removed the downloaded file and tried gmt grdinfo @earth_relief_01d.grd. Worked like a charm and the file is written with the old name. So I think I can remove all the old files and set those symbolic links. Any objections, @joa-quim or @seisman? I know it is Friday so you may not be glued to your monitor but I am, so doing this within the hour.
Currently, we only have earth_relief_xxy files in the gmt/data directory (everything else is under gmt/data/cache). However, we are about to add both blue and black marbles, the global crustal ages, and it is likely there will be more data sets in the future that should not be considered for cache (since they will have multiple resolutions etc). To peek ahead, it is likely we will split large global items into tiles, similar to SRTM. Whether we do that or not right now, it seems we should think about organization. How about this:
Inside these directories are the actual files: earth_relief_xym plus srtm1, srtm3 will be in the earth_relief folder, etc.
Perhaps the gmtserver needs to produce or maintain a listing of what is in server so that gmt can discover that we have added more data. We would at least need to know if a dataset is tiled or not to know what to do. I think the decisions that happen in gmt_remote.c depending on earth_relief resolution (get file or get tiles) need to be abstracted away and be based on a setup file we refresh, just like we refresh the hashes.
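One way to picture such a setup file, purely as a sketch: a small text table that says, per dataset and resolution, whether to fetch a single file or tiles. The three-column format, the layout keywords and the function names are all hypothetical, and the rows are illustrative rather than a statement of what the server holds.

```python
# Hypothetical setup listing: dataset, resolution, layout (file or tiles).
SETUP = """\
# dataset      resolution  layout
earth_relief   01d         file
earth_relief   15s         file
earth_relief   03s         tiles
earth_relief   01s         tiles
"""


def load_setup(text):
    """Parse the listing into {(dataset, resolution): layout}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        dataset, res, layout = line.split()
        table[(dataset, res)] = layout
    return table


def fetch_plan(table, dataset, res):
    """Decide what to do from the refreshed table, not from hard-wired rules."""
    layout = table.get((dataset, res))
    if layout is None:
        return "unknown"
    return "get tiles" if layout == "tiles" else "get file"


table = load_setup(SETUP)
print(fetch_plan(table, "earth_relief", "01d"))
print(fetch_plan(table, "earth_relief", "01s"))
print(fetch_plan(table, "earth_age", "01m"))
```

Adding a dataset then means adding rows on the server, with no change to the decision code.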