GenericMappingTools / gmtserver-admin

Cache data and script for managing the GMT data server
GNU Lesser General Public License v3.0

How to speed up srv_downsampler_grd.sh? #183

Closed: Esteban82 closed this issue 1 year ago

Esteban82 commented 1 year ago

Intro

I am processing SRTM v2.5 (6.2 GB) to update @earth_relief. My PC has 16 GB of RAM, so it took about 12 hours. While it was processing, I wondered why it took so much time, especially for the lower-resolution grids (e.g. 30m, 01d).

Note: I did it with GMT 6.4 (so without feature #7192).

Question

So, I wonder if there is any way to speed up this process?

What I tried

The only thing I have tried so far is to run srv_downsampler_grd.sh using @earth_relief_15s_p (which is 2.9 GB) instead of the original SRTM_2.5.nc (6.2 GB). The difference is that the lower-resolution grids are then obtained from the processed grid instead of the original.

I think that @earth_relief_15s_p was created with: gmt grdconvert SRTM15_V2.5.nc -Gearth/earth_relief/earth_relief_15s_p.grd=ns+s0.5+o0 --IO_NC4_DEFLATION_LEVEL=9
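For reference, here is my (hedged) reading of what the modifiers in that command do, based on the GMT grid-format syntax:

```bash
# Annotated restatement of the command above (my reading, not a new recommendation):
#   =ns         write a netCDF grid that stores the data as 16-bit integers
#   +s0.5+o0    apply scale 0.5 and offset 0, i.e. quantize elevations to 0.5 m steps
#   --IO_NC4_DEFLATION_LEVEL=9   maximum netCDF-4 compression of the output file
gmt grdconvert SRTM15_V2.5.nc -Gearth/earth_relief/earth_relief_15s_p.grd=ns+s0.5+o0 --IO_NC4_DEFLATION_LEVEL=9
```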

For that step, I used a much more powerful PC, so I don't know exactly how much faster it was.

Here I made some maps to compare the datasets at different resolutions. In each figure, the top map is the difference between the other two grids.

gmt grdmath "original_grid" "test_grid" SUB = diff.nc -V

(Comparison figures attached for the 03m_g, 06m_g, and 01d_g resolutions: Probar_SRTM25_03m, Probar_SRTM25_06m, Probar_SRTM25_01d.)
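In case it helps anyone reproduce this, here is a minimal sketch of plotting one of the difference maps (filenames, projection and CPT are illustrative, not the exact commands I used):

```bash
# Minimal sketch of one difference-map panel (illustrative names and options only).
gmt begin dif_03m_g_map png
  gmt makecpt -Cpolar -T-1/1            # symmetric -1/1 scale, since I was not sure of the range
  gmt grdimage dif_03m_g.nc -JQ0/15c -Baf -B+t"original minus test (03m_g)"
  gmt colorbar -Baf+l"difference (m)"
gmt end
```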

I made some simple histograms to see the differences (gmt histogram dif_${RES}.txt -Io -Z1 -T0.1 > dif_${RES}_histogram.txt). It seems that the difference grids contain only nodes with values of -0.5, 0 and 0.5.
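For completeness, a minimal sketch (filenames are illustrative) of how the per-resolution difference and histogram steps fit together:

```bash
# Hedged sketch of the per-resolution check: difference grid -> dump z values ->
# binned percentages with gmt histogram.
RES=03m_g
gmt grdmath original_${RES}.nc test_${RES}.nc SUB = dif_${RES}.nc -V
gmt grd2xyz dif_${RES}.nc -o2 > dif_${RES}.txt                      # keep only the z column
gmt histogram dif_${RES}.txt -Io -Z1 -T0.1 > dif_${RES}_histogram.txt
```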

Nodes of diff.nc for resolution 03m_g:

#Value  %
-0.5    1.04650064608
0   97.905301884
0.5 1.04819746987

Nodes of diff.nc for resolution 01d_g:

#Value  %
-0.5    0.0642781714391
0   99.8729740898
0.5 0.0627477387858

Question

So, what do you think? Does this new approach make sense? I am not very familiar with the Gaussian filter and how it could affect the lower-resolution grids.

I hope this is clear.

PaulWessel commented 1 year ago

It is true that using the original data for creating the 3m averages etc. is mostly spending CPU time to get extremely tiny differences. You will find the 1 m errors are probably at steep gradients. I just figured that doing it from the original is the safest, most reliable, and most reproducible way to downsample. I did not worry about it taking a few hours since we only do it 1-2 times a year at most. So the purist in me likes to keep it that way, but I can see that the RAM and time costs, for so little difference, are good arguments for doing it your way. I admit my solution is always "get a faster computer with more RAM!" :-)

Esteban82 commented 1 year ago

Yes, I know it is only 1-2 times a year. But the problem is that fewer people have access to a powerful PC (and the time and willingness to do it).

anbj commented 1 year ago

Is this 1-2 times a year per dataset, or 1-2 times a year in total? Given a working recipe, I would be willing to contribute hardware and power to do this.

Esteban82 commented 1 year ago

It depends on when each of the large datasets (SRTM, SYNBATH, GEBCO, GEBCOSI) is released.

Due to the Seabed 2030 project (https://seabed2030.org/), I think we can expect an update of the first 3 grids every year. Besides the processing, you will then have to upload the data (about 3.05 GB) to the server.

PaulWessel commented 1 year ago

Forgot to say this: while the original file and the 15s_p file have very different file sizes, the grid dimensions are the same, so the memory required to read either of them is the same. The file is just smaller because (a) the data are stored as integers in units of 50 cm and (b) netCDF compression. So I am not so sure you saved that much time by using the 15s integer grid.
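If someone wants to verify this, a quick check along these lines should show it (filenames as mentioned above; the node count is a back-of-envelope figure):

```bash
# Both files describe the same 15-arcsec global grid, so GMT reads the same
# number of nodes from either one; only the on-disk encoding differs.
gmt grdinfo SRTM15_V2.5.nc earth_relief_15s_p.grd     # same dimensions and increments
ls -lh SRTM15_V2.5.nc earth_relief_15s_p.grd          # very different sizes on disk
# Back-of-envelope: ~86400 x 43200 nodes held as 4-byte floats is roughly 15 GB
# in memory, whether the file stores floats or scaled 16-bit integers.
```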

Esteban82 commented 1 year ago

OK, I will see if I can run the script again to check whether I actually saved any time.

I think another way to speed up the process would be to use a lower-resolution grid as the source. For example, use earth_relief_02m to get earth_relief_01d.

I suspect that the differences would be minimal (if any).
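As a rough illustration of the idea (this is not the actual srv_downsampler_grd.sh invocation; the filter width and options are my assumptions), something like this could build a 1-degree test grid from the 2-minute one:

```bash
# Hedged sketch only: Gaussian-filter the 2-arc-minute remote grid down to 1 degree.
# The ~111 km filter width (about one output cell) and the -D4 mode are assumptions.
gmt grdfilter @earth_relief_02m -D4 -Fg111 -I1d -rg -Gearth_relief_01d_test.nc -V
```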

Esteban82 commented 1 year ago

> You will find the 1 m errors are probably at steep gradients.

BTW, the differences are exactly -0.5 and 0.5 (I checked with uniq on the file). In the map I used a scale from -1 to 1 because I wasn't sure.