UCL / pyCascadia

Implementation of the GEBCO cookbook remove-restore procedure and other cleaning of topography/bathymetry data. Uses `pyGMT`.
Mozilla Public License 2.0

Writing temporary intermediate grids to disk #60

Closed: Devaraj-G closed this issue 3 years ago.

Devaraj-G commented 3 years ago

How important is it to write resampled.nc, original.nc, diff.nc, diff.xyz (and possibly others) to disk after an update? With a large number of input files, this creates significant I/O overhead.
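If the intermediates have to exist as files, one way to soften the overhead is to write them into a scratch directory that is deleted automatically and, where available, is RAM-backed. The sketch below assumes a Linux host where `/dev/shm` is a tmpfs mount and falls back to the system temp directory otherwise; `ram_backed_dir` and `write_intermediates` are hypothetical helpers, not part of pyCascadia or pyGMT. Note that `/dev/shm` can be small in containers (Docker defaults it to 64 MB), so large grids may not fit.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def ram_backed_dir() -> str | None:
    """Return a writable RAM-backed directory, or None to use tempfile's default."""
    # Assumption: /dev/shm is a tmpfs mount, as on most Linux systems.
    candidate = "/dev/shm"
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return None


def write_intermediates(files: dict[str, bytes]) -> dict[str, int]:
    """Write each named intermediate into a scratch directory and report its size.

    The directory and everything in it is deleted when the `with` block exits.
    """
    sizes: dict[str, int] = {}
    with tempfile.TemporaryDirectory(dir=ram_backed_dir()) as scratch:
        for name, payload in files.items():
            path = Path(scratch) / name
            path.write_bytes(payload)
            # A GMT call that expects a filename would read `path` here.
            sizes[name] = path.stat().st_size
    return sizes


print(write_intermediates({"diff.xyz": b"0 0 1.5\n1 0 2.5\n"}))  # {'diff.xyz': 16}
```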

JamieJQuinn commented 3 years ago

Unfortunately, that's how we have to use GMT when the functionality we need isn't available in pyGMT. There's nothing we can do about it without modifying pyGMT.

Devaraj-G commented 3 years ago

Ah, OK! That means the number of bathymetry files should be reduced as much as possible, and gdal_merge.py seems to be the way to do it.
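A minimal sketch of driving gdal_merge.py from Python, assuming GDAL and its Python utilities are installed and on `PATH`. The flags used (`-o`, `-of`, `-n`, `-a_nodata`) are gdal_merge.py options; the helper names and tile filenames are hypothetical. Two caveats from the gdal_merge.py documentation matter for bathymetry mosaics: where inputs overlap, later files overwrite earlier ones, and the output takes the pixel size of the first input unless `-ps` is given.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence


def gdal_merge_command(
    inputs: Sequence[str],
    output: str,
    output_format: str = "netCDF",
    nodata: float | None = None,
) -> list[str]:
    """Build a gdal_merge.py command that mosaics `inputs` into one `output` grid."""
    cmd = ["gdal_merge.py", "-o", output, "-of", output_format]
    if nodata is not None:
        # -n: ignore input pixels with this value; -a_nodata: mark it as nodata in the output
        cmd += ["-n", str(nodata), "-a_nodata", str(nodata)]
    # Inputs go last; in overlapping areas the later files win.
    return cmd + list(inputs)


def merge_tiles(inputs: Sequence[str], output: str, nodata: float | None = None) -> None:
    """Run the merge; raises CalledProcessError if gdal_merge.py fails."""
    subprocess.run(gdal_merge_command(inputs, output, nodata=nodata), check=True)


print(gdal_merge_command(["tile_a.nc", "tile_b.nc"], "merged.nc", nodata=-9999))
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything touches the disk.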

alessandrofelder commented 3 years ago

Not sure it matters, though: I think pyGMT typically does something quite similar under the hood? We'd either write a small amount of data many times or a large amount once, but the total data written would be the same?
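The total volume of data is indeed the same either way; what can differ is any fixed cost paid per file (opening, writing headers, closing). A toy model makes that trade-off explicit. The numbers below are hypothetical placeholders, not measurements from pyCascadia, and `write_seconds` is an illustrative helper.

```python
def write_seconds(n_files: int, total_bytes: float,
                  per_file_overhead_s: float, bytes_per_s: float) -> float:
    """Toy model: a fixed cost per file plus streaming time for the data itself."""
    return n_files * per_file_overhead_s + total_bytes / bytes_per_s


# Hypothetical numbers, not measurements: 1 GB in total, 100 MB/s, 50 ms per file.
many_small = write_seconds(1000, 1e9, 0.05, 1e8)
one_large = write_seconds(1, 1e9, 0.05, 1e8)
print(f"{many_small:.2f} s vs {one_large:.2f} s")  # 60.00 s vs 10.05 s
```

Under this model the streaming term is identical in both cases, so whether the number of files matters depends entirely on how large the per-file overhead is relative to the data volume.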

JamieJQuinn commented 3 years ago

Actually, there is an alternative. It seems like pyGMT has an API (see the documentation) to talk to GMT through virtual files. This could remove the need for these temporary files but I'm unsure if I can get it working today.

@alessandrofelder I think the virtual-file mechanism removes the need to write anything to disk at all. I'm not sure exactly how it works, but I imagine it tricks GMT into reading from RAM instead.
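GMT's virtual files are GMT's own mechanism (exposed in pyGMT through `pygmt.clib.Session`, e.g. `virtualfile_from_vectors`), and they go further than the sketch below by handing GMT in-memory data structures rather than text. The sketch does not use them; it only illustrates the underlying idea in plain Python: a reader that consumes a stream cannot tell whether the bytes came from a file on disk or from a buffer in RAM. `read_xyz`, `via_disk` and `via_memory` are hypothetical names.

```python
import io
import os
import tempfile
from typing import List, TextIO, Tuple

Row = Tuple[float, float, float]


def read_xyz(f: TextIO) -> List[Row]:
    """Parse whitespace-separated `x y z` rows from any text stream."""
    rows = []
    for line in f:
        if line.strip():
            x, y, z = (float(v) for v in line.split())
            rows.append((x, y, z))
    return rows


def via_disk(text: str) -> List[Row]:
    """The temporary-file route: write the table out, then read it back by name."""
    fd, path = tempfile.mkstemp(suffix=".xyz")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        with open(path) as f:
            return read_xyz(f)
    finally:
        os.remove(path)


def via_memory(text: str) -> List[Row]:
    """The in-memory route: the same parser reads straight from a RAM buffer."""
    return read_xyz(io.StringIO(text))


table = "0 0 1.5\n1 0 2.5\n"
assert via_disk(table) == via_memory(table) == [(0.0, 0.0, 1.5), (1.0, 0.0, 2.5)]
```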

JamieJQuinn commented 3 years ago

The performance-critical part (the call to `nearneighbor`) now works the same way as other pyGMT commands, i.e. through virtual files rather than temporary files on disk. This will become an official part of PyGMT once the pull request is merged.
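For context on what that call computes: the GMT `nearneighbor` documentation describes dividing a circle of radius R around each grid node into angular sectors, taking the nearest data point in each sector, requiring enough sectors to be filled (by default all four quadrants), and returning the weighted mean with w(r) = 1 / (1 + (3r/R)²). The brute-force sketch below follows that description in Cartesian coordinates only. It is not GMT's implementation: it has no geographic distances, spatial indexing, or grid-registration handling, and `nearneighbor_sketch` is a hypothetical name.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def nearneighbor_sketch(
    points: Sequence[tuple[float, float, float]],
    x_nodes: Sequence[float],
    y_nodes: Sequence[float],
    search_radius: float,
    sectors: int = 4,
    min_sectors: int | None = None,
) -> list[list[float]]:
    """Grid scattered (x, y, z) points onto the nodes x_nodes x y_nodes.

    Per node: keep the nearest point in each angular sector within
    `search_radius`, then take the weighted mean with w = 1 / (1 + (3r/R)^2).
    Nodes with fewer than `min_sectors` filled sectors become NaN.
    """
    if min_sectors is None:
        min_sectors = sectors  # default: every sector must contain a point
    sector_width = 2 * math.pi / sectors
    grid = []
    for yn in y_nodes:
        row = []
        for xn in x_nodes:
            nearest: list[tuple[float, float] | None] = [None] * sectors  # (r, z) per sector
            for px, py, pz in points:
                r = math.hypot(px - xn, py - yn)
                if r > search_radius:
                    continue
                angle = math.atan2(py - yn, px - xn) % (2 * math.pi)
                k = int(angle / sector_width) % sectors  # % guards the angle == 2*pi edge case
                if nearest[k] is None or r < nearest[k][0]:
                    nearest[k] = (r, pz)
            filled = [s for s in nearest if s is not None]
            if len(filled) < min_sectors:
                row.append(math.nan)
                continue
            weights = [1.0 / (1.0 + (3.0 * r / search_radius) ** 2) for r, _ in filled]
            row.append(sum(w * z for w, (_, z) in zip(weights, filled)) / sum(weights))
        grid.append(row)
    return grid


quadrants = [(1, 1, 1.0), (-1, 1, 2.0), (-1, -1, 3.0), (1, -1, 4.0)]
print(nearneighbor_sketch(quadrants, [0.0, 5.0], [0.0], search_radius=2.0))
```

Here the node at the origin has one equidistant point in each quadrant, so it averages to about 2.5, while the node at x = 5 has no points within the radius and comes back as NaN. Like the pyGMT call, the result is returned in memory rather than written to a grid file.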