UCL / pyCascadia

Implementation of GEBCO cookbook remove-restore and other cleaning of topography/bathymetry. Uses `pyGMT`.
Mozilla Public License 2.0

Think about how to best handle large files #21

Closed: alessandrofelder closed this issue 3 years ago

alessandrofelder commented 3 years ago

Resampling the GEBCO dataset to NONNA_10 resolution for remove-restore would lead to prohibitively large datasets, filled with unnecessary data (interpolated values in the deep sea, where the triangles are big). We should discuss the best strategy going forward. Below are some thoughts in this direction:
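
As a rough back-of-envelope illustration of why this is prohibitive (all numbers below are illustrative assumptions, not taken from the repo):

```python
# Rough size estimate: a regional GEBCO cut resampled at NONNA-10's
# nominal ~10 m spacing. Region size and latitude are assumptions.
import math

M_PER_DEG_LAT = 111_320        # metres per degree of latitude (approx.)
region_deg = 10                # hypothetical 10 x 10 degree region
lat = 48                       # roughly Cascadia latitudes
spacing_m = 10                 # NONNA-10 nominal spacing

ny = region_deg * M_PER_DEG_LAT / spacing_m
nx = region_deg * M_PER_DEG_LAT * math.cos(math.radians(lat)) / spacing_m
cells = nx * ny
print(f"{cells:.1e} cells -> ~{cells * 4 / 1e9:.0f} GB as float32")
# ~8e9 cells, i.e. tens of gigabytes for a single grid
```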

  1. Approach A: merged bathymetry only in several user-defined regions. Currently, the region is hardcoded, but it could easily be made an input parameter. When the user specifies several source files, we should allow them to specify an output region and resolution for each. The output would then be many NetCDF files, one per input source (see the first sketch after this list). This would avoid outputting the entire subsampled GEBCO dataset.

  2. Approach B: merge the base and all sources into the same pandas DataFrame (or similar), but never write it to file. Instead, pass the coordinates of interest as an input parameter and evaluate and output the bathymetry only at those points (see the second sketch after this list).
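
A minimal sketch of what approach A could look like with `pyGMT` (file names, regions, and spacings below are hypothetical):

```python
# Approach A sketch: crop and resample the base grid per source region,
# writing one NetCDF file per input source.
import pygmt

# One (region, spacing, output) entry per input source file.
sources = [
    # (west, east, south, north), grid spacing, output file
    ([-130.0, -122.0, 46.0, 52.0], "10s", "merged_source_a.nc"),
    ([-126.0, -124.0, 48.0, 50.0], "5s", "merged_source_b.nc"),
]

for region, spacing, out_path in sources:
    # Crop the global base grid to this source's region of interest ...
    cropped = pygmt.grdcut("GEBCO_2020.nc", region=region)
    # ... and resample it at the requested per-source resolution.
    resampled = pygmt.grdsample(cropped, spacing=spacing, region=region)
    # In the real pipeline, remove-restore against the source bathymetry
    # would happen here before writing out.
    resampled.to_netcdf(out_path)
```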
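
And a minimal sketch of approach B, here using `scipy` for the scattered interpolation (column names, file names, and the choice of interpolation method are all assumptions, not from the repo):

```python
# Approach B sketch: combine base and source points in memory, then
# evaluate the merged bathymetry only at user-supplied coordinates.
import pandas as pd
from scipy.interpolate import griddata

# Scattered (lon, lat, elevation) samples; never gridded or written out.
base = pd.read_csv("gebco_points.csv")    # columns: lon, lat, elevation
source = pd.read_csv("nonna_points.csv")  # columns: lon, lat, elevation
combined = pd.concat([base, source], ignore_index=True)

# Coordinates of interest supplied by the user, e.g. mesh node locations.
query = pd.read_csv("mesh_nodes.csv")     # columns: lon, lat

# Interpolate the merged bathymetry only at the requested points.
query["elevation"] = griddata(
    points=combined[["lon", "lat"]].values,
    values=combined["elevation"].values,
    xi=query[["lon", "lat"]].values,
    method="linear",
)
query.to_csv("bathymetry_at_nodes.csv", index=False)
```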

Maybe approach A is easier?

JamieJQuinn commented 3 years ago

Can this be closed?

alessandrofelder commented 3 years ago

I think so. If I understand correctly, the final approach is to define the ROI in remove-restore and then merge the meshes further down the pipeline.