MPAS-Dev / MPAS-Limited-Area

Python tool to create a regional subset of a global MPAS Mesh
http://mpas-dev.github.io/
21 stars 10 forks source link

nearest_cell method reads fields on every call #13

Closed mgduda closed 5 years ago

mgduda commented 5 years ago

The nearest_cell method in the MeshHandler class reads nCells, latCell, lonCell, nEdgesOnCell, and cellsOnCell (plus the sphere_radius attribute) from a netCDF Dataset on every call. This can lead to very long runtime of the mark_boundary routine for dense meshes (e.g., the uniform 3-km mesh) and regions with many boundary points (e.g., a circle approximated by 100 points)

MiCurry commented 5 years ago

Great point. Here are the lines in question:

 def nearest_cell(self, lat, lon):
        nCells = self.mesh.dimensions['nCells'].size
        latCells = np.array(self.mesh.variables['latCell'])
        lonCells = np.array(self.mesh.variables['lonCell'])
        nEdgesOnCell = np.array(self.mesh.variables['nEdgesOnCell'])
        cellsOnCell = np.array(self.mesh.variables['cellsOnCell'])
        sphere_radius = self.mesh.sphere_radius

We can load these variables into the mesh instance by setting them to be a mesh attribute (self.latCell, self.lonCell, etc.), but if we decide to process multiple meshes with a single command we will need to be careful of how we manage memory as to not run into memory constraints. Depending on how Python manages memory and does garbage collection it may just be easier to process one mesh with a single command.

mgduda commented 5 years ago

@MiCurry I think it would be reasonable to restrict the code to working on a single mesh at a single time. I don't believe there would be any advantage to processing multiple meshes with the same command; but, I may be missing some efficiency gains within the code when doing this.

MiCurry commented 5 years ago

@mgduda I agree. Lets just restrict it to one mesh. Looking at garbage collection as well, it didn't appear that memory was being removed after a mesh was processed. I don't think processing multiple meshes at once provides much benefit for what its worth.

I'll make two PR's, one that restricts the amount of meshes to be subsetted and then one to pre-load the needed variables for nearest_cell.

Doing this will allow us to update a number of other functions to use the pre-loaded arrays including:

@mgduda, there are a few other fields that are used elsewhere that are not used in nearest_cell (i.e. indexToCellIDs, indexToVertexIDs, indexToEdgeIDs, cellsOnVertex, cellsOnEdge).

My thoughts is to have the needed fields for nearest_cell loaded at the initialization of the mesh instance. Do you think it would be worth while to pre-load all of the variables that are needed throughout the generation of the regional mesh? Or just load them when they are needed?

mgduda commented 5 years ago

Preloading all fields seems like it might be simpler and easier to understand than pre-loading some fields at mesh instantiation and others at a later time.

MiCurry commented 5 years ago

Closed by PR #14