Closed mgduda closed 5 years ago
Great point. Here are the lines in question:
def nearest_cell(self, lat, lon):
nCells = self.mesh.dimensions['nCells'].size
latCells = np.array(self.mesh.variables['latCell'])
lonCells = np.array(self.mesh.variables['lonCell'])
nEdgesOnCell = np.array(self.mesh.variables['nEdgesOnCell'])
cellsOnCell = np.array(self.mesh.variables['cellsOnCell'])
sphere_radius = self.mesh.sphere_radius
We can load these variables into the mesh
instance by setting them to be a mesh attribute (self.latCell
, self.lonCell
, etc.), but if we decide to process multiple meshes with a single command we will need to be careful of how we manage memory as to not run into memory constraints. Depending on how Python manages memory and does garbage collection it may just be easier to process one mesh with a single command.
@MiCurry I think it would be reasonable to restrict the code to working on a single mesh at a single time. I don't believe there would be any advantage to processing multiple meshes with the same command; but, I may be missing some efficiency gains within the code when doing this.
@mgduda I agree. Lets just restrict it to one mesh. Looking at garbage collection as well, it didn't appear that memory was being removed after a mesh was processed. I don't think processing multiple meshes at once provides much benefit for what its worth.
I'll make two PR's, one that restricts the amount of meshes to be subsetted and then one to pre-load the needed variables for nearest_cell.
Doing this will allow us to update a number of other functions to use the pre-loaded arrays including:
@mgduda, there are a few other fields that are used elsewhere that are not used in nearest_cell (i.e. indexToCellIDs
, indexToVertexIDs
, indexToEdgeIDs
, cellsOnVertex
, cellsOnEdge
).
My thoughts is to have the needed fields for nearest_cell
loaded at the initialization of the mesh instance. Do you think it would be worth while to pre-load all of the variables that are needed throughout the generation of the regional mesh? Or just load them when they are needed?
Preloading all fields seems like it might be simpler and easier to understand than pre-loading some fields at mesh instantiation and others at a later time.
Closed by PR #14
The
nearest_cell
method in theMeshHandler
class readsnCells
,latCell
,lonCell
,nEdgesOnCell
, andcellsOnCell
(plus thesphere_radius
attribute) from a netCDFDataset
on every call. This can lead to very long runtime of themark_boundary
routine for dense meshes (e.g., the uniform 3-km mesh) and regions with many boundary points (e.g., a circle approximated by 100 points)