MDKempe commented 8 months ago

We need to create a geospatial down-selection function that will be comprehensive.

The input would be a list of GID, lat/lon, and altitude. The output would be the shortened list with just the GID or another identifier, as selected.

We want to be able to select the number of points to include in the final list.

To get more useful data, we would want to account for topography, so that data points in or next to mountains are preferentially selected. This could be accomplished by a nearest-neighbor search in which a weight is calculated for each point from the altitude difference between it and its nearest neighbors. Each point is then randomly included with a probability set by its weight. This method does rely on there being a statistically large enough number of data points.
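As a rough illustration of the weighting described above, here is a minimal sketch and not the project's implementation. The function name `altitude_weighted_downselect`, the neighbor count `k`, the 5% weight floor, and the flat-earth distance approximation are all assumptions made for this example. It uses a brute-force neighbor search to stay dependency-light; a KD-tree would take over that step for large inputs.

```python
import numpy as np

def altitude_weighted_downselect(lat, lon, alt, n_keep, k=4, seed=None):
    """Return indices of n_keep points, favoring points in rugged terrain.

    Hypothetical helper for illustration only.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    alt = np.asarray(alt, dtype=float)
    n = len(alt)
    if n < 2 or n_keep >= n:
        return np.arange(n)  # nothing to down-select
    k = min(k, n - 1)

    # Flat-earth distances in scaled degrees (assumption: small region).
    # Brute force is O(n^2) in time and memory.
    xy = np.column_stack([lon * np.cos(np.radians(lat.mean())), lat])
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Weight = mean absolute altitude difference to the k nearest neighbors,
    # plus a small floor (assumed 5% of the maximum) so that points on flat
    # terrain can still be drawn.
    relief = np.abs(alt[nbrs] - alt[:, None]).mean(axis=1)
    weights = relief + 0.05 * relief.max() + 1e-9
    prob = weights / weights.sum()

    # Weighted random draw without replacement.
    rng = np.random.default_rng(seed)
    return rng.choice(n, size=n_keep, replace=False, p=prob)
```

Calling `altitude_weighted_downselect(lat, lon, alt, n_keep=1000)` returns row indices that can be mapped back to GIDs. The floor on the weights is one possible way to keep flat regions represented; without it, points whose neighbors share their altitude would never be selected.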

We would also want to determine the perimeter locations, or locations near an ocean or large lake, and try to make sure there is a good outline. This could be done through a point search that looks for a direction in which there are no data points within a ~150 degree cone out to a specified distance. That distance would be determined by taking the typical spacing (e.g. 4 km), estimated from a few random nearest-neighbor tests, and multiplying it by a factor of, say, 10. You would then, for example, look for points where there is a direction with nothing for 40 km. All the edge points then go into a sublist, which is down-selected with half the rate of exclusion used for the other points.
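The empty-cone test can be sketched as follows. This is only an illustration under stated assumptions: the names `find_edge_points` and `edge_keep_fraction` are hypothetical, distances use a flat-earth projection with roughly 111.32 km per degree of latitude, and the typical spacing is taken as the median nearest-neighbor distance over all points instead of a few random samples. A point is flagged when the largest angular gap between the bearings to its neighbors within the search radius is at least the cone angle.

```python
import numpy as np

KM_PER_DEG = 111.32  # approximate length of one degree of latitude

def find_edge_points(lat, lon, cone_deg=150.0, spacing_factor=10.0):
    """Return a boolean mask of points that have an empty cone of cone_deg.

    Hypothetical helper for illustration only.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)

    # Project to local km coordinates (assumption: small region).
    x = lon * KM_PER_DEG * np.cos(np.radians(lat.mean()))
    y = lat * KM_PER_DEG
    dx = x[None, :] - x[:, None]  # dx[i, j] = x_j - x_i
    dy = y[None, :] - y[:, None]
    dist = np.hypot(dx, dy)
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself

    # Search radius = typical spacing * factor (e.g. 4 km * 10 = 40 km).
    radius = spacing_factor * np.median(dist.min(axis=1))

    is_edge = np.zeros(len(lat), dtype=bool)
    for i in range(len(lat)):
        near = np.isfinite(dist[i]) & (dist[i] <= radius)
        if not near.any():
            is_edge[i] = True  # isolated point: every direction is empty
            continue
        bearings = np.sort(np.degrees(np.arctan2(dy[i, near], dx[i, near])))
        # Gaps between consecutive bearings, including the wrap-around gap.
        gaps = np.diff(np.append(bearings, bearings[0] + 360.0))
        is_edge[i] = gaps.max() >= cone_deg
    return is_edge

def edge_keep_fraction(keep_fraction):
    """Keep fraction for edge points when their exclusion rate is halved."""
    return 1.0 - (1.0 - keep_fraction) / 2.0
```

For example, if the overall down-selection keeps 10% of points (90% excluded), `edge_keep_fraction(0.10)` gives 0.55, so the edge sublist would be thinned to 55%. The pairwise distance matrix makes this sketch quadratic in the number of points, so a radius query on a spatial tree would be the natural substitute at scale.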

These calculations may take some time, but would create nice lists to make the subsequent calculations much better.

tobin-ford commented 6 months ago

I have implemented a simple version of this on dev_scenario geospatial. It can select for coastline, mountains, and rivers from the geospatial metadata dictionary. The nearest-neighbor search uses sklearn KD-trees (scipy has them too, but sklearn is much faster).

tobin-ford commented 4 months ago

NREL / PVDegradationTools

Geospatial data down selection function #79
