ai4er-cdt / geograph

GeoGraph provides a tool for analysing habitat fragmentation and related problems in landscape ecology. GeoGraph builds a geospatially referenced graph from land cover or field survey data and enables graph-based landscape ecology analysis as well as interactive visualizations.
https://geograph.readthedocs.io
MIT License

Performance-improvement: Combine boolean masks #39

Open Croydon-Brixton opened 3 years ago

Croydon-Brixton commented 3 years ago

If numpy does short-circuit evaluation on these things, it'd be slightly faster to combine boolean masks.

Does anyone know how numpy handles this type of case (below)?

Case: select_from_array[np.logical_or(condition_array1, condition_array2)]

Does it first evaluate both condition_array1 and condition_array2 in full inside the slice [ ... ] and then or the conditions? (In that case it'd probably be slower, because we would calculate the geometry overlaps for shapes which won't agree in class label.) Or does it calculate the first element of condition_array1 and then short-circuit to decide whether that element of condition_array2 even needs to be calculated? (In that case I think it should be slightly faster.)

_Originally posted by @Croydon-Brixton in https://github.com/ai4er-cdt/gtc-biodiversity/pull/28#discussion_r586369067_

herbiebradley commented 3 years ago

I did some tests and it looks like numpy does not short-circuit:

[Image: shortc]

This issue seems to confirm it: https://github.com/numpy/numpy/issues/3446
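For anyone who wants to check this locally, here is a minimal sketch. It is not code from GeoGraph: cheap_condition and expensive_condition are hypothetical stand-ins for the class-label check and the geometry-overlap check, and the call counter exists only to show how many elements each condition processes. Both arguments to np.logical_or are ordinary Python expressions, so they are fully evaluated before the function is called. The second half shows one possible way to emulate short-circuiting by hand, by evaluating the expensive condition only where the cheap one has not already decided the result.

```python
import numpy as np

# Hypothetical counters: track how many elements each condition processes.
calls = {"cheap": 0, "expensive": 0}

def cheap_condition(values):
    """Stand-in for a cheap test (e.g. a class-label comparison)."""
    calls["cheap"] += values.size
    return values > 0

def expensive_condition(values):
    """Stand-in for an expensive test (e.g. a geometry-overlap check)."""
    calls["expensive"] += values.size
    return values % 2 == 0

select_from_array = np.array([-3, -2, 1, 2, 4])

# Eager version: both condition arrays are computed in full before
# np.logical_or combines them element-wise.
mask = np.logical_or(
    cheap_condition(select_from_array),
    expensive_condition(select_from_array),
)
eager_result = select_from_array[mask]
eager_expensive_calls = calls["expensive"]

# Manual two-stage version: for an OR, the expensive condition only matters
# where the cheap condition is False, so evaluate it on that subset only.
calls["expensive"] = 0
staged_mask = cheap_condition(select_from_array)
undecided = ~staged_mask
staged_mask[undecided] = expensive_condition(select_from_array[undecided])
staged_result = select_from_array[staged_mask]
staged_expensive_calls = calls["expensive"]

print(eager_result, eager_expensive_calls)
print(staged_result, staged_expensive_calls)
```

Whether the staged version is actually faster depends on how expensive the second condition is and on how many elements the first one already decides, so it is worth timing on real data.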

However, there may be performance improvements from switching to np.where, or bigger ones from using numba, as tested here: https://stackoverflow.com/questions/58422690/filtering-a-numpy-array-what-is-the-best-approach
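As a rough illustration of the np.where variant (a sketch only, using a made-up values array rather than anything from GeoGraph), the snippet below shows that selecting by boolean mask and selecting by the integer indices from np.where or np.flatnonzero give the same result. It makes no claim about which is faster; that would need to be timed on the real arrays.

```python
import numpy as np

# Made-up example data; the condition stands in for any boolean mask.
values = np.array([5, -1, 3, 0, -7, 2])
condition = values > 0

# Plain boolean-mask indexing.
by_mask = values[condition]

# np.where with a single argument returns a tuple of index arrays;
# for a 1-D input, the first entry holds the matching positions.
indices = np.where(condition)[0]
by_where = values[indices]

# np.flatnonzero returns the same positions directly for flat arrays.
by_take = np.take(values, np.flatnonzero(condition))

print(by_mask, by_where, by_take)
```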