goat-community / goat

This is the home of Geo Open Accessibility Tool (GOAT)
GNU General Public License v3.0

CRUD Heatmap: Combine hexagons, calculations and quantiles #1873

Closed metemaddar closed 1 year ago

metemaddar commented 1 year ago

At the end we need a table like this:

id supermarket supermarket_class bus_stop bus_stop_class geom

We have the hexagons saved as GeoJSON, and the quantiles and calculations saved as dictionaries. The quantiles/calculations each have another dictionary, named uniques, which contains the hexagon_ids. These can be connected together through those ids.

metemaddar commented 1 year ago

We can do it by following a diagram like this one and merging the dataframes:

Image
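A minimal sketch of the merge approach, assuming each input is already a dataframe keyed by a shared `id` column. The frame names, column values, and placeholder geometry strings are hypothetical and not taken from the GOAT code base.

```python
import pandas as pd

# Hypothetical inputs: hexagon ids with placeholder geometries, plus
# per-hexagon calculation values and quantile classes in arbitrary order.
hexagons = pd.DataFrame({"id": [101, 102, 103], "geom": ["hex A", "hex B", "hex C"]})
calculations = pd.DataFrame({"id": [103, 101, 102], "supermarket": [7.5, 2.0, 4.0]})
quantiles = pd.DataFrame({"id": [102, 103, 101], "supermarket_class": [2, 3, 1]})

# Left-merge onto the hexagons so every hexagon keeps its row and order.
table = hexagons.merge(calculations, on="id", how="left")
table = table.merge(quantiles, on="id", how="left")

# Arrange the columns like the target table.
table = table[["id", "supermarket", "supermarket_class", "geom"]]
```

The left merge keeps hexagons that have no calculation; their value columns would simply be NaN.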

However, after implementing this, we see that creating dataframes for grid_ids/calculations/quantiles from their dictionaries takes a long time. As @EPajares suggested, we can do it using numpy arrays. I need to figure out the data structure for this, as the geometry is a table itself. We also need to create a flowchart in order to use the power of numba, because the shape of the output can change, and numba can then raise an exception for the return type.
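One possible data structure, sketched under assumptions: a single 2-D array with one row per hexagon and one column per category, filled with NaN where a category has no value. Its shape depends only on the number of hexagons and categories, which may help with the fixed return type concern. The layout of the `calculations` dictionary (a `uniques` array of hexagon ids next to a `values` array) is a guess for illustration, and the sketch assumes `grid_ids` is sorted and contains every id in `uniques`.

```python
import numpy as np

# Hypothetical inputs. grid_ids must be sorted for np.searchsorted.
grid_ids = np.array([101, 102, 103, 104], dtype=np.int64)
calculations = {
    "supermarket": {"uniques": np.array([102, 104]), "values": np.array([4.0, 1.5])},
    "bus_stop": {"uniques": np.array([101, 102, 103]), "values": np.array([3.0, 6.0, 9.0])},
}

categories = sorted(calculations)  # fixed column order: bus_stop, supermarket
table = np.full((grid_ids.size, len(categories)), np.nan)

for col, name in enumerate(categories):
    entry = calculations[name]
    # Row index of each hexagon id inside grid_ids (assumes every id exists there).
    rows = np.searchsorted(grid_ids, entry["uniques"])
    table[rows, col] = entry["values"]
```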

EPajares commented 1 year ago

We could also try to save the hexagon geoms directly as geodataframe instead of geojson. So we don't need to parse the geojson again using geopandas. We could save them as pickle maybe?

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_pickle.html
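A small round-trip sketch of the pickle idea. It uses a plain pandas DataFrame with placeholder geometry strings as a stand-in, which is an assumption made to keep the example dependency-free; a GeoDataFrame inherits `to_pickle` from DataFrame, so the same calls should apply. The file name is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the hexagon GeoDataFrame (geometries shown as WKT-like text).
hexagons = pd.DataFrame(
    {
        "id": [101, 102],
        "geom": ["POLYGON ((0 0, 1 0, 1 1, 0 0))", "POLYGON ((1 0, 2 0, 2 1, 1 0))"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "hexagons.pkl")  # hypothetical file name
    hexagons.to_pickle(path)          # write once, when the grid is created
    restored = pd.read_pickle(path)   # later reads skip the GeoJSON parsing step
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.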

metemaddar commented 1 year ago

Yes, exactly. We should also consider the conversion of the calculations to dataframes, which took about 90 seconds! :snail: :cold_face:

metemaddar commented 1 year ago

If I understand correctly, the data from read_heatmap are at resolution 10. As the requested resolution can be different (for example, we loaded resolution 8), we need to convert the data from read_heatmap to their grandparent resolution of 8. We also need to aggregate the data (sum, smallest, etc., for the different value kinds). Then the resolutions match. At that point we should do the calculations and continue to generate the final GeoJSON. Is this true? @EPajares

EPajares commented 1 year ago

I would always do the calculation on resolution 10 and then group it by average to the target resolution, e.g. 8. After it is grouped, we perform the quintile classification. After this we have both index and class on resolution 8.
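The steps above can be sketched with numpy, assuming the resolution-8 parent id of every resolution-10 cell has already been looked up (the `parent_ids` array here is hypothetical sample data). The class numbering 1 to 5 and the right-inclusive break handling are also assumptions.

```python
import numpy as np

# Hypothetical sample: parent (resolution 8) id for each resolution-10 cell,
# and the value calculated on resolution 10.
parent_ids = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
values = np.array([1.0, 3.0, 4.0, 6.0, 7.0, 9.0, 10.0, 12.0, 13.0, 15.0])

# Group by parent and average: sum per group divided by count per group.
unique_parents, inverse = np.unique(parent_ids, return_inverse=True)
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
means = sums / counts

# Quintile breaks at the 20/40/60/80 percentiles of the grouped values.
breaks = np.quantile(means, [0.2, 0.4, 0.6, 0.8])

# Class 1 is the lowest fifth, class 5 the highest.
classes = np.digitize(means, breaks, right=True) + 1
```

After this, `unique_parents`, `means`, and `classes` are aligned row by row on the target resolution.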

EPajares commented 1 year ago

After this we append them as attributes to the geometries
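A rough illustration of attaching the index and class to the geometries as GeoJSON properties. All ids, coordinates, and property values below are made up, and the triangles stand in for real hexagon rings.

```python
import json

# Hypothetical aligned inputs: one entry per hexagon in each list.
hex_ids = [101, 102]
rings = [
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]],  # closed ring
    [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 0.0]],
]
supermarket = [2.0, 5.0]
supermarket_class = [1, 2]

features = []
for hex_id, ring, value, value_class in zip(hex_ids, rings, supermarket, supermarket_class):
    features.append(
        {
            "type": "Feature",
            "id": hex_id,
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"supermarket": value, "supermarket_class": value_class},
        }
    )

collection = {"type": "FeatureCollection", "features": features}
geojson_text = json.dumps(collection)
```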

metemaddar commented 1 year ago

The save method has been changed to save the hexagons (grids, polygons) as numpy arrays. Now we are going to create the calculations array based on hexagon_grids.

Image

I think we can use a sparse matrix to reorder calculations_array.
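As a point of comparison, the reordering can also be sketched without a sparse matrix, using `np.argsort` together with `np.searchsorted`. This is a swapped-in technique, not the sparse-matrix approach itself. The array names and values are hypothetical, and the sketch assumes every grid id appears exactly once in `calc_ids`.

```python
import numpy as np

# Hypothetical data: the grid order we want, and calculations in another order.
grid_ids = np.array([30, 10, 20, 40])
calc_ids = np.array([20, 40, 10, 30])
calc_values = np.array([2.0, 4.0, 1.0, 3.0])

# Sort calc_ids once, then find where each grid id sits in the sorted view.
order = np.argsort(calc_ids)
sorted_pos = np.searchsorted(calc_ids, grid_ids, sorter=order)

# Map positions in the sorted view back to positions in the original arrays.
positions = order[sorted_pos]
reordered_values = calc_values[positions]
```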

metemaddar commented 1 year ago

The data that was read from the cache includes some neighbors that are not in the study area. We need to omit these calculated data while writing to the final dictionary (because they don't match the study_area hexagons). In this picture, the purple hexagons cover the study_area and the orange hexagons are neighbors outside of the study_area.

Image

metemaddar commented 1 year ago

At the moment we just need to mask the indexes and the calculations at the same time:

Image
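A minimal sketch of that masking step, assuming the cached ids and the cached values are parallel numpy arrays. The names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical data: ids 10 and 14 are neighbors outside the study area.
study_area_ids = np.array([11, 12, 13])
cached_ids = np.array([10, 11, 12, 13, 14])
cached_values = np.array([0.5, 1.0, 2.0, 3.0, 9.9])

# One boolean mask, applied to both arrays so they stay aligned.
mask = np.isin(cached_ids, study_area_ids)
kept_ids = cached_ids[mask]
kept_values = cached_values[mask]
```

Applying the same mask to both arrays keeps each value paired with its hexagon id.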

p4b-bro[bot] commented 1 year ago

This task/issue closed on Tue Jun 06 2023 ✅