Closed: metemaddar closed this 1 year ago
We can do it following a diagram like this, merging the dataframes:
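A minimal sketch of the merge approach, assuming the three pieces share a hexagon id column (the column and dataframe names here are assumptions, not the project's actual schema):

```python
import pandas as pd

# Hypothetical dataframes built from the grid_ids/calculations/quantiles
# dictionaries, keyed by a shared hexagon id.
grid_ids = pd.DataFrame({"grid_id": [1, 2, 3]})
calculations = pd.DataFrame({"grid_id": [1, 2, 3], "calculation": [0.1, 0.5, 0.9]})
quantiles = pd.DataFrame({"grid_id": [1, 2, 3], "quantile": [1, 3, 5]})

# Chain two inner joins on the shared key to get one table.
merged = grid_ids.merge(calculations, on="grid_id").merge(quantiles, on="grid_id")
```

This produces one row per hexagon with its calculation and quantile class side by side.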
However, after implementing this, we see that creating the dataframes for grid_ids/calculations/quantiles from their dictionaries takes a long time. As @EPajares suggested, we can do it using numpy arrays instead. I need to figure out the data structure for this, since the geometry is itself a table. We also need to create a flowchart to use the power of numba: because the shape of the output can change, numba can raise an exception about the return type.
We could also try to save the hexagon geoms directly as a GeoDataFrame instead of GeoJSON, so we don't need to parse the GeoJSON again with geopandas. We could save them as pickle, maybe?
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_pickle.html
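A quick sketch of the pickle round-trip from the linked docs. Plain pandas is used here to keep it self-contained; a geopandas GeoDataFrame inherits `to_pickle`/`read_pickle` from `DataFrame`, so the same calls apply (the file path and columns are illustrative):

```python
import os
import tempfile

import pandas as pd

# Stand-in for the hexagon table; in practice this would be a GeoDataFrame
# with real geometries instead of placeholder strings.
hexagons = pd.DataFrame({"grid_id": [1, 2], "geom": ["hex-1", "hex-2"]})

path = os.path.join(tempfile.gettempdir(), "hexagons.pkl")
hexagons.to_pickle(path)          # binary dump, no GeoJSON parsing on reload
restored = pd.read_pickle(path)   # round-trips the frame as-is
```

This skips the GeoJSON parse entirely on subsequent loads, at the cost of the pickle being Python-version-sensitive.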
Yes, exactly. And we should also reconsider converting the calculations to dataframes, which took about 90 seconds! :snail: :cold_face:
If I understand correctly, the data from read_heatmap are at resolution 10. As the requested resolution can be different (for example, we loaded resolution 8), we need to convert the read_heatmap data to their grandparent resolution of 8. We also need to aggregate the data (sum, smallest, etc., depending on the value kind). Then the resolutions match. At that point we should do the calculations and continue on to generating the final GeoJSON. Is this right? @EPajares
I would always do the calculation on resolution 10 and then group it by average to the target resolution, e.g. 8. After it is grouped, we perform the quantile classification. After this we have both index and class at resolution 8.
After this we append them as attributes to the geometries.
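The group-then-classify step above could look roughly like this. The `parent_id` column stands in for the resolution-8 grandparent of each resolution-10 cell (in practice it would come from the hex-grid library); the data and class count are illustrative:

```python
import pandas as pd

# Resolution-10 results, each tagged with its resolution-8 parent cell.
df = pd.DataFrame({
    "parent_id": ["a", "a", "b", "b", "c", "c"],
    "value":     [1.0, 3.0, 2.0, 4.0, 8.0, 10.0],
})

# 1) Average the resolution-10 values up to resolution 8.
grouped = df.groupby("parent_id")["value"].mean()

# 2) Classify the grouped values into quantile-based classes (1..3 here).
classes = pd.qcut(grouped, q=3, labels=False) + 1
```

After this, `grouped` is the index and `classes` the class, both keyed by the resolution-8 cell id, ready to be attached to the geometries.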
The save method has been changed to save the hexagons (grids, polygons) as numpy arrays. Now we are going to create the calculations array based on hexagon_grids.
I think we can use a sparse matrix to reorder calculations_array.
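One way to sketch this, assuming scipy is available: a sparse permutation matrix (one `1` per row) reorders the array in a single matrix product. The array contents and the target order are made up for illustration:

```python
import numpy as np
from scipy import sparse

# Rows of calculations_array, and the input row we want at each output position.
calculations_array = np.array([[10.0], [20.0], [30.0]])
order = np.array([2, 0, 1])

# Permutation matrix: row i has a single 1 in column order[i].
n = len(order)
P = sparse.csr_matrix((np.ones(n), (np.arange(n), order)), shape=(n, n))

reordered = P @ calculations_array   # picks rows 2, 0, 1 of the input
```

For a pure reordering, plain fancy indexing (`calculations_array[order]`) does the same thing; the sparse form pays off if the same mapping is reused or combined with other linear steps.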
The data read from the cache includes some neighbors that are not in the study area. We need to omit these calculated data when writing to the final dictionary (because they don't match the study_area hexagons). In this picture, the purple hexagons cover the study_area and the orange hexagons are neighbors outside of the study_area.
At the moment we just need to mask indexes and calculations at the same time:
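A minimal sketch of that simultaneous masking with numpy (the ids and values are made up; the point is that one boolean mask filters both arrays consistently):

```python
import numpy as np

indexes = np.array([101, 102, 103, 104])       # all cached hexagon ids
calculations = np.array([0.1, 0.2, 0.3, 0.4])  # one value per cached id
study_area_ids = np.array([101, 103])          # ids inside the study area

# One boolean mask, applied to both arrays so they stay aligned.
mask = np.isin(indexes, study_area_ids)
indexes = indexes[mask]
calculations = calculations[mask]
```

The neighbor cells outside the study area (102 and 104 here) are dropped from both arrays in one step.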
This task/issue closed on Tue Jun 06 2023 ✅
At the end we need a table like this:
We have the hexagons saved as GeoJSON, and the quantiles and calculations as dictionaries; the quantiles/calculations each contain another dictionary, named uniques, which holds the hexagon_ids. Through these ids, everything can be connected together.
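The connection could be sketched like this. The structure below is an assumption based on the description above (a values array plus a parallel `uniques` array of hexagon ids), not the project's exact schema:

```python
import numpy as np

# Hexagon ids from the saved geometries (assumed sorted here).
hexagon_ids = np.array([11, 12, 13])

# A calculations dictionary: values plus the hexagon ids they belong to.
calculations = {
    "values": np.array([0.5, 0.9]),
    "uniques": np.array([11, 13]),
}

# Map each calculation back onto the full hexagon list via its unique id;
# hexagons without a calculation stay NaN.
positions = np.searchsorted(hexagon_ids, calculations["uniques"])
full = np.full(len(hexagon_ids), np.nan)
full[positions] = calculations["values"]
```

The same lookup works for the quantiles dictionary, giving one aligned column per attribute for the final table.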