metemaddar closed this 1 year ago
For items 3, 4, 5 and 7, we could use `h3._cy.parent` to convert to the integer parent directly. However, we still had Python for loops, which were slow; using Cython we could speed those loops up as well.
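A toy sketch of the loop-vs-vectorized difference described here. The `id // 7` "parent" function is a made-up stand-in (the real code uses `h3._cy.parent` on integer h3 indexes); the point is only that one array operation replaces a Python-level call per cell, which is the same kind of win Cython gets by compiling the loop:

```python
import numpy as np

# Toy stand-in for a per-cell parent lookup: "parent" here is just id // 7.
# The real code calls h3._cy.parent per integer h3 index; this sketch only
# illustrates replacing the slow per-element Python loop with one array op.

child_ids = np.array([101, 102, 715, 716, 717], dtype=np.int64)

# Slow: pure-Python for loop, one Python-level operation per cell.
parents_loop = np.array([cid // 7 for cid in child_ids], dtype=np.int64)

# Fast: one vectorized operation over the whole array
# (Cython achieves a similar effect by compiling the loop itself).
parents_vec = child_ids // 7

print(np.array_equal(parents_loop, parents_vec))  # True
```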
By reading the bulk_ids from the hexagon files, we could drop the data outside the study area, which reduced the time of sort_and_unique (and improved other functions too). Using Cython also gave a large improvement in reordering the data:
function | before cython | after cython
---|---|---
Reading matrices | 751 ms | 751 ms
sort_and_unique | 1.78 s | 1.55 s
do_calculations | 60 ms | 40 ms
read_hexagons | ~0 | ~0
tag_uniques_by_parent | 10 s | 0.6 s
create_grids_unordered_map | 0 | 0
create_grid_pointers | 2.57 s | 0.4 s
create_calculation_arrays | 13 ms | 45 ms
create_quantile_arrays | 26 ms | 26 ms
generate_final_geojson | 644 ms | 860 ms
All | 16.18 s | 4.29 s
We have a bottleneck at sort_and_unique, which it seems we cannot improve unless we sort the data before caching. For that approach, we would need to sort and save the data per study area. This would also improve the "Reading matrices" part, which takes around 0.7 seconds, so we could reduce the overall time by ~1.5 seconds.
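A minimal sketch of the sort-before-caching idea, with hypothetical file paths and arrays rather than the project's actual caching code: if the ids are sorted once at cache time, the expensive sort at read time turns into a cheap linear uniqueness scan:

```python
import numpy as np
import tempfile, os

# Hypothetical example: cache the grid ids pre-sorted per study area,
# so reading them back needs no full sort_and_unique-style sort.

def cache_sorted(path, grid_ids):
    order = np.argsort(grid_ids, kind="stable")  # O(n log n), paid once at cache time
    np.save(path, grid_ids[order])

def read_uniques(path):
    ids = np.load(path)  # already sorted on disk
    # uniqueness on sorted data is a single linear scan, no sort needed
    keep = np.empty(len(ids), dtype=bool)
    keep[0] = True
    np.not_equal(ids[1:], ids[:-1], out=keep[1:])
    return ids[keep]

tmp = os.path.join(tempfile.mkdtemp(), "grid_ids.npy")
cache_sorted(tmp, np.array([5, 3, 3, 9, 5, 1]))
print(read_uniques(tmp))  # [1 3 5 9]
```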
As discussed, this is something we can refactor in the future.
This task/issue was closed on Tue Jun 06 2023 ✅
At the moment, read_heatmap takes about 16 seconds for resolution 6 and about 21 seconds for resolution 9. We can make it faster by:

1. Making `read_opportunity_matrix()` multithreaded. In this function we have to read about 18 resolutions, and these 18 read calls can be done together in multiple threads.
2. Reading `bulk_ids` from the h3 cached files of resolution 6. This lets us get rid of the additive resolutions, which also contain hexagons at resolution 10, and it reduces the calculation/sorting/masking time.
3. Vectorizing `h3_parent()`, which cannot be converted to numba as-is. So first we need to vectorize this function and then match the results in numba.
4. To pass `grid_ids` to `h3_parent()`, we need to convert the grids from integer to strings. This can be done in numba by using Python's `int()` function.
5. We could also refactor to avoid using dictionaries for categorizing data and instead use masking over the read data. But this doesn't make the program faster, as we don't have many dictionary keys.
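The multithreaded-read idea from item 1 can be sketched as below. The file layout and names are hypothetical (the real `read_opportunity_matrix()` reads ~18 per-resolution matrices from the project's cache); the point is that file I/O releases the GIL, so a thread pool can overlap the read calls:

```python
import numpy as np
import os, tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: create dummy per-resolution .npy matrices, then
# load them concurrently. np.load spends most of its time in file I/O,
# which releases the GIL, so threads can overlap the ~18 reads.

cache_dir = tempfile.mkdtemp()
resolutions = range(6, 10)  # stand-in for the ~18 real matrices
for res in resolutions:
    np.save(os.path.join(cache_dir, f"matrix_{res}.npy"), np.full(3, res))

def read_matrix(res):
    # one read call per resolution, run from a worker thread
    return res, np.load(os.path.join(cache_dir, f"matrix_{res}.npy"))

with ThreadPoolExecutor(max_workers=8) as pool:
    matrices = dict(pool.map(read_matrix, resolutions))

print(sorted(matrices))  # [6, 7, 8, 9]
```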