SafeGraphInc / safegraph_py

Python code for common, repeatable data wrangling and analysis of SafeGraph data
Apache License 2.0
27 stars 15 forks

Massive memory usage optimization #27

Closed bpblakely closed 4 years ago

bpblakely commented 4 years ago

Optimized the fast functions by feeding the parallelized functions only the data required for the computation, then merging the result back into the original dataframe afterwards (if required).
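For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the code from this PR. The helper names (`unpack_json_slim`, `_unpack_chunk`), the `_row_id` join key, and the sample columns are all hypothetical, and a thread pool stands in for a process pool so the example runs anywhere. Only the JSON column and a row id are sent to the workers; the remaining columns are joined back at the end.

```python
import json
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def _unpack_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Worker: emit one row per key/value pair of each JSON object."""
    rows = [
        {"_row_id": row_id, "key": key, "value": value}
        for row_id, raw in zip(chunk["_row_id"], chunk["json_col"])
        for key, value in json.loads(raw).items()
    ]
    return pd.DataFrame(rows, columns=["_row_id", "key", "value"])


def unpack_json_slim(df: pd.DataFrame, json_column: str, n_workers: int = 2) -> pd.DataFrame:
    """Hypothetical helper: parallelize over a slim frame, then merge back."""
    row_ids = np.arange(len(df))

    # Workers only ever see two columns: a positional id and the JSON text.
    slim = pd.DataFrame({"_row_id": row_ids, "json_col": df[json_column].to_numpy()})

    n_chunks = max(1, min(n_workers, len(slim)))
    chunks = [slim.iloc[idx] for idx in np.array_split(row_ids, n_chunks)]

    # Assumption: a thread pool is used here for portability; a process pool
    # is where shipping less data per worker would matter most.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        unpacked = pd.concat(list(pool.map(_unpack_chunk, chunks)), ignore_index=True)

    # Re-attach the columns that were never sent to the workers.
    original = df.drop(columns=[json_column]).assign(_row_id=row_ids)
    return original.merge(unpacked, on="_row_id", how="inner").drop(columns="_row_id")


# Illustrative sample data, not real SafeGraph records.
df = pd.DataFrame(
    {
        "store_id": ["a", "b"],
        "visits_json": ['{"x": 5, "y": 7}', '{"z": 4}'],
    }
)
result = unpack_json_slim(df, "visits_json")
```

The merge step is optional in the same sense as above: if the caller only needs the unpacked values, the slim result can be returned without it.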

Note: Line 228 might be unnecessary, since unpack_json_fast already reduces the size of the dataframe. I don't have a clean way to measure the impact on memory usage during computation.
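One rough way to compare peak memory before and after a change like this is the standard library's `tracemalloc`. The sketch below is only an assumption about how one might measure it, and `peak_memory_bytes` is a hypothetical helper. Note that `tracemalloc` only sees allocations made by the current Python process, so memory used inside child processes of a multiprocessing pool would not be counted.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Hypothetical helper: run func and report the peak traced allocation size."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Example: measure a throwaway allocation.
values, peak = peak_memory_bytes(lambda n: [0] * n, 100_000)
print(f"peak traced memory: {peak / 1e6:.2f} MB")
```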

bpblakely commented 4 years ago

@ryanfoxsquire Fixed the merge for explode_json_array_fast. Should be good to go.