To speed up the process of merging the raw JSON data with the gsheets data source, I increased the use of pandas for operations that can be vectorized.
The following changes were made to speed up `get_cases`:
`extract_dsph_gsheet_data` now returns a pandas DataFrame directly.
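A minimal sketch of the idea, assuming the raw gsheet records arrive as a list of dicts (the signature and input shape here are assumptions, not the real function):

```python
import pandas as pd

def extract_dsph_gsheet_data(rows):
    # `rows` stands in for the raw records pulled from the gsheet
    # (e.g. a list of dicts, one per row). Building the DataFrame here
    # means downstream code no longer has to convert the data itself.
    return pd.DataFrame(rows)
```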
Standardized the `targets` input, now defined in `constants.py`. This means we can add new gsheets columns by editing only `GSHEET_TARGET_COLUMNS`.
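Roughly, the constant looks like the sketch below; the column names other than `case_id` are illustrative placeholders:

```python
# constants.py -- column names besides case_id are placeholders
GSHEET_TARGET_COLUMNS = [
    "case_id",
    "status",
    "residence",
    # adding a new gsheets column only requires appending it here
]
```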
`supplement_data` now performs the gsheets data replacements in a vectorized way with the `loc` method. Intersections in `case_id` are also now detected with sets instead of lists, which is more efficient.
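A minimal sketch of the vectorized replacement, assuming both frames are indexed by `case_id` (the indexing and exact signature are assumptions):

```python
import pandas as pd

from constants import GSHEET_TARGET_COLUMNS  # as described above

def supplement_data(cases: pd.DataFrame, gsheet: pd.DataFrame) -> pd.DataFrame:
    # Assumes both frames are indexed by case_id. A set intersection is
    # roughly O(n) on average, versus O(n*m) for scanning nested lists.
    common = list(set(cases.index) & set(gsheet.index))
    # One vectorized .loc assignment replaces the old per-row loop;
    # pandas aligns the right-hand side on index and columns.
    cases.loc[common, GSHEET_TARGET_COLUMNS] = gsheet.loc[common, GSHEET_TARGET_COLUMNS]
    return cases
```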
`get_cases` now supplements the data before applying the aliasing, so the supplemented data can still be aliased.
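The reordering looks roughly like this; the stubs stand in for the real loaders and helpers, and only the call order is the point:

```python
import pandas as pd

# Hypothetical stubs -- only the call order below reflects the change.
def fetch_raw_cases() -> pd.DataFrame: ...
def extract_dsph_gsheet_data() -> pd.DataFrame: ...
def supplement_data(cases, gsheet) -> pd.DataFrame: ...
def apply_aliases(df: pd.DataFrame) -> pd.DataFrame: ...

def get_cases() -> pd.DataFrame:
    cases = supplement_data(fetch_raw_cases(), extract_dsph_gsheet_data())
    # Aliasing runs last, so values pulled in from the gsheet get aliased too.
    return apply_aliases(cases)
```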
All the elements in `NONE_ALIAS` will now be converted to `numpy.nan` instead of just a "none" string. This allows us to utilize the NaN-aware methods in pandas.
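A sketch of the conversion, where the alias values beyond `"none"` and the `normalize_missing` helper are illustrative assumptions:

```python
import numpy as np
import pandas as pd

NONE_ALIAS = ["none", "None", "N/A"]  # illustrative values

def normalize_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Mapping every alias to np.nan lets downstream code lean on pandas'
    # NaN-aware methods (isna, fillna, dropna, ...).
    return df.replace(dict.fromkeys(NONE_ALIAS, np.nan))
```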
Tests were updated to reflect the changes above.
Note: 1 test related to `phcovid_network.py` fails. I am still working on figuring out how this happened and hope to work with @andrewnyu to resolve it.