RohitDhankar / DigitalCognition_OldRepo_ARCHIVED

DC is being developed to be a generic BI tool. Product preview here :- https://www.youtube.com/channel/UC9J9N9CNv15s9U9Aejpza6g/videos

Python Performance concern - Multiple methods reading same Pandas Data Frame from a particular Pickle file #22

Closed RohitDhankar closed 5 years ago

RohitDhankar commented 5 years ago

To create summary-statistics charts and plots with Bokeh and HoloViews, these steps are followed:

1/ The data set chosen by the data analyst (end user) is pulled from the PostgreSQL DB. Let's presume we limit this data set's size to 500 rows.

2/ This smaller set of 500 rows, call it df_eda (data frame for exploratory data analysis), is then pickled and persisted to the server's local drive. The pickled data set is named 'df_holoviewPlots.pkl'. It is referenced in our code here: (https://github.com/RohitDhankar/DigitalCognition/blob/17cb9e2781253f3635b46b9cc038baa382874443/dc_dash/dc_holoviews.py#L48)

3/ Why do we pickle and persist 'df_eda'? Whenever plots are rendered, they need 'df_eda' as an input parameter, and we have no control over how many times the user wants to see charts or plots. Several methods may therefore end up reading the same pickle file repeatedly.
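One way to limit repeated disk reads in the steps above is to cache the unpickled object in process memory. The following is a minimal sketch of that idea, not the code in this repo: the helper names `load_df_eda` and `_load_pickle_cached` are hypothetical, and a plain list of dicts stands in for the 500-row DataFrame so the example needs only the standard library (a pandas DataFrame can be pickled the same way).

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=4)
def _load_pickle_cached(path, mtime):
    """Unpickle `path`. `mtime` is unused here; it only forms part of the
    cache key so that a rewritten file triggers a fresh read."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def load_df_eda(path):
    """Hypothetical helper: return the unpickled object, re-reading the
    file only when its modification time has changed."""
    return _load_pickle_cached(path, os.path.getmtime(path))


# Stand-in for the 500-row df_eda (assumption: a real DataFrame in practice).
df_eda = [{"row": i, "value": i * 2} for i in range(500)]

# Persist it under the file name used in the issue, in a temp directory.
pkl_path = os.path.join(tempfile.mkdtemp(), "df_holoviewPlots.pkl")
with open(pkl_path, "wb") as fh:
    pickle.dump(df_eda, fh)

first = load_df_eda(pkl_path)   # reads and unpickles from disk
second = load_df_eda(pkl_path)  # served from the in-process cache
cache_info = _load_pickle_cached.cache_info()
```

Every caller receives the same cached object, so plotting methods should treat it as read-only or copy it before modifying. The cache also lives per process, so a server running several worker processes would still read the file once per worker.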

Beyond this, the options available as of 11 June 2019 all need to be performance tested.

I have raised this Python performance concern on SO: Multiple methods reading same Pandas Data Frame from a particular Pickle file

RohitDhankar commented 5 years ago

Client-side session storage of 'df_eda' in JSON format looks promising; the options still need to be performance tested.
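Before sending 'df_eda' to the browser, it may help to check how large the JSON payload is. This is a rough sketch under stated assumptions: a list of dicts stands in for the DataFrame (with pandas, `DataFrame.to_json(orient="records")` would produce a comparable string), and the 5 MB figure is only an assumed ballpark for per-origin Web Storage quotas, which vary by browser.

```python
import json

# Stand-in for the 500-row df_eda as a list of row records (assumption).
records = [{"row": i, "value": round(i * 0.5, 2)} for i in range(500)]

# Serialize to the JSON string that would be handed to the client.
payload = json.dumps(records)
payload_bytes = len(payload.encode("utf-8"))

# Assumed ballpark quota; the real limit depends on the browser.
ASSUMED_QUOTA_BYTES = 5 * 1024 * 1024
fits = payload_bytes < ASSUMED_QUOTA_BYTES

# Round trip to confirm nothing is lost for these simple value types.
restored = json.loads(payload)
```

Columns holding dates, decimals, or NaN values would need explicit handling, since plain JSON has no native representation for them.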

RohitDhankar commented 5 years ago

For now I am using localStorage.setItem('myObj_bokeh'... and may run performance tests for the other scenarios later.

https://github.com/RohitDhankar/DigitalCognition/blob/307c3262def2926cee6a17c332a2f604a77b0b86/dc_dash/templates/dc_dash/eda_sidebar.html#L1320