NathanRxl / bnp-cardif-challenge

After deadline Kaggle Competition, Master's Data Science, 2017

Improve preprocessing performance #12

Open NathanRxl opened 7 years ago

NathanRxl commented 7 years ago

I noticed the preprocessing is not very fast (it takes about 2 minutes to run on my laptop). A fairly easy way to improve it would be to compute categorical_features_likelihood once and store the result (possibly in a JSON file). Once computed, the script could load the JSON file (if an argument like "use_precomputed" is passed) instead of recomputing it each time.

This could improve the performance by 35%. This issue is not a priority as long as #2, #3 and #6 are still open.
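
The caching idea described above could look roughly like the sketch below. It is only an illustration under assumptions: `compute_categorical_features_likelihood` is a hypothetical stand-in (here, the mean target per category value) and not the repository's actual computation, and the `get_likelihood` helper, the `cache_path` parameter and the `target_key` name are made up for the example. Only the `use_precomputed` flag and the JSON storage come from the proposal itself.

```python
import json
import os


def compute_categorical_features_likelihood(rows, target_key="target"):
    """Hypothetical stand-in for the real computation:
    mean target value per category value, for each feature."""
    sums = {}
    for row in rows:
        for feature, value in row.items():
            if feature == target_key:
                continue
            per_feature = sums.setdefault(feature, {})
            total, count = per_feature.get(str(value), (0.0, 0))
            per_feature[str(value)] = (total + row[target_key], count + 1)
    return {
        feature: {value: total / count for value, (total, count) in values.items()}
        for feature, values in sums.items()
    }


def get_likelihood(rows, cache_path, use_precomputed=False):
    """Load the likelihood from the JSON cache when asked to and available,
    otherwise compute it and refresh the cache."""
    if use_precomputed and os.path.exists(cache_path):
        with open(cache_path) as cache_file:
            return json.load(cache_file)

    likelihood = compute_categorical_features_likelihood(rows)
    with open(cache_path, "w") as cache_file:
        json.dump(likelihood, cache_file)
    return likelihood
```

One caveat with this approach: the cache is only valid for the data it was computed on, so it would need to be regenerated (by running without `use_precomputed`) whenever the training set or the feature list changes.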

NathanRxl commented 7 years ago

This issue was low priority. The challenge ends this week, so we decided to drop it.