Dlux804 / McQuade-Chem-ML

Development of easy to use and reproducible ML scripts for chemistry.

Analysis module fits model too many times #23

Closed Dlux804 closed 4 years ago

Dlux804 commented 4 years ago

Please describe the inefficiency that you would like addressed. Is it causing a bottleneck?
core/analysis.py has multiple functions that each fit the model in order to run their analysis. As a result the model is fit many more times than necessary, which is expensive for large models and creates a bottleneck after tuning.

multipredict() and replicate_model() each fit the model n times, so the model is fit 2*n times in total instead of the n times actually needed.

Describe the solution you'd like
Optimize the functions in analysis.py so that the model is fit as few times as possible while still producing the information needed for our analyses. Combine multipredict() and replicate_model() so that both read from the results of the same n fits.
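A minimal sketch of that idea, not the actual code in core/analysis.py: one helper performs the n fits and records what each fit produced, and the two analyses become cheap summaries over that record. The names `run_replicates`, `ReplicateResults`, `prediction_summary`, `score_summary` and `MeanModel` are hypothetical. The sketch also assumes that multipredict() needs per-sample predictions, that replicate_model() needs per-fit scores, and that the model exposes scikit-learn style `fit`/`predict` methods.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class ReplicateResults:
    """Hypothetical container for everything recorded during the n fits."""
    predictions: list = field(default_factory=list)  # one list of predictions per fit
    rmse: list = field(default_factory=list)         # one RMSE value per fit


class MeanModel:
    """Stand-in regressor that predicts the training mean and counts its fits."""
    fit_calls = 0

    def fit(self, X, y):
        MeanModel.fit_calls += 1
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def run_replicates(make_model, X_train, y_train, X_test, y_test, n):
    """Fit a fresh model n times and record predictions and RMSE for each fit."""
    results = ReplicateResults()
    for _ in range(n):
        model = make_model().fit(X_train, y_train)
        pred = model.predict(X_test)
        results.predictions.append(pred)
        results.rmse.append(mean((p - t) ** 2 for p, t in zip(pred, y_test)) ** 0.5)
    return results


def prediction_summary(results):
    """Per-sample mean and standard deviation of predictions across fits."""
    per_sample = list(zip(*results.predictions))
    return [mean(s) for s in per_sample], [pstdev(s) for s in per_sample]


def score_summary(results):
    """Mean and standard deviation of RMSE across fits."""
    return mean(results.rmse), pstdev(results.rmse)


# Both summaries reuse the same 5 fits; no further fitting happens here.
results = run_replicates(
    MeanModel, [[0], [1], [2]], [1.0, 2.0, 3.0], [[3], [4]], [2.0, 4.0], n=5
)
avg_pred, std_pred = prediction_summary(results)
avg_rmse, std_rmse = score_summary(results)
```

With this layout the fit count is set by the single call to `run_replicates`, so adding another summary function does not add fits.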

Describe alternatives you've considered
Haven't considered many. I think it's a clear path.

Additional context
When creating the fix, consider additional analyses we may want to perform (e.g., importance graphs).
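One possible way to leave room for such analyses, sketched under assumptions: the fitting loop accepts a dictionary of extractor functions and applies each one to every fitted model, so a new analysis is added by registering an extractor rather than by adding fits. `collect_from_fits` and `ToyModel` are hypothetical names, and the toy importance values are invented for illustration only. The `feature_importances_` attribute mirrors what scikit-learn tree-based estimators expose; whether the project's models provide it is an assumption.

```python
from statistics import mean


class ToyModel:
    """Stand-in regressor; real code would use the tuned estimator."""

    def fit(self, X, y):
        # Fake "importance": each column's share of the total spread (illustration only).
        spreads = [max(col) - min(col) for col in zip(*X)]
        total = sum(spreads) or 1.0
        self.feature_importances_ = [s / total for s in spreads]
        return self


def collect_from_fits(make_model, X, y, n, extractors):
    """Fit n times; after each fit, apply every extractor to the fitted model."""
    collected = {name: [] for name in extractors}
    for _ in range(n):
        model = make_model().fit(X, y)
        for name, extract in extractors.items():
            collected[name].append(extract(model))
    return collected


X = [[0.0, 10.0], [1.0, 10.0], [3.0, 16.0]]
y = [1.0, 2.0, 3.0]
collected = collect_from_fits(
    ToyModel, X, y, n=3,
    extractors={"importances": lambda m: list(m.feature_importances_)},
)
# Average importance per feature across the 3 fits, ready for plotting.
avg_importance = [mean(col) for col in zip(*collected["importances"])]
```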