Amartime / Logistic-Regression

Four-Parameter Logistic Regression for machine learning. There is also code for a two-parameter model and for cross-validation, in order to test which model leads to more reliable predictions for a given dataset. Note: this was written for a research project and may not be heavily commented, as it was not initially intended for public use.

Using scipy's genetic algorithm for initial parameter estimation in non-linear optimization #1

Closed zunzun closed 5 years ago

zunzun commented 7 years ago

I see you are writing Python code for non-linear optimization with scipy. The default starting point for scipy's non-linear solvers is to have all parameter values equal to 1.0, and for many functions the choice of initial parameters is key to successful optimization, particularly when fitting more complex equations. For this reason, the authors of scipy have added a genetic algorithm for initial parameter estimation for use with their non-linear solvers. The module is named scipy.optimize.differential_evolution, and it uses the Latin Hypercube algorithm to ensure a thorough search of parameter space.

I have used scipy's Differential Evolution genetic algorithm to determine initial parameters for fitting a double Lorentzian peak equation to Raman spectroscopy of carbon nanotubes and found that the results were excellent. The GitHub project, with a test spectroscopy data file, is:

https://github.com/zunzun/RamanSpectroscopyFit
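A minimal sketch of the two-stage approach described above, assuming synthetic data and made-up parameter bounds (not taken from either repository): differential_evolution minimizes the sum of squared residuals over a bounded search space, and its result seeds scipy's gradient-based least-squares fit.

```python
import numpy as np
from scipy.optimize import differential_evolution, curve_fit

# Hypothetical 4-parameter logistic (4PL) curve: lower/upper asymptotes,
# slope, and midpoint x50.
def four_pl(x, lower, upper, slope, x50):
    return lower + (upper - lower) / (1.0 + np.exp(-slope * (x - x50)))

# Synthetic noisy data for illustration.
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 60)
y = four_pl(x, 0.1, 0.9, 1.5, 0.5) + rng.normal(0, 0.02, x.size)

def sse(params):
    # Objective for the genetic algorithm: sum of squared residuals.
    return np.sum((y - four_pl(x, *params)) ** 2)

# Assumed search ranges for (lower, upper, slope, x50).
bounds = [(0, 1), (0, 1), (0.1, 10), (-5, 5)]
result = differential_evolution(sse, bounds, seed=1)

# Refine the genetic-algorithm estimate with non-linear least squares.
popt, _ = curve_fit(four_pl, x, y, p0=result.x)
```

By default differential_evolution initializes its population with Latin Hypercube sampling, so the bounded search space is covered evenly before the local solver takes over.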

If you have any questions, please let me know. My background is in nuclear engineering and industrial radiation physics, and I love Python, so I will be glad to help.

James Phillips

Amartime commented 5 years ago

Hello James,

I apologize for the late response. This was code from my Master's thesis, and I haven't been diligent in monitoring the repository to say the least. I may one day return to further flesh out this repository, but I have been developing other tools which are sadly private instead.

You are absolutely correct that this is the conventional default for parameter guesses, and I absolutely encourage you to take the code and adjust it to enforce those defaults if you wish. However, I would warn that a default guess of 1.0 for all parameters will lead to significant issues with the 4-parameter logistic regression model. The main benefit of this model is that it adjusts the asymptotes, which the 2-parameter model forces to 0 and 1. A starting guess of 1 for both the lower and upper asymptotes can cause the fitted model to flip (obnoxious when graphing) or to struggle to converge. As a result, the code allows initial guesses at these parameters via the param0 argument.
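A short sketch of the degeneracy described above, using a hypothetical 4PL function and synthetic data (the param0-style guesses here are illustrative, not the repository's actual defaults): when both asymptote guesses equal 1.0, the curve is flat and the gradients with respect to slope and midpoint vanish, so separating the asymptote guesses from the data avoids the problem.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical 4PL model with free lower and upper asymptotes.
def four_pl(x, lower, upper, slope, x50):
    return lower + (upper - lower) / (1.0 + np.exp(-slope * (x - x50)))

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 50)
y = four_pl(x, 0.0, 1.0, 2.0, 0.0) + rng.normal(0, 0.01, x.size)

# With lower == upper == 1.0 the model is constant in x, so the
# slope/midpoint derivatives are zero and the solver has no gradient
# to follow. Data-driven starting guesses (in the spirit of a param0
# argument) separate the asymptotes up front:
p0 = [min(y), max(y), 1.0, np.median(x)]
popt, _ = curve_fit(four_pl, x, y, p0=p0)
```

With the asymptote guesses taken from the observed range of y, the fit converges to the correct orientation instead of flipping the upper and lower plateaus.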

I was not aware of that initial parameter estimation module; it is definitely an awesome idea for generating these initial guesses that I hadn't previously considered! I will definitely look into it further if I return to make this repository more than an archive of my research!

Thank you for your input, and I would always welcome any collaboration in the future!

Andrew Artime