davidrpugh / pyAM

Python package for solving assortative matching models with two-sided heterogeneity.
MIT License

Add some code for estimating a model using non-linear least squares given some data #6

Open davidrpugh opened 9 years ago

davidrpugh commented 9 years ago

@crisla and @Falter89

I am going to include some estimation code in the next release of pyAM. Feel free to post comments and feature requests in this thread as I start writing it...
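To give a rough idea of what I have in mind: the estimation boils down to minimizing the sum of squared residuals between observed data and the model's predictions. Here is a minimal sketch using scipy.optimize.curve_fit with a placeholder prediction function and fake data; in pyAM the predicted values would come from actually solving the model, and the final API may look quite different.

```python
# rough sketch of non-linear least squares estimation
# (placeholder model and simulated data; NOT the eventual pyAM API)
import numpy as np
from scipy import optimize


def predicted_wages(skill, omega_0, omega_1, sigma):
    """Hypothetical parametric prediction; stands in for the solved model's wage function."""
    return omega_0 * skill**omega_1 + sigma


# fake "observed" data generated from known parameters plus noise
true_params = (2.0, 0.5, 1.0)
skills = np.linspace(1.0, 10.0, 100)
wages = predicted_wages(skills, *true_params) + np.random.normal(scale=0.1, size=skills.size)

# estimate the parameters by minimizing the sum of squared residuals
estimates, covariance = optimize.curve_fit(predicted_wages, skills, wages, p0=(1.0, 1.0, 0.0))
print(estimates)
```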

crisla commented 9 years ago

@davidrpugh the widgets are super awesome!

davidrpugh commented 9 years ago

@crisla

Thanks. Sounds like you were able to update the code and get them to work. Good. Next week I will try to sketch out some estimation code and we should meet to discuss how to use it.

Falter89 commented 9 years ago

@davidrpugh
I still can't run it. I've gone as far as deleting the repository and forking it again, but nothing helps. I am in the pyam-dev environment and using the newest .py files; the setup works fine because I can pass in the alpha and measure without any problems. But when I actually solve the model I get the warning:

/Users/juliafaltermeier/anaconda/envs/pyam-dev/lib/python2.7/site-packages/scipy/optimize/minpack.py:237: RuntimeWarning: The iteration is not making good progress, as measured by the improvement from the last ten iterations. warnings.warn(msg, RuntimeWarning)

and then solver.result.success is False.

I have no further idea what the problem could be. I've updated the clone in every possible way. The notebook is exactly the one on GitHub, I haven't changed anything. Any ideas?

davidrpugh commented 9 years ago

Couple of ideas...

  1. Check that measure = 1.0 for both inputs (a quick check is sketched just after this list): I have still not found a good initial condition for other measures :(
  2. Did you register your Anaconda to get an academic license? This would give you access to commercial linear algebra libraries.
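
For item 1, a quick check, assuming your inputs are named workers and firms as in the notebook (.measure is just the value passed to the pyam.Input constructor):

```python
# both measures should be exactly 1.0; there is currently no good
# initial guess for the equilibrium when either measure differs from 1.0
print(workers.measure, firms.measure)
assert workers.measure == 1.0 and firms.measure == 1.0
```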

I will also double check that everything works on my machine as well.

Falter89 commented 9 years ago

@davidrpugh

  1. measure = 1.0 , I haven't changed anything in the notebook
  2. I didn't have one yet, but I got one now. It still doesn't work though.

I tried updating my scipy package because the error comes up in this function. For some reason it won't upgrade it though. Could this be the problem?

davidrpugh commented 9 years ago

@Falter89

Did you try...

$ conda update scipy

from a terminal/command line to update SciPy? Also make sure that you are in your pyam-dev environment (i.e., where you installed the most recent pyam source).
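
A quick way to confirm, from inside the notebook, that you are really running the pyam-dev environment and to see which SciPy it picked up:

```python
# run in a notebook cell (or a python session started inside pyam-dev)
import sys
import scipy

print(sys.prefix)          # should end in .../envs/pyam-dev
print(scipy.__version__)   # the version conda actually installed
print(scipy.__file__)      # should also live inside the pyam-dev environment
```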

Falter89 commented 9 years ago

@davidrpugh That works, but it doesn't solve the problem. I've even tried uninstalling and reinstalling Anaconda because I got worried that changing the files before had broken something, but I still get the same error message. When I try to change the parameters, the cell just executes very fast without an error message, and solver.result.success is still False. Quite similar to when the shooting solver would simply stop.

Falter89 commented 9 years ago

@davidrpugh

The problem is the alpha. If I don't pass anything there, it solves fine. But you were able to run the code with the alpha set?

davidrpugh commented 9 years ago

@Falter89

That is weird. Try the following. Set alpha and the skill bounds as follows...

# need to bracket the guess for alpha
eps = 1e-3
skill_bounds = (skill_params['loc1'] + eps, 1e2)

workers = pyam.Input(var=x,
                     cdf=skill_cdf,
                     params=skill_params,
                     bounds=skill_bounds,
                     alpha=0.005,
                     measure=1.0
                     )

then check the values of workers.lower and workers.upper (similarly for firms). Assuming you haven't changed any other parameters, they should be...

>>> workers.lower, workers.upper
(1.0760906943430828, 14.14221152315469)

Falter89 commented 9 years ago

@davidrpugh I get these numbers too.

Falter89 commented 9 years ago

@davidrpugh The alpha in initial_polys is not the same as the alpha we define for the distributions, right? They are set to different values, but I guess they shouldn't have anything to do with each other.

davidrpugh commented 9 years ago

@Falter89

OK. Good. Did the plot of the initial condition render properly? Check the initial_coefs. They should look like...

{'mu': array([  9.06771872e+00,   6.03407037e+00,  -1.25924238e+00,
          4.12639809e-01,  -1.59993316e-01,   6.78119966e-02,
         -3.03746109e-02,   1.41277329e-02,  -6.75307523e-03,
          3.29559914e-03,  -1.63467836e-03,   8.21527462e-04,
         -4.17345087e-04,   2.13938880e-04,  -1.10513447e-04,
          5.74652406e-05,  -3.00528191e-05,   1.57960792e-05,
         -8.33959124e-06,   4.42038732e-06,  -2.35135170e-06,
          1.25477102e-06,  -6.71537866e-07,   3.60348153e-07,
         -1.93830438e-07,   1.04491665e-07,  -5.64453124e-08,
          3.05482294e-08,  -1.65618201e-08,   8.99307473e-09,
         -4.89092077e-09,   2.66299782e-09,  -1.45224066e-09,
          7.92031971e-10,  -4.32961255e-10,   2.35733169e-10,
         -1.29150110e-10,   6.91598931e-11,  -3.79532932e-11,
          1.83673717e-11,  -1.01036685e-11]),
 'theta': array([  2.52149328e+00,   2.83922351e-01,  -1.26226443e+00,
          7.07189763e-01,  -2.32044585e-01,   2.19939871e-02,
          3.96210313e-02,  -4.48371496e-02,   3.52835866e-02,
         -2.51717488e-02,   1.77796377e-02,  -1.30054775e-02,
          1.00435582e-02,  -8.20436282e-03,   7.01888909e-03,
         -6.20786061e-03,   5.61520655e-03,  -5.14535477e-03,
          4.76074592e-03,  -4.41826678e-03,   4.12338248e-03,
         -3.84048902e-03,   3.59637248e-03,  -3.34829667e-03,
          3.13982429e-03,  -2.91501334e-03,   2.73493197e-03,
         -2.52626940e-03,   2.37039667e-03,  -2.17229304e-03,
          2.03792243e-03,  -1.84503045e-03,   1.73034343e-03,
         -1.53656013e-03,   1.44042144e-03,  -1.23736508e-03,
          1.15934119e-03,  -9.32545237e-04,   8.73220868e-04,
         -5.87277054e-04,   5.49550690e-04])}

...and yes the two alphas are not the same. I will make a note to change the name of the initial condition parameter.

Falter89 commented 9 years ago

@davidrpugh yes, initial coefs are the same

Falter89 commented 9 years ago

@davidrpugh
It remains weird. The code works fine for alphas lower than 0.005 (I tried 0.004, 0.003, 0.002 and 0.001) but not for alpha=0.005 or any of the higher alphas I tried. By the way, I get the same error if I input the bounds implied by alpha=0.005 directly as bounds. I don't understand why the solver would work for you and Cristina with these parameters and not for me, since it should be exactly the same code. But at least it seems to work sometimes.

davidrpugh commented 9 years ago

@Falter89

I do not know why you would get different behavior. Note that alpha is a desired quantile: I choose alpha=0.005 in order to make sure that worker skill ranges over 99% of the probability mass of the underlying log-normal distribution. Given some alpha, the exact x values representing the alpha and 1-alpha quantiles of the skill distribution are unknowns that need to be solved for.

Have a look at the inputs.py script to see how alpha and bounds are used to generate the integration bounds...


    # set (or find!) the desired bounds
    if alpha is None:
        self.lower = bounds[0]
        self.upper = bounds[1]
    else:
        self.lower = self._find_bound(alpha * self.measure, *bounds)
        self.upper = self._find_bound((1 - alpha) * self.measure, *bounds)

def _find_bound(self, alpha, lower, upper):
    """Find the alpha quantile of the CDF."""
    return optimize.bisect(self._inverse_cdf, lower, upper, args=(alpha,))

def _inverse_cdf(self, x, alpha):
    """Inverse CDF used to identify the lower and upper bounds."""
    return self.evaluate_cdf(x) - alpha

...when you provide BOTH alpha and bounds the bounds need to bracket both unknown quantiles. Put more plainly:

bounds[0] < x_L < x_H < bounds[1]

where x_L and x_H are the unknown alpha and 1-alpha quantiles.
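
You can sanity check this condition directly with SciPy. The log-normal parametrization below is my reading of the lower/upper values quoted earlier in this thread, so substitute whatever is actually in skill_params if it differs:

```python
# sanity check that the supplied bounds bracket both unknown quantiles
# (log-normal parameters inferred from the lower/upper values quoted above)
from scipy import stats

alpha = 0.005
eps = 1e-3
skill_dist = stats.lognorm(s=1.0, loc=1.0, scale=1.0)

x_L = skill_dist.ppf(alpha)       # ~1.0761, the alpha quantile
x_H = skill_dist.ppf(1 - alpha)   # ~14.142, the 1 - alpha quantile

bounds = (1.0 + eps, 1e2)
print(bounds[0] < x_L < x_H < bounds[1])  # True, so bisection can locate both bounds
```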

Hope this helps...

Falter89 commented 9 years ago

@davidrpugh Yes, that makes sense, and I prefer this way of choosing the bounds. My current theory is that the solver somehow has a problem with the bounds being too close together: setting alpha lower widens the bounds, and then it works. Since I had the same problem when I passed the bounds implied by alpha=0.005 directly as bounds with alpha=None, it shouldn't be due to the way the bounds are calculated. Why the solver should be more sensitive to the bounds when I use it remains a mystery to me.