bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
https://bashtage.github.io/linearmodels/

MemoryError with IV2SLS #607

Open ckarren opened 4 months ago

ckarren commented 4 months ago

I'm trying to run a 2SLS regression to estimate price elasticity with IV2SLS. This is what my data looks like: | ln_q | ln_p | ... weather variables ... | ... instruments ... | ... user id dummies ... |

All of the data is np.float32. My data array is approximately (200000, 20000), which is about 16 GB.
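For reference, a quick back-of-the-envelope size check (plain arithmetic, nothing library-specific):

    rows, cols = 200_000, 20_000
    print(rows * cols * 4 / 1e9)  # ~16 GB held as float32
    print(rows * cols * 8 / 1e9)  # ~32 GB if the same array were float64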

Using linearmodels' IV2SLS, I set up my model like this:

    dependent = ln_q
    endog = weather variables + user id dummies
    exog = ln_p
    instruments = instruments
    results = IV2SLS(dependent, endog, exog, instruments).fit()

When running with the full dataset I consistently get the error:

    Unable to allocate 27.8 GiB for an array with shape (202507, 18450) and data type float64

and this line looks like the culprit (it is where the weights are applied):

    self._wz = self._z * w

I'm running 64-bit Python on a machine with 128 GB of RAM. I've tried to work around the issue by passing my own weights:

    results = IV2SLS(dependent, endog, exog, instruments, weights=np.ones(dependent.shape, dtype=np.float32)).fit()

but I still get the same MemoryError even though I explicitly pass float32 weights. Roughly 32 GB of RAM just to create an array of ones when weights=None seems like a lot of memory for something that leaves the input values unchanged. Further, why is the data recast to float64 when everything else is float32 and I explicitly pass float32 weights? Why does an array of ~16 GB end up using >100 GB of RAM in this process? And what can I do to get this regression to run?
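For what it's worth, the failed allocation is consistent with a float64 array of the reported shape, and NumPy's type promotion rules would explain why float32 inputs stop helping as soon as any float64 array enters the product. I may be misreading where the cast actually happens inside linearmodels, but here is a small standalone check of both points (the shapes below are just illustrative):

    import numpy as np

    # The reported allocation matches a float64 array of the reported shape:
    print(202507 * 18450 * 8 / 2**30)  # ~27.8 GiB

    # float32 * float64 is promoted to float64, so passing float32 weights does
    # not prevent a float64 result once a float64 array is involved:
    z = np.ones((5, 3), dtype=np.float32)   # stand-in for the instrument matrix
    w = np.ones((5, 1), dtype=np.float64)   # stand-in for float64 weights
    print((z * w).dtype)                    # float64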