cranedroesch / panelNNET_preCpp

Semiparametric panel data models using neural networks

Classification Response #1

Open miller-moore opened 7 years ago

miller-moore commented 7 years ago

Hello,

So far I see only regression loss incorporated. I'm curious if you plan to incorporate class response variable(s) via perhaps adjusting output activation using sigmoid or tanh. For example, each observation is a customer-date and the response is 1 if sales increased over some future time period compared to some past time period, otherwise 0.
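To illustrate what I mean, here is a minimal base-R sketch of a binary classification "head" -- a sigmoid on the network's final linear output, trained against log-loss. The names are hypothetical, purely for illustration:

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# yhat_linear: the network's top-level linear output for each observation
# y: the 0/1 response (e.g., 1 if sales increased over the future window)
logloss <- function(y, yhat_linear) {
  p <- sigmoid(yhat_linear)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# The gradient of log-loss w.r.t. the linear output is simply (p - y),
# which is what would drive backprop through the rest of the network.
logloss_grad <- function(y, yhat_linear) sigmoid(yhat_linear) - y
```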

Thanks, Miller

cranedroesch commented 7 years ago

Hi Miller,

Thanks for your interest.

You're right that I haven't dealt with classification problems at all.
It'd be a fairly straightforward extension, except that the method of projecting out fixed effects doesn't work in nonlinear models, which classification models are. The extension to longitudinal classification problems would require implementing something like random effects or a conditional likelihood approach.

Were you specifically interested in longitudinal classification problems? If you don't have that sort of structure to your data, perhaps other well-developed packages would meet your needs.

Cheers, Andrew

miller-moore commented 7 years ago

Hey Andrew,

I am specifically interested in longitudinal classification problems using non-linear methods. Non-linear extensions to panel data, whether for regression or classification tasks, are scarce if not non-existent. The options that do exist are constrained to the case of balanced panel data. A cursory search turns up some techniques, but they seem to require complicated data preparation and come with steep learning curves.

For the linear hypothesis, there are two packages available in R: one purely for regression (plm), and another (pglm) that extends or "generalizes" the plm functionality to logit, probit, negbin, and poisson responses. The problem with both, of course, is that they are constrained to a linear combination of variable effects, group effects, between effects, and errors, whether random or fixed effects are assumed. For panel data that is large in both n and p, linear models suffer from intractable matrix algebra. A workaround could be obtained using meta-approaches to feature selection, but at high computational cost. Even then, the linear hypothesis is likely to underperform a non-linear alternative when the data are large enough to support it. Lastly, there is cstem, which I only recently came across and cannot speak to.
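For concreteness, here is the sort of usage I have in mind with those two packages. The data are simulated, the variable names invented, and my pglm call is a best guess at its interface, so it may need adjustment:

```r
library(plm)   # linear panel models
library(pglm)  # "generalized" panel models: logit, probit, negbin, poisson

# Simulated customer-date panel, purely for illustration
set.seed(1)
n_c <- 50; n_t <- 10
dat <- data.frame(
  customer = rep(1:n_c, each = n_t),
  date     = rep(1:n_t, times = n_c),
  price    = rnorm(n_c * n_t),
  promo    = rbinom(n_c * n_t, 1, 0.3)
)
dat$sales     <- 1 + 0.5 * dat$promo - 0.3 * dat$price + rnorm(n_c * n_t)
dat$increased <- as.integer(dat$sales > 1)  # binary response

pdat <- pdata.frame(dat, index = c("customer", "date"))

# Linear fixed-effects ("within") regression with plm:
fe <- plm(sales ~ price + promo, data = pdat, model = "within")

# Random-effects probit for the binary response with pglm:
re <- pglm(increased ~ price + promo, data = pdat,
           family = binomial("probit"), model = "random")
```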

At a glance, your package seems to offer the best of both worlds by allowing data to stay in panel format while offering a non-linear framework by way of a neural network. However, I trust your answer that the current architecture wouldn't support classification of longitudinal data, though I don't understand the reason, as I'm past my level of knowledge at this point. Do you have a quick explanation of why a logistic loss function couldn't drive backprop in your current architecture for longitudinal data? Is it because your goal is always to produce fixed effects per individual for descriptive purposes, even when the objective is to maximize predictive performance? In the latter case, it seems to me, in my naivety, that allowing backprop to drive cross-network connections, while respecting time order, is still on the table with your architecture.

Much appreciated, Miller

cranedroesch commented 7 years ago

panelNNET uses the method of alternating projections (see the lfe package vignette) to remove fixed effects from the top level of the neural net, which is a linear model. This is a generalization of FWL projection (https://en.wikipedia.org/wiki/Frisch%E2%80%93Waugh%E2%80%93Lovell_theorem), which doesn't work in nonlinear models.

There are other approaches to dealing with unobserved cross-sectional heterogeneity, random effects being the most important of these. Finding a good, computationally cheap way of implementing random effects within panelNNET would be a really useful contribution. Random effects don't control for unobserved time-invariant heterogeneity, but this only matters in situations where inference is desired on marginal effects. For pure prediction, random effects will be more efficient than fixed effects. In fact, random effects can be shown to be equivalent to an L2-penalized fixed effects estimator -- admitting bias in order to reduce variance. And they are admissible in nonlinear models, which all classification models are (they may be linear in their link function, but not in outcomes).
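To make the projection step concrete, here is a small sketch (simulated data, not panelNNET code) showing that regressing within-demeaned y on within-demeaned x via lfe's demeanlist recovers the same slope as the explicit dummy-variable fixed effects regression -- the equivalence that breaks down once a nonlinear link sits between the linear index and the outcome:

```r
library(lfe)  # provides demeanlist()

set.seed(42)
n_id <- 30; n_t <- 20
id <- factor(rep(1:n_id, each = n_t))
x  <- rnorm(n_id * n_t)
alpha <- rnorm(n_id)[id]                 # unobserved individual effects
y  <- alpha + 2 * x + rnorm(n_id * n_t)

# Explicit dummy-variable (fixed effects) estimator:
b_dummies <- coef(lm(y ~ x + id))["x"]

# FWL / alternating-projections route: demean y and x within id, then
# run a plain regression on the demeaned data.
dm <- demeanlist(data.frame(y = y, x = x), list(id))
b_fwl <- coef(lm(y ~ x - 1, data = dm))["x"]

all.equal(unname(b_dummies), unname(b_fwl))  # TRUE: identical slopes
```

The random-effects point above amounts to shrinking the id dummy coefficients with an L2 penalty rather than projecting them out entirely.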

While I would like to do this, it's not currently at the top of my list. And it'd require more thought than simply calling the demeanlist function from lfe.

Would you like to take this on?

miller-moore commented 7 years ago

Andrew,

Thanks for the background material. Is it only the top level of your network that is treated under the fixed effects regime?

In any case, the problem of maintaining computational efficiency remains. I am only familiar with gradient descent methods for solving nonlinear MLE, and not familiar enough with algorithmic tricks to cleverly design for speed. While I'd love to grow my knowledge through practice, I'd be very slow at best in getting to a working solution, and slower still in achieving an efficient one.

My original curiosity with your package lay with the network architecture. My hope was that the architecture itself was allowed to change dynamically, in a way analogous to the generalized method of moments, while also allowing for classification responses. I think backprop with familiar loss functions at the top level could address this in general, but not very efficiently. Based on my knowledge, the nearest available technique would be a tensor recurrent network driving purely randomized linkages, dropout, gate controls, etc. (except peeking ahead in time, as in the demeaned case), which would also allow for estimating regressor effects specific to each individual (within/fixed), across time (between), and across individuals from prior time periods, but not across individuals in a given cross-section. The last part of that statement is of particular interest to me for two reasons: (1) in most systems, what happens to one individual in the past may dynamically or causally affect another in the future; and (2) what happens to one individual in the present can never influence another individual in the present, because everything must first flow through time.

I'm not aware of any existing methods that handle this, but perhaps it could be arranged through one or more forms of random effects. A modelling scheme that can handle that last piece would be useful when the panel data is suspected to fully, or almost fully, represent a closed system in space at each point in time, but not necessarily across time (cross-sectional conservation of population mass, but random, unobserved mass leakages and transfers across time and across subjects).

Sorry for the brain vomit, -Miller

cranedroesch commented 7 years ago

Hi Miller,

Perhaps read the current version of my working paper. It's linked here. It might help you to understand how I'm treating the longitudinal structure of the data, and how I'm augmenting traditional gradient descent.

Let me know if this clarifies anything for you.

Andrew