Could you share which book you used for the First-Difference Estimator?

Sandy4321 commented 2 years ago

Great material, thanks! Could you share which book you used for the First-Difference Estimator? https://www.youtube.com/watch?v=p9NhSrTugYM&list=PLOQU3c_3DSpLTBa0vqPFVwDCqXlXiu49j&index=55

Also for all the DiD videos: 14.2) Algebra of Difference-in-Differences (DID), 14.3) Python: Diff-in-Diff (DD), 14.4) Quasi-Experiment Diff-in-Diff (DID).

What other material might be helpful for understanding DiD?

And what about alpha_i?

causal-methods commented 2 years ago

Angrist, J. D. and Pischke, J.-S. (2014). Mastering 'Metrics: The Path from Cause to Effect. Princeton University Press.

Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach, 6th Edition. Cengage Learning.

Kamada, V. (2020b). Causal Inference with Python. https://causal-methods.github.io/Book

Heiss, F. and Brunner, D. Using Python for Introductory Econometrics.

Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd Edition. Cambridge, MA: MIT Press.

Sandy4321 commented 2 years ago

Great, thanks!

Would you be so kind as to recommend some simple books on panel data, or easy-to-understand videos, in addition to your own videos?

causal-methods commented 2 years ago

These are the simplest and best books for beginners. They have great chapters on panel data (fixed effects, first differences, etc.):

Real Econometrics: The Right Tools to Answer Important Questions, by Michael Bailey (Jan 3, 2019)

Introduction to Econometrics (3rd Edition), by James Stock and Mark W. Watson (Jan 1, 2017)

Sandy4321 commented 2 years ago

Great, thanks! Can I find machine-learning classification solutions there?

Let's say a matrix with 1,000,000 rows and 2,000 features, with 20 observations of the same id at different times?

And the target is yes or no?

causal-methods commented 2 years ago

Econometrics textbooks do not cover machine learning. Econometrics focuses on causal inference, not forecasting; the exception is time series econometrics.

If you want examples and solutions for a problem like yours, study the book An Introduction to Statistical Learning: https://www.statlearning.com/ Python code for the book's examples is easy to find on the Internet.

Sandy4321 commented 2 years ago

Yes, exactly: panel data is time series data. Then what about prediction for panel data when we have 200 features?

causal-methods commented 2 years ago

Panel data has a time dimension, but the econometrics of panel data doesn't traditionally deal with this type of problem (prediction with 200 features). You are better off using machine learning textbooks. The combination of panel data techniques and machine learning methods is only covered in high-level technical papers; there is no simple book for beginners. You can study the two sets of techniques separately, using different books.

Sandy4321 commented 2 years ago

> Panel Data techniques and Machine learning methods are only covered at high level technical papers

I would like to try. Could you share some? These seem not too bad:
https://towardsdatascience.com/assigning-panel-data-to-training-testing-and-validation-groups-for-machine-learning-models-7017350ab86e
https://towardsdatascience.com/a-guide-to-panel-data-regression-theoretics-and-implementation-with-python-4c84c5055cf8

These are more complicated:

Synth (R): https://cran.r-project.org/web/packages/Synth/Synth.pdf

susanathey/MCPanel (R code): https://github.com/susanathey/MCPanel

synth-inference/synthdid (R code): https://github.com/synth-inference/synthdid

ebenmichael/augsynth (R code): https://github.com/ebenmichael/augsynth

But since there is code, it is possible to learn from them.

causal-methods commented 2 years ago

Panel data can mean two things:

1) The data structure: the same entities are observed across time.
2) Panel methods: econometric estimators for causal inference, such as fixed effects, first differences, DID, etc.

The article "Assigning Panel Data to Training, Testing and Validation Groups for Machine Learning Models" is about (1): panel data forecasting using machine learning methods. This is what you learn from machine learning textbooks.
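A minimal sketch of the core idea behind that kind of split, assuming a long-format pandas DataFrame with a hypothetical entity column `id`: all rows of an entity go to the same side, so the model is never evaluated on an entity whose other periods it already saw in training. This is plain numpy/pandas, not the article's code; scikit-learn's `GroupShuffleSplit` does the same job. If the goal is forecasting future periods, a time-based cutoff (train on earlier periods, test on later ones) is the other common choice.

```python
import numpy as np
import pandas as pd


def split_by_entity(df, id_col="id", test_frac=0.2, seed=0):
    """Put all rows of a given entity in either train or test, never both."""
    ids = df[id_col].unique()
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * len(ids))))
    test_ids = rng.choice(ids, size=n_test, replace=False)
    is_test = df[id_col].isin(test_ids)
    return df[~is_test], df[is_test]


# Hypothetical panel: 10 entities, each observed in 5 periods.
panel = pd.DataFrame({
    "id": np.repeat(np.arange(10), 5),
    "t": np.tile(np.arange(5), 10),
    "x": np.random.default_rng(1).normal(size=50),
})
train, test = split_by_entity(panel)
print(len(train), len(test))               # 40 10
print(set(train["id"]) & set(test["id"]))  # set()
```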

The article "A Guide to Panel Data Regression: Theoretics and Implementation with Python" is about (2). This is what you learn from econometrics textbooks.
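As a small instance of the panel methods in (2), here is a 2x2 difference-in-differences computed from the four group means, on made-up numbers; the column names `treated`, `post` and `y` are assumptions. The estimate has a causal reading only under the parallel-trends assumption.

```python
import pandas as pd


def did_2x2(df, y="y", treated="treated", post="post"):
    """Difference-in-differences from the four group means:
    (treated after - treated before) - (control after - control before)."""
    m = df.groupby([treated, post])[y].mean()
    return (m.loc[(1, 1)] - m.loc[(1, 0)]) - (m.loc[(0, 1)] - m.loc[(0, 0)])


# Made-up numbers: unit A is treated between the two periods, unit B is not.
toy = pd.DataFrame({
    "unit":    ["A", "A", "B", "B"],
    "treated": [1, 1, 0, 0],
    "post":    [0, 1, 0, 1],
    "y":       [10.0, 18.0, 8.0, 11.0],
})
print(did_2x2(toy))  # (18 - 10) - (11 - 8) = 5.0
```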

If your goal is forecasting, go for deep learning (neural networks). If you want to establish causality, study econometrics. There is no reason to run a marathon in ballet pointe shoes or to dance ballet in running shoes.

If you can read Susan Athey's papers and implement her methods, that is excellent. She has been developing methods at the intersection of causal inference and machine learning: she and her coauthors use machine learning methods to strengthen causal inference methods. Fundamentally, they are tackling causal inference questions.

Sandy4321 commented 2 years ago

> The article "Assigning Panel Data to Training, Testing and Validation Groups for Machine Learning Models" is about (1) panel data forecasting using Machine Learning Methods. It is what you learn using Machine Learning textbooks.

Super, thanks for sharing! This is exactly what I am asking about. Could you share some GitHub code (or books, or papers) for exactly this kind of solution for panel data, i.e., many timed measurements of the same samples? My guess is that it would be practical, for example, for equipment failure prediction like https://medium.com/swlh/machine-learning-for-equipment-failure-prediction-and-predictive-maintenance-pm-e72b1ce42da1

causal-methods commented 2 years ago

First, I would ignore the panel data structure and deploy a neural network using Keras. The best book is Deep Learning with Python, Second Edition, by François Chollet (Dec 21, 2021).

Another decent approach is XGBoost. Book: Hands-On Gradient Boosting with XGBoost and scikit-learn: Perform accessible machine learning and extreme gradient boosting with Python, by Corey Wade and Kevin Glynn.

If the results are unsatisfactory and/or you want to go deeper, try to integrate the panel data structure. Paper: "Interpretable Neural Networks for Panel Data Analysis in Economics" by Yucheng Yang, Zhong Zheng, and Weinan E.

How to process panel data for use in a recurrent neural network (RNN): https://stackoverflow.com/questions/40008240/how-to-process-panel-data-for-use-in-a-recurrent-neural-network-rnn
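A minimal sketch of the usual reshape for an RNN, assuming a balanced long-format pandas panel (every entity observed once in every period); the function and column names are hypothetical. Each entity becomes one sequence of shape (n_periods, n_features). With the numbers from this thread, 1,000,000 rows at 20 periods per id would be 50,000 sequences, and 50,000 x 20 x 2,000 float64 values is roughly 16 GB, so float32 or batched loading may be needed.

```python
import numpy as np
import pandas as pd


def panel_to_3d(df, id_col, time_col, feature_cols):
    """Reshape a long panel into (n_entities, n_periods, n_features) for an RNN.

    Assumes a balanced panel: each entity appears exactly once in every period.
    """
    if df.duplicated([id_col, time_col]).any():
        raise ValueError("duplicate (entity, time) rows")
    n_entities = df[id_col].nunique()
    n_periods = df[time_col].nunique()
    if len(df) != n_entities * n_periods:
        raise ValueError("unbalanced panel: pad or drop entities first")
    # Sorting by entity, then time, makes each block of n_periods rows one sequence.
    ordered = df.sort_values([id_col, time_col])
    values = ordered[feature_cols].to_numpy()
    return values.reshape(n_entities, n_periods, len(feature_cols))


# Hypothetical panel: 3 entities x 4 periods x 2 features, rows shuffled.
panel = pd.DataFrame({
    "id": np.repeat([0, 1, 2], 4),
    "t": np.tile(np.arange(4), 3),
    "f1": np.arange(12, dtype=float),
    "f2": np.arange(12, dtype=float) * 10,
})
X = panel_to_3d(panel.sample(frac=1, random_state=0), "id", "t", ["f1", "f2"])
print(X.shape)     # (3, 4, 2)
print(X[1, :, 0])  # [4. 5. 6. 7.]
```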

Sandy4321 commented 2 years ago

Great, thanks! Though there is no material in the book Hands-On Gradient Boosting with XGBoost and scikit-learn: Perform accessible machine learning and extreme gradient boosting with Python, by Corey Wade and Kevin Glynn, related to panel data (tabular data with the same samples observed at different times). Do you mean to use https://stackoverflow.com/questions/40008240/how-to-process-panel-data-for-use-in-a-recurrent-neural-network-rnn to convert the data to tabular form and then use any tabular-data Python package?

PS: deep learning is data-greedy, so isn't it impractical here? As stated in https://github.com/Amplo-GmbH/AutoML: "When log files have to be classified, and there is not enough data for time series methods (such as LSTMs, ROCKET or Weasel, Boss, etc), one needs to fall back to classical machine learning models which work better with lower samples."

causal-methods commented 2 years ago

Earlier you mentioned a matrix with 1,000,000 rows. That is more than enough for deep learning.

Panel data estimators use the information that we observe the same unit at different points in time. Let's say that we observe the revenue of Microsoft over several years. The observations (rows) of Microsoft are likely to be dependent because, at the end of the day, they are all observations of the same company. This information is useful for mitigating bias, that is, for dealing with endogeneity problems. It is unlikely to improve forecasting accuracy. Machine learning algorithms are designed to maximize forecasting accuracy, and panel data is not the typical data structure of most machine learning problems. Panel data estimators actually transform the data (time demeaning, first differencing, etc.), and none of these transformations is useful for forecasting.
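A small sketch of the two transformations mentioned above, on a made-up panel where y_it = alpha_i + 2 x_it. Both remove the entity effect alpha_i, which is what helps with endogeneity; they also discard each entity's level, information a forecaster might otherwise use. Function and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def within_transform(df, id_col, cols):
    """Time demeaning: subtract each entity's own mean (the fixed-effects transform)."""
    return df[cols] - df.groupby(id_col)[cols].transform("mean")


def first_difference(df, id_col, time_col, cols):
    """First difference within each entity; each entity's first period becomes NaN."""
    ordered = df.sort_values([id_col, time_col])
    return ordered.groupby(id_col)[cols].diff()


# Toy panel where y_it = alpha_i + 2 * x_it and alpha_i differs by entity.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "t":  [1, 2, 3, 1, 2, 3],
    "x":  [1.0, 2.0, 4.0, 0.0, 3.0, 5.0],
})
toy["y"] = toy["id"].map({1: 100.0, 2: -50.0}) + 2 * toy["x"]

dm = within_transform(toy, "id", ["y", "x"])
fd = first_difference(toy, "id", "t", ["y", "x"]).dropna()
# alpha_i is gone: both transformed outcomes equal 2 * transformed x
print(np.allclose(dm["y"], 2 * dm["x"]), np.allclose(fd["y"], 2 * fd["x"]))  # True True
```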

Each machine learning algorithm needs the data in a certain format, and whatever the format, it is your job to make the modifications. Even for panel data estimators, you have to set (declare) the time and unit-of-analysis variables; in this case, you would have two columns as indices. Usually, you cannot use this data format directly with machine learning algorithms.

If you have a small sample, use whichever machine learning algorithm is most appropriate.
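A sketch of going from the two-index format to an ML-ready table, assuming a pandas DataFrame indexed by (entity, time), the layout that, for example, the linearmodels package's `PanelOLS` expects. One common choice (not the only one) is to add lagged features computed within each entity and then pass the flat table to any tabular model; the helper name `panel_to_ml_table` and the column names are hypothetical. The returned entity ids can drive an entity-wise train/test split.

```python
import pandas as pd


def panel_to_ml_table(panel, target, features, n_lags=1):
    """Flatten an (entity, time)-indexed panel into a plain table with lag features.

    Returns X (features plus their lags), y (target) and groups (entity ids).
    """
    entity_col, time_col = panel.index.names
    df = panel.reset_index().sort_values([entity_col, time_col])
    lag_cols = []
    for col in features:
        for k in range(1, n_lags + 1):
            name = f"{col}_lag{k}"
            # shift within each entity so lags never cross entity boundaries
            df[name] = df.groupby(entity_col)[col].shift(k)
            lag_cols.append(name)
    df = df.dropna(subset=lag_cols)  # first n_lags periods of each entity have no lags
    return df[features + lag_cols], df[target], df[entity_col]


# Hypothetical panel declared with two index columns, as panel estimators expect.
panel = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "t":  [1, 2, 3, 1, 2, 3],
    "x":  [0.5, 0.7, 0.9, 1.0, 1.2, 1.1],
    "y":  [0, 0, 1, 0, 1, 1],
}).set_index(["id", "t"])

X, y, groups = panel_to_ml_table(panel, target="y", features=["x"])
print(X.shape)               # (4, 2)
print(X["x_lag1"].tolist())  # [0.5, 0.7, 1.0, 1.2]
```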

causal-methods commented 2 years ago

Even with panel data, you can run regular (pooled) OLS that ignores the panel data structure. In this case, each observation of Microsoft is treated as independent. Obviously, the results are different: regular OLS is expected to be biased. Roughly speaking, a machine learning algorithm does the same as regular OLS.
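A quick simulation of this point, with made-up parameters chosen for illustration: the unobserved entity effect alpha_i is built to be correlated with x. Under this setup pooled OLS should land near 3.5 (= 2 + 3 Cov(x, alpha) / Var(x) = 2 + 3 x 1/2), while the first-difference estimate should stay near the true beta = 2.

```python
import numpy as np

rng = np.random.default_rng(42)
n_entities, n_periods, beta = 500, 5, 2.0

# Simulated (hypothetical) panel: the unobserved entity effect alpha_i is
# correlated with x, which is exactly the case where pooled OLS is biased.
alpha = rng.normal(size=n_entities)
x = alpha[:, None] + rng.normal(size=(n_entities, n_periods))
y = beta * x + 3.0 * alpha[:, None] + rng.normal(size=(n_entities, n_periods))


def ols_slope(xv, yv):
    """Slope from a simple regression of yv on a constant and xv."""
    design = np.column_stack([np.ones_like(xv), xv])
    coef, *_ = np.linalg.lstsq(design, yv, rcond=None)
    return coef[1]


# Pooled OLS: every row treated as an independent observation.
pooled = ols_slope(x.ravel(), y.ravel())
# First differences along the time axis remove alpha_i before regressing.
fd = ols_slope(np.diff(x, axis=1).ravel(), np.diff(y, axis=1).ravel())
print(f"true beta = {beta}, pooled OLS = {pooled:.2f}, first difference = {fd:.2f}")
```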

Sandy4321 commented 2 years ago

Yes, this is exactly what I am trying to find: a project with code that builds an ML model for classification/regression on panel data, so I can learn by example how to deal with panel data. I am very surprised that it is so difficult to find this kind of GitHub repository.

That is, not to do this: "Even if Panel Data, you can run the regular OLS that ignores the Panel Data Structure. In this case, each observation of Microsoft is treated as independent. Obvious the results are different. The regular OLS suppose to be biased. Roughly speaking, the Machine Learning algorithm does the same as regular OLS."

but a proper ML solution.

Repository: VitorKamada/ECO6100, issue #1