gabors-data-analysis / da_case_studies

Codes for case studies for the Bekes-Kezdi Data Analysis textbook
MIT License

Pyfixest testing #115

Open gbekes opened 3 months ago

gbekes commented 3 months ago

Testing pyfixest Ch10

See also the R version where fixest is used.

  1. Test if results are the same
  2. Adjust output presentation
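
A minimal sketch of how step 1 could be checked, assuming the coefficients from each implementation have already been collected into plain dictionaries (for example via `fit.params.to_dict()` in statsmodels or `fit.coef().to_dict()` in pyfixest; treat those calls as assumptions to verify). The helper name `compare_coefficients` and the numbers are made up for illustration and are not results from the book.

```python
import math


def compare_coefficients(reference, candidate, rel_tol=1e-6, abs_tol=1e-8):
    """Return {name: (reference, candidate)} for coefficients that are
    missing on one side or differ by more than the given tolerance."""
    mismatches = {}
    for name in sorted(set(reference) | set(candidate)):
        ref = reference.get(name)
        cand = candidate.get(name)
        if ref is None or cand is None:
            mismatches[name] = (ref, cand)
        elif not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[name] = (ref, cand)
    return mismatches


# Made-up numbers, for illustration only.
statsmodels_coefs = {"Intercept": 1.25, "female": -0.195}
pyfixest_coefs = {"Intercept": 1.2500001, "female": -0.195}

# An empty dict means every coefficient matched within tolerance.
print(compare_coefficients(statsmodels_coefs, pyfixest_coefs))
```

The same helper can be pointed at standard errors; the tolerance probably needs to be looser there, since small-sample corrections can differ between libraries.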

s3alfisc commented 3 months ago

Hi @gbekes, I can port the notebook over the weekend, and then we can compare if results match? Please feel free to "officially" assign me if you like =)

gbekes commented 3 months ago

Great. Btw, ch23 (panel data) should be next. Ping @adamvig96 if you have any Qs. The results I have in the book can also be seen in the tables of the public slideshow.

gbekes commented 2 months ago

Hi @s3alfisc any progress? @DanielBarabas is here to scale up if all works well

s3alfisc commented 2 months ago

I'll try my best to get to it this weekend. I tried the other week but I was in a train & the internet failed me (we don't really have internet in trains in Germany...) & then I got caught up with work and PR reviews / bug fixes for pyfixest. Sorry!

gbekes commented 1 month ago

Status update. @adamvig96 went through the case studies and created pyfixest versions where applicable; see the pyfixest branch:
https://github.com/gabors-data-analysis/da_case_studies/tree/pyfixest

We ran into the problems listed below. Could you please advise, @s3alfisc?

| case_study | pyfixest | issue |
| --- | --- | --- |
| ch01-billion-prices-collect | not needed | |
| ch01-hotels-data-collect | not needed | |
| ch01-management-data-collect | not needed | |
| ch02-football-manager-success | not needed | |
| ch02-hotels-data-prep | not needed | |
| ch02-immunization-crosscountry | not needed | |
| ch03-city-size-japan | not needed | |
| ch03-distributions-height-income | not needed | |
| ch03-football-home-advantage | not needed | |
| ch03-hotels-europe-compare | not needed | |
| ch03-hotels-vienna-explore | not needed | |
| ch03-simulations | not needed | |
| ch04-management-firm-size | not needed | |
| ch05-stock-market-loss-generalize | not needed | |
| ch06-online-offline-price-test | not needed | |
| ch06-stock-market-loss-test | not needed | |
| ch07-hotels-simple-reg | not needed | |
| ch07-ols-simulation | not needed | |
| ch08-hotels-measurement-error | not needed | |
| ch08-hotels-nonlinear | not needed | |
| ch08-life-expectancy-income | not needed | |
| ch09-gender-age-earnings | Dani TODO | |
| ch09-hotels-europe-stability | Dani TODO | |
| ch10-gender-earnings-understand | Dani TODO | |
| ch10-hotels-multiple-reg | Dani TODO | |
| ch11-australia-rainfall-predict | not needed | |
| ch11-smoking-health-risk | not compatible | no logit, no probit |
| ch12-electricity-temperature | not compatible | no HAC std errors |
| ch12-stock-returns-risk | Ádi done | |
| ch12-time-series-simulations | not needed | |
| ch13-used-cars-reg | Ádi done | |
| ch14-airbnb-reg | Dani TODO | |
| ch14-used-cars-log | Dani TODO | |
| ch15-used-cars-cart | Ádi done | |
| ch16-airbnb-random-forest | not needed | |
| ch17-predicting-firm-exit | not needed | |
| ch18-case-shiller-la | not compatible | no ARIMA, VAR |
| ch18-swimmingpool | not needed | |
| ch19-food-health | Ádi done | |
| ch20-ab-test-social-media | Ádi done | |
| ch20-working-from-home | Ádi done | |
| ch21-ownership-management-quality | Ádi done | |
| ch22-airline-merger-prices | not compatible | no R2 for WLS |
| ch23-immunization-life | not compatible | no R2 for WLS |
| ch23-import-demand-and-production | Ádi done | |
| ch24-football-manager-replace | Ádi done | |
| ch24-haiti-earthquake-gdp | not needed | |
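
For the two "no R2 for WLS" rows, one possible stopgap is to compute a weighted R2 by hand from the outcome, the fitted values and the weights. The sketch below uses one common definition (weighted residual sum of squares over weighted total sum of squares around the weighted mean, for a model with an intercept). The function name `weighted_r_squared` is hypothetical, and whether this definition reproduces the numbers in the book would need to be checked.

```python
def weighted_r_squared(y, fitted, weights):
    """Weighted R2: 1 - sum(w * e^2) / sum(w * (y - weighted mean of y)^2).

    Assumes a model with an intercept and strictly positive weights.
    """
    total_weight = sum(weights)
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_weight
    ssr = sum(w * (yi - fi) ** 2 for w, yi, fi in zip(weights, y, fitted))
    tss = sum(w * (yi - y_bar) ** 2 for w, yi in zip(weights, y))
    return 1.0 - ssr / tss


# Toy data, for illustration only.
y = [1.0, 2.0, 3.0]
weights = [1.0, 1.0, 2.0]
print(weighted_r_squared(y, [1.1, 1.9, 3.0], weights))
```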

s3alfisc commented 1 month ago

Hi @gbekes, thanks, very useful! For most problems, we have open issues:

These are "quick wins" and we could implement them rather quickly (timeline = within 2 weeks):

I've recently (-> yesterday) started to work on an implementation of logistic regression: https://github.com/py-econometrics/pyfixest/issues/668 This will be slightly more involved but also not an impossible task. We would start with a Probit class after implementing the Logit estimator (as lots of code can be recycled).

HAC standard errors have not been on the roadmap at all; my impression was that they are not that widely used in practice. I have opened an issue to add them: https://github.com/py-econometrics/pyfixest/issues/675
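
For readers unfamiliar with what HAC involves, here is a minimal sketch of the Newey-West idea in its simplest setting: the standard error of a sample mean, with Bartlett weights on the autocovariances. It is not pyfixest's or statsmodels' implementation; the function name `newey_west_se_of_mean` and the choice of `max_lag` are assumptions made for illustration.

```python
import math


def newey_west_se_of_mean(series, max_lag):
    """Newey-West (Bartlett kernel) standard error of the sample mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]

    def autocov(lag):
        # Sample autocovariance at the given lag, divided by n (not n - lag).
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run_var = autocov(0)
    for lag in range(1, max_lag + 1):
        bartlett = 1.0 - lag / (max_lag + 1)  # weights shrink linearly with lag
        long_run_var += 2.0 * bartlett * autocov(lag)
    return math.sqrt(long_run_var / n)


# Toy series, for illustration only.
print(newey_west_se_of_mean([1.0, 2.0, 3.0, 4.0], max_lag=1))
```

With `max_lag=0` this collapses to the usual standard error that ignores serial correlation (using the divide-by-n variance), which makes it a convenient sanity check.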

DanielBarabas commented 1 week ago

Hi @s3alfisc, I checked the remaining chapters. Except for the two ch14 folders, all of them require features that are not yet implemented. The updated list looks like this:

| case_study | pyfixest | issue |
| --- | --- | --- |
| ch01-billion-prices-collect | not needed | |
| ch01-hotels-data-collect | not needed | |
| ch01-management-data-collect | not needed | |
| ch02-football-manager-success | not needed | |
| ch02-hotels-data-prep | not needed | |
| ch02-immunization-crosscountry | not needed | |
| ch03-city-size-japan | not needed | |
| ch03-distributions-height-income | not needed | |
| ch03-football-home-advantage | not needed | |
| ch03-hotels-europe-compare | not needed | |
| ch03-hotels-vienna-explore | not needed | |
| ch03-simulations | not needed | |
| ch04-management-firm-size | not needed | |
| ch05-stock-market-loss-generalize | not needed | |
| ch06-online-offline-price-test | not needed | |
| ch06-stock-market-loss-test | not needed | |
| ch07-hotels-simple-reg | not needed | |
| ch07-ols-simulation | not needed | |
| ch08-hotels-measurement-error | not needed | |
| ch08-hotels-nonlinear | not needed | |
| ch08-life-expectancy-income | not needed | |
| ch09-gender-age-earnings | not compatible | no prediction interval; locally defined functions cannot be used inside the formula (lspline) |
| ch09-hotels-europe-stability | not compatible | locally defined functions cannot be used inside the formula (lspline) |
| ch10-gender-earnings-understand | not compatible | no prediction interval |
| ch10-hotels-multiple-reg | not compatible | locally defined functions cannot be used inside the formula (lspline) |
| ch11-australia-rainfall-predict | not needed | |
| ch11-smoking-health-risk | not compatible | no logit, no probit |
| ch12-electricity-temperature | not compatible | no HAC std errors |
| ch12-stock-returns-risk | Ádi done | |
| ch12-time-series-simulations | not needed | |
| ch13-used-cars-reg | Ádi done | |
| ch14-airbnb-reg | Dani done | |
| ch14-used-cars-log | Dani done | |
| ch15-used-cars-cart | Ádi done | |
| ch16-airbnb-random-forest | not needed | |
| ch17-predicting-firm-exit | not needed | |
| ch18-case-shiller-la | not compatible | no ARIMA, VAR |
| ch18-swimmingpool | not needed | |
| ch19-food-health | Ádi done | |
| ch20-ab-test-social-media | Ádi done | |
| ch20-working-from-home | Ádi done | |
| ch21-ownership-management-quality | Ádi done | |
| ch22-airline-merger-prices | not compatible | no R2 for WLS |
| ch23-immunization-life | not compatible | no R2 for WLS |
| ch23-import-demand-and-production | Ádi done | |
| ch24-football-manager-replace | Ádi done | |
| ch24-haiti-earthquake-gdp | not needed | |
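
For the lspline rows, a possible workaround is to build the piecewise linear spline terms as ordinary columns before fitting, so the formula only refers to plain column names. The sketch below is a plain-Python stand-in under that assumption; `lspline_columns` is a hypothetical name and is not the helper used in the repo, though it follows the usual construction where each coefficient is the slope within its segment.

```python
def lspline_columns(values, knots):
    """Return one list per spline segment (len(knots) + 1 lists in total).

    Segment j holds the part of each value that falls between knot j-1 and
    knot j, so the columns add up to the original values.
    """
    knots = sorted(knots)
    lower_edges = [None] + knots  # None = no lower bound (first segment)
    upper_edges = knots + [None]  # None = no upper bound (last segment)
    columns = []
    for lo, hi in zip(lower_edges, upper_edges):
        column = []
        for x in values:
            capped = x if hi is None else min(x, hi)
            column.append(capped if lo is None else max(capped - lo, 0.0))
        columns.append(column)
    return columns


# Toy input, for illustration only: knots at 3 and 7.
for column in lspline_columns([1, 5, 10], [3, 7]):
    print(column)
```

Each returned list can then be assigned to a DataFrame column (say `age_s1`, `age_s2`, `age_s3`; the names are made up) and used in the formula as `y ~ age_s1 + age_s2 + age_s3`.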