JustinMShea / wooldridge

The official R data package for "Introductory Econometrics: A Modern Approach". A vignette contains example models from each chapter.
https://justinmshea.github.io/wooldridge

Compatibility with the 7th edition #5

Closed mhwachter closed 3 years ago

mhwachter commented 4 years ago

Hi,

is this package compatible with the 7th edition of the book?

JustinMShea commented 4 years ago

Yes. For context, the update from the 5th to the 6th edition was minimal, adding just 6 data sets and bringing the total from 105 to 111. For the 7th edition, I do not believe any additional data sets were added. I did reach out to the publisher during the text's pre-production period on this matter, and they were unaware of any additional data sets at that time. However, if more were added, we can add them here as well and update the documentation accordingly.

mhwachter commented 4 years ago

That's great. Thanks for the fast reply.

JustinMShea commented 4 years ago

Sure, glad I could help. You caught me at the desk, before 25 other alerts buried yours :)

And thanks for your query, I'll do a little more digging and make updates if needed.

VolodymyrVakhitov commented 4 years ago

Hi Justin,

Cengage holds a book companion site: https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781337558860

They have several versions of the data sets, for some reason. Apparently, the one in Stata is the most complete: it has 115 files, whereas the one in R, for example, has only 111. The one in Excel has only 108 files but, as a plus, includes descriptions for each of the data files.

In particular, the file JTRAIN98, which is used in the newly added Section 3.7 of the 7th edition, is missing. I believe the other four files are used in some new computer assignments or newly added sections and would be quite useful too.

Anyway, could you please use the Stata version of the data (http://academic.cengage.com/resource_uploads/downloads/1337558869_635109.zip) and update the wooldridge package? It's really a bomb for teaching!

Thanks a million!

Best wishes, Volodymyr

P.S. After a little search, here are the files that are missing:

JTRAIN98 LABSUP NCAA_RPI SCHOOL93_98

V.

JustinMShea commented 4 years ago

Thanks Volodymyr,

I left the issue open hoping someone else might look into this, so your comments and work here are most appreciated!

Indeed, the wooldridge package does use the Stata files as its source, not the R files. There were several good reasons for this, and it appears Stata files are Jeffrey Wooldridge's storage medium of choice. The R, Excel, and other files appear to be more sparsely maintained by graduate students and have other issues besides being incomplete in number. Because of this, I wrote .R scripts to convert the Stata files to RData files, so I should be able to add these over the upcoming U.S. holidays.

Other than the ease of loading data, a key feature of the package is the included documentation for each data set. The textbook doesn't always fully describe the data set one is working on, so I find this context very helpful for students. For jtrain, just type ?jtrain in the console and the R Documentation file will launch with all information documented in the Wooldridge Data Set Handbook. Fortunately, I was able to locate the 7th edition Data Set Handbook under the instructor section of the Cengage site. I will include that too, which will also help partially solve issue #3.

Thanks again and I'm glad you find the package useful, Justin

calciferanalytics commented 3 years ago

Hi Justin, first of all, thank you for maintaining this package; it makes it much easier to teach Econometrics using R. The JTRAIN98 dataset that VolodymyrVakhitov is referring to is not the same as the jtrain dataset, so ?jtrain is of no help. Unfortunately, the JTRAIN98 dataset is only available in Stata format. I solved the issue by importing the JTRAIN98.dta Stata file (from the link that VolodymyrVakhitov provided) into R using the haven package and then using write.csv() to save it as a CSV file. I am attaching the archive of the CSV file to this comment for anyone who ventures here scavenging for a solution to the same issue. jtrain98.csv.zip
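For anyone who wants to reproduce the conversion rather than download the attachment, the workflow described above can be sketched roughly as follows. This is only an illustration: `dta_to_csv` is a hypothetical helper name, and the example call assumes JTRAIN98.dta has already been unzipped into the working directory.

```r
# Sketch only: `dta_to_csv` is a hypothetical helper, not part of any package.
# Requires the haven package: install.packages("haven")
dta_to_csv <- function(dta_path, csv_path) {
  dat <- haven::read_dta(dta_path)   # read the Stata file as a tibble
  dat <- haven::zap_labels(dat)      # drop Stata value labels, keep raw values
  dat <- as.data.frame(dat)          # plain data frame for write.csv()
  write.csv(dat, csv_path, row.names = FALSE)
  invisible(csv_path)
}

# Example call (assumes the file exists in the working directory):
# dta_to_csv("JTRAIN98.dta", "jtrain98.csv")
```

Dropping the value labels is a design choice for a plain CSV; anyone who needs the labels could convert them with `haven::as_factor()` instead.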

I hope it's helpful. Thank you, Aram

JustinMShea commented 3 years ago

@calciferanalytics Hi Aram, thank you, I'm glad you find the package useful. Yes, I am aware the jtrain set is not what @VolodymyrVakhitov was referencing and that the full data set for the 7th edition is in Stata format. Having worked with Stata .dta files to upgrade from the 5th to the 6th edition, I updated the scripts for the 7th. You can read them here if interested: data-raw/data-cleaning-7th.R file added at b1dab186ac0b66e65f5fee09a58634c598552daf

I prefer to use the readstata13 package, which I find much more stable, faster, and more consistent than haven, so I would recommend giving it a try if you load Stata files into R often!
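As a rough illustration of that approach, a single .dta file could be read with readstata13 and saved as an .RData file along these lines. This is a sketch under assumptions, not the package's actual script (that is data-raw/data-cleaning-7th.R, which may differ): `dta_to_rdata` is a hypothetical helper, and the lower-case object naming is assumed.

```r
# Sketch only: `dta_to_rdata` is a hypothetical helper; the real conversion
# script in the repository may work differently.
# Requires: install.packages("readstata13")
dta_to_rdata <- function(dta_path, out_dir = ".") {
  # Object name: file name without extension, lower-cased (assumed convention)
  obj_name <- tolower(tools::file_path_sans_ext(basename(dta_path)))
  dat <- readstata13::read.dta13(dta_path)   # returns a data.frame
  assign(obj_name, dat)                      # bind it under the data set's name
  out_path <- file.path(out_dir, paste0(obj_name, ".RData"))
  save(list = obj_name, file = out_path, envir = environment())
  invisible(out_path)
}

# Example: convert every .dta file in a (hypothetical) folder "stata-files":
# files <- list.files("stata-files", pattern = "\\.dta$",
#                     full.names = TRUE, ignore.case = TRUE)
# invisible(lapply(files, dta_to_rdata, out_dir = "data"))
```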

Justin

JustinMShea commented 3 years ago

Thank you for bringing this to my attention @MarcelHWachter and thank you @VolodymyrVakhitov, for doing a little leg work to make this update a bit less time consuming! This will work right away:

remotes::install_github("JustinMShea/wooldridge")
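After installing, one way to confirm that the four previously missing data sets are present is to list what the installed package ships. This is a minimal check, assuming the data sets are stored under lower-case names like the package's other data sets:

```r
# Assumption: the four data sets use lower-case names in the package.
wanted <- c("jtrain98", "labsup", "ncaa_rpi", "school93_98")

if (requireNamespace("wooldridge", quietly = TRUE)) {
  # data(package = ...) returns a list whose `results` matrix has an "Item" column
  available <- data(package = "wooldridge")$results[, "Item"]
  print(setNames(wanted %in% available, wanted))
} else {
  message("wooldridge is not installed")
}
```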

I will likely wait to fully tackle issue #3 before pushing the update to CRAN, but that should be soon.

Thanks again, Justin