UrbanInstitute / education-data-package-stata

Issue with CSV option #59

Closed ericatheresa closed 5 years ago

ericatheresa commented 5 years ago

A few people have recently encountered (possibly unrelated) errors using the csv option. @grahamimac @ddorio

One example is: educationdata using "college scorecard student-characteristics aid-applicants", sub(year=2014) csv

Error message:

variable count_total_FAFSA_applicants not found
stata(): 3598 Stata returned error
labelcsv(): - function returned error
downloadcsv(): - function returned error
getalldata(): - function returned error
: - function returned error

Another is: educationdata using "school ccd enrollment race", sub(grade=99 year=2015) csv

This runs, but returns fewer than 6,000 schools.

grahamimac commented 5 years ago

Thanks @ericatheresa. Tagging @VivianSihanZHENG to see if she can replicate. Three questions:

1) Do you know in which versions of the package these errors occurred?
2) Were you able to replicate these errors?
3) How many schools do we expect the second one to return if it's correct?

Thanks! Vivian, please test these cases in the meantime and look into the issues.

VivianSihanZHENG commented 5 years ago

Hi @ericatheresa @grahamimac , I tested in the STATA package Version 0.3.1 (latest version), and I was able to return the results without errors. For educationdata using "school ccd enrollment race", sub(grade=99 year=2015) csv, 136080 observations are returned, and there are 5670 unique ncessch.

I think we caught this issue when running tests for the previous release, and Graham fixed it in version 0.3.1. I would recommend updating the package to the latest version and running both examples again. Thanks!!

ericatheresa commented 5 years ago

Hi @VivianSihanZHENG -- that's what I get as well. The call should return about 100,000 schools and millions of observations. It seems to only be pulling the first five states, and quits halfway through California.

ericatheresa commented 5 years ago

@VivianSihanZHENG I updated the package and that fixed the scorecard example (thanks!), but not the ccd example.

VivianSihanZHENG commented 5 years ago

Hi @ericatheresa, I checked the API using https://educationdata.urban.org/api/v1/schools/ccd/enrollment/2015/grade-99/race/, and 774488 records are returned. Does this record count look correct to you?
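
For anyone repeating this check, here is a minimal Python sketch of reading the total record count from that endpoint. It assumes the first page of the JSON response carries a top-level integer "count" field; that response shape is an assumption and is not confirmed anywhere in this thread. The helper names fetch_json and reported_record_count are hypothetical.

```python
import json
from urllib.request import urlopen

# Endpoint quoted in the comment above.
API_URL = (
    "https://educationdata.urban.org/api/v1/"
    "schools/ccd/enrollment/2015/grade-99/race/"
)


def fetch_json(url):
    """Download one page of the response and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def reported_record_count(url, fetch=fetch_json):
    """Return the total record count reported for an endpoint.

    Assumption: the first page has a top-level integer "count" field.
    The fetch argument exists so the function can be exercised offline.
    """
    page = fetch(url)
    return int(page["count"])


# Usage (makes a network request):
#     print(reported_record_count(API_URL))
```

Passing the fetch function as an argument keeps the counting logic separate from the network call, so it can be checked against a canned response.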

ericatheresa commented 5 years ago

@VivianSihanZHENG Yep, that sounds about right!

VivianSihanZHENG commented 5 years ago

Hi @ericatheresa, I checked the original CSV file, filtered it with filter(grade==99, year==2015, sex==99), and it returned 774488 records with 96811 unique ncessch IDs, which matches what the API returns. So I believe the underlying CSV file is correct. I will check the STATA package programs next. @grahamimac Will get back to you asap!
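
The same kind of check (count the rows that pass the filter, then count distinct ncessch IDs) can be sketched in Python with the standard csv module. This is an illustration only, not the code used above. The function name summarize_rows and the sample rows are made up, and the comparison is on raw strings, so it assumes values are stored as "99" and "2015" rather than, say, "99.0".

```python
import csv
import io


def summarize_rows(csv_file, filters, id_column="ncessch"):
    """Count rows matching every filter and the distinct IDs among them.

    filters maps column name to the exact string value required.
    Returns (matching_row_count, unique_id_count).
    """
    matching_rows = 0
    unique_ids = set()
    for row in csv.DictReader(csv_file):
        if all(row[col] == value for col, value in filters.items()):
            matching_rows += 1
            unique_ids.add(row[id_column])
    return matching_rows, len(unique_ids)


# Made-up sample rows, only to show the shape of the check.
sample = io.StringIO(
    "ncessch,year,grade,sex,race,enrollment\n"
    "A,2015,99,99,1,10\n"
    "A,2015,99,99,2,12\n"
    "A,2015,99,1,1,4\n"
    "B,2015,99,99,1,7\n"
    "B,2014,99,99,1,6\n"
    "B,2015,3,99,1,2\n"
)
rows, schools = summarize_rows(
    sample, {"grade": "99", "year": "2015", "sex": "99"}
)
print(rows, schools)
```

For a real file, an open file handle would take the place of the in-memory sample.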

VivianSihanZHENG commented 5 years ago

Hi @ericatheresa, I found that the CSV file in the S3 bucket (where the STATA package csv option grabs the data) was not complete, which caused the STATA package to return fewer rows. The issue was automatically resolved after I re-uploaded the CSV file to S3. I think I was looking at the wrong bucket earlier. Sorry about the confusion! You can test it in STATA now. Please let me know if you see anything else!

The other thing I noticed is that the results from educationdata using "school ccd enrollment race", sub(grade=99 year=2015) csv still keep all values for the variable sex. Does this look correct to you? Thanks!!

ericatheresa commented 5 years ago

Thanks, @VivianSihanZHENG ! Works great now. Following up separately on issue #2.