UrbanInstitute / education-data-package-r

https://urbaninstitute.github.io/education-data-package-r/

API slow? #104

Open dcaud opened 2 years ago

dcaud commented 2 years ago

The call below takes several minutes, even though it ultimately transfers less than 10 MB of data:

  resp_df <- educationdata::get_education_data(
    level = "schools",
    source = "ccd",
    topic = "enrollment",
    filters = list(year = 2010, grade = 99),
    subtopic = list("race"),
    add_labels = TRUE
  )

Can this be sped up?

erika-tyagi commented 2 years ago

Hi @dcaud - thanks for flagging! When I ran that call, it took about 40 seconds. My guess is that it was slower for you because the response is now cached at the API layer (it wasn't when you first ran it) and/or because the API was experiencing unusually high demand when you reported this.

Another option is to pass csv = TRUE in the function call, which reads the full grade/sex/race disaggregated enrollment data from a CSV file rather than from the API:

  resp_df <- educationdata::get_education_data(
    level = "schools",
    source = "ccd",
    topic = "enrollment",
    filters = list(year = 2010, grade = 99, sex = 99),
    subtopic = list("race"),
    add_labels = TRUE,
    csv = TRUE
  )

Note that because the CSV contains the full grade/sex/race disaggregation, you'll have to add the sex = 99 filter when using the csv flag to match your original output.

Benchmarking just now, the csv option took about 70 seconds (i.e. slower than the 40 seconds I saw from the original call, but faster than the several minutes you reported).
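If you'd like to compare the two approaches on your own connection, a minimal sketch along these lines should work. The helper names time_call, via_api, and via_csv are hypothetical (they are not part of educationdata); only get_education_data and the arguments already shown in this thread come from the package. Timings will vary with API caching and load, so treat any single run as a rough indication.

```r
# Hypothetical helper (not part of educationdata): run a zero-argument
# function, report elapsed wall-clock time and row count, return both.
time_call <- function(label, f) {
  timing <- system.time(result <- f())
  elapsed <- timing[["elapsed"]]
  message(sprintf("%s: %.1f seconds, %d rows", label, elapsed, NROW(result)))
  invisible(list(result = result, elapsed = elapsed))
}

# The original call from this issue, reading from the API.
via_api <- function() {
  educationdata::get_education_data(
    level = "schools",
    source = "ccd",
    topic = "enrollment",
    filters = list(year = 2010, grade = 99),
    subtopic = list("race"),
    add_labels = TRUE
  )
}

# The same request with csv = TRUE; sex = 99 is added to the filters
# so the output matches the API call above.
via_csv <- function() {
  educationdata::get_education_data(
    level = "schools",
    source = "ccd",
    topic = "enrollment",
    filters = list(year = 2010, grade = 99, sex = 99),
    subtopic = list("race"),
    add_labels = TRUE,
    csv = TRUE
  )
}
```

Nothing is downloaded until the functions are called, for example with api_run <- time_call("API", via_api) followed by csv_run <- time_call("CSV", via_csv). Running the API version twice in a row is one way to see whether caching at the API layer makes the second request faster.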

I'll do some digging to see if there are other ways we can optimize that particular endpoint at the API level, and sorry I don't have a more satisfying answer for you!