RTICWDT / open-data-maker

make it easy to turn a lot of potentially large csv files into easily accessible open data

Leading zeros dropping in JSON requests #65

Closed doug2mac closed 5 years ago

doug2mac commented 5 years ago

I have noticed while accessing data via the API that for certain values, leading zeros (like those found in the ope6_id and ope8_id) are dropped from the resulting JSON. However, other values

Is there a way to keep this from happening?

Thank you

kynetiv commented 5 years ago

Hi @doug2mac, do you have an example query or schools that you noticed this happening with that you can share?

I did a quick scan of the most recent data file and I'm not seeing any leading zeros for the fields you mention. ope6_id & ope8_id are cast as integers in the data dictionary, so I'd be curious whether you came across any that had leading zeros.

doug2mac commented 5 years ago

Hey @kynetiv,

Sure thing. I think that may be the issue: casting these elements as integers (rather than strings) truncates the leading zeros. Here are the first 3 rows, by unitid, of the raw data extract from here:

| UNITID | OPEID | OPEID6 | INSTNM |
| --- | --- | --- | --- |
| 100654 | 00100200 | 001002 | Alabama A & M University |
| 100663 | 00105200 | 001052 | University of Alabama at Birmingham |
| 100690 | 02503400 | 025034 | Amridge University |
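
The effect can be reproduced in a few lines of Python. This is only a minimal sketch: the tab-separated layout, the `load_rows` helper, and the idea that the cast happens at load time are assumptions for illustration, not the actual open-data-maker import code.

```python
import csv
import io
import json

# Sample rows copied from the raw extract above; the tab delimiter is an assumption.
RAW = (
    "UNITID\tOPEID\tOPEID6\tINSTNM\n"
    "100654\t00100200\t001002\tAlabama A & M University\n"
    "100663\t00105200\t001052\tUniversity of Alabama at Birmingham\n"
    "100690\t02503400\t025034\tAmridge University\n"
)


def load_rows(text, cast):
    """Parse the sample and apply `cast` to the two OPE identifier columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "id": int(row["UNITID"]),
            "ope8_id": cast(row["OPEID"]),
            "ope6_id": cast(row["OPEID6"]),
        }
        for row in reader
    ]


as_int = load_rows(RAW, int)  # integer cast: leading zeros are lost
as_str = load_rows(RAW, str)  # string cast: leading zeros survive

print(json.dumps(as_int[0]))  # {"id": 100654, "ope8_id": 100200, "ope6_id": 1002}
print(json.dumps(as_str[0]))  # {"id": 100654, "ope8_id": "00100200", "ope6_id": "001002"}
```

An integer has no notion of width, so once the value is parsed as a number the original padding cannot be recovered without knowing the expected length.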

The JSON response for those same 3 unitid values shows the following (query = 'https://api.data.gov/ed/collegescorecard/v1/schools.json?api_key={mykey}&id={unitid}&_per_page=100'):

  1. Unitid 100654
    1. "ope8_id": 100200
    2. "ope6_id": 1002
    3. "id": 100654
  2. Unitid 100663
    1. "ope8_id": 105200
    2. "ope6_id": 1052
    3. "id": 100663
  3. Unitid 100690
    1. "ope8_id": 2503400
    2. "ope6_id": 25034
    3. "id": 100690

Checking these unitid values against IPEDS via the following link I see the full 8 and 6 digit values.
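
Until the cast is changed on the server side, the full values can be rebuilt on the client, since the expected widths (8 and 6 digits) are known. A minimal sketch, assuming each school has already been decoded into a Python dict with the field names shown above; `pad_ope_ids` is a hypothetical helper, not part of the API:

```python
def pad_ope_ids(school):
    """Return a copy of `school` with ope8_id/ope6_id as zero-padded strings."""
    widths = {"ope8_id": 8, "ope6_id": 6}
    fixed = dict(school)  # shallow copy so the caller's dict is left untouched
    for field, width in widths.items():
        if fixed.get(field) is not None:
            # str() accepts both integers and strings; zfill left-pads with zeros.
            fixed[field] = str(fixed[field]).zfill(width)
    return fixed


record = {"ope8_id": 2503400, "ope6_id": 25034, "id": 100690}
print(pad_ope_ids(record))
# {'ope8_id': '02503400', 'ope6_id': '025034', 'id': 100690}
```

Because `zfill` leaves strings that are already long enough unchanged, the same helper is harmless if the API later starts returning padded strings.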

kynetiv commented 5 years ago

Thanks for the feedback. I've raised this issue internally and this will likely be resolved (cast to strings) in the next data refresh.

kynetiv commented 5 years ago

The College Scorecard API was updated on May 21, 2019 with a corrected data dictionary. The API now casts both ope8_id and ope6_id values as strings, which prevents the leading zeros from being dropped. Thanks again for the feedback, @doug2mac!
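
For anyone who wants to confirm the change in their own client, a small check along these lines may help. It is a sketch only: `malformed_ope_fields` is a hypothetical helper, and the 8-digit and 6-digit widths are taken from the values quoted earlier in this thread rather than from the data dictionary itself.

```python
import re

# Expected shapes, assumed from the sample values in this thread.
OPE_PATTERNS = {
    "ope8_id": re.compile(r"[0-9]{8}"),
    "ope6_id": re.compile(r"[0-9]{6}"),
}


def malformed_ope_fields(school):
    """List the OPE fields that are missing or not digit strings of the expected length."""
    bad = []
    for field, pattern in OPE_PATTERNS.items():
        value = school.get(field)
        if not (isinstance(value, str) and pattern.fullmatch(value)):
            bad.append(field)
    return bad


print(malformed_ope_fields({"ope8_id": "00100200", "ope6_id": "001002"}))  # []
print(malformed_ope_fields({"ope8_id": 100200, "ope6_id": 1002}))  # ['ope8_id', 'ope6_id']
```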